From: Shakeel Butt <shakeel.butt@linux.dev>
To: YoungJun Park <youngjun.park@lge.com>
Cc: Yosry Ahmed <yosry@kernel.org>, Hao Jia <jiahao.kernel@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
mhocko@kernel.org, tj@kernel.org, mkoutny@suse.com,
roman.gushchin@linux.dev, Nhat Pham <nphamcs@gmail.com>,
akpm@linux-foundation.org, chengming.zhou@linux.dev,
muchun.song@linux.dev, cgroups@vger.kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, Hao Jia <jiahao1@lixiang.com>,
chrisl@kernel.org, kasong@tencent.com, baoquan.he@linux.dev,
joshua.hahnjy@gmail.com
Subject: [swap tier discussion] Re: [PATCH v3 2/4] mm/zswap: Implement proactive writeback
Date: Fri, 12 Jun 2026 10:02:32 -0700
Message-ID: <aiw2p5ANjsQUCIHA@linux.dev>
In-Reply-To: <aiu06fbV7rWqY0Bm@yjaykim-PowerEdge-T330>
Changed the subject to separate the discussion on swap tiers.
On Fri, Jun 12, 2026 at 04:27:37PM +0900, YoungJun Park wrote:
> On Thu, Jun 11, 2026 at 12:12:40PM -0700, Shakeel Butt wrote:
> > On Thu, Jun 11, 2026 at 05:45:04PM +0000, Yosry Ahmed wrote:
> > > On Tue, Jun 09, 2026 at 01:19:13PM +0900, YoungJun Park wrote:
> > > > On Mon, Jun 08, 2026 at 03:27:07PM -0700, Yosry Ahmed wrote:
> > > >
> > > > +Chris +Kairui +Baoquan
> > > >
> > > > Hello
> > > >
> > > > Thanks for inviting me to the discussion, Shakeel.
> > > >
> > > > > > > > Youngjun is working on swap tiers. At the moment he is more interested in
> > > > > > > > allowing a specific swap device to a memcg or not. I can imagine in future there
> > > > > > > > will be use-cases where there will be a need to demote data on higher tier swap
> > > > > > > > to lower tier swap. What would be the appropriate interface?
> > > >
> > > > Speaking of my work on swap tiers, I recently submitted a patch and am
> > > > currently considering memcg integration:
> > > > https://lore.kernel.org/linux-mm/20260527062247.3440692-1-youngjun.park@lge.com/
> > > >
> > > > The future use-cases imagined above seem to align with this
> > > > direction. (BTW, I am currently waiting for reviews/feedback from the memcg
> > > > folks on this patch. Any reviews would be highly appreciated!)
> > > >
> > > > We could potentially assign a target tier
> > > > for writeback within the existing memory.zswap.writeback interface.
> > > >
> > > > For instance, '0' could mean disabled, while non-zero values could represent
> > > > specific tiers, which would maintain backward compatibility with the current
> > > > version. Alternatively, if zswap is treated as the default top tier,
> > > > the `memory.swap.tiers` interface could potentially replace `memory.zswap.writeback`.
> > > >
> > > > Furthermore, this could be expanded so that each swap tier can demote data,
> > > > enabling user-triggered demotion between swap tiers.
> > > >
> > > > Based on the current patch's ideas combined with my swap tiers concept:
> > > >
> > > > Assuming a hierarchy like:
> > > > zswap -> tier1 (SSD swap) -> tier2 (HDD swap) -> tier3 (Network swap)
> > > >
> > > > We could configure the active tiers via a setting like `memory.swap.tiers`
> > > > (tier2 enabled, tier3 enabled).
> > > >
> > > > For example, the concept of `echo "100M zswap_writeback_only" > memory.reclaim`
> > > > could be extended. A user could run `echo "100M tier2" > memory.reclaim`
> > > > to explicitly trigger demotion from tier2 to tier3.
> > > > (BTW, if we combine these features, my personal preference for the keyword
> > > > format would be `<size> <demote_prefix><tier_name>`. I think it would be
> > > > better to explicitly indicate that it is a swap demotion by using a specific
> > > > prefix followed by the tier name.
> > > > Alternatively, making the demote prefix a separate key is also possible.)
> > >
> > > I am not sure if proactive demotion between swap tiers would be driven
> > > by memory.reclaim, I am guessing a new interface might be more suitable.
> > > But yes, you are right that it's very possible that
> > > 'zswap_writeback_only' with memory.reclaim will become obsolete once
> > > swap tiering matures and starts supporting things like proactive
> > > demotion.
> > >
> > > Part of me wants to wait until the swap tiering interfaces are figured
> > > out so that we don't end up with redundant interfaces, but I also don't
> > > want to hold Hao's work since it doesn't directly depend on swap
> > > tiering.
> > >
> > > Shakeel, how do you want to handle this? I think there's a few options:
> > >
> > > 1. Add zswap_writeback_only now, and when we have swap tiering demotion
> > > it becomes a redundant interface, like memory.zswap.writeback -- or
> > > maybe we try to deprecate both of them at that point. It's difficult to
> > > remove interfaces tho, but maybe easier to stop supporting
> > > zswap_writeback_only.
> > >
> > > 2. Add zswap_writeback_only behind an experimental config option, to
> > > unblock development but have a line of sight to dropping support once we
> > > have a swap tiering interface.
> > >
> > > 3. Wait until we figure out the swap tiering interfaces and then add
> > > the proactive zswap writeback as part of it.
> > >
> > > WDYT?
> >
> > Is Hao's work needed for some followup work/development? The earliest Hao's
> > work can land is 7.3, so if we aim to figure out swap tiering interfaces in the next
> > couple of weeks then option 3 is the way to go. If swap tiers take more time
> > then we can discuss other options as well.
> > However, I would need help from the zswap folks (Yosry & Nhat) in figuring out the
> > swap tiers interfaces. Zswap is the top-tier swap in real-world use today. I want
> > zswap users to easily (and hopefully transparently) migrate to swap tiers.
>
> I am looking forward to the discussion on this interface!
>
> To help boost the discussion and progress, I would like to share a few of my thoughts.
> We could either introduce a new interface to trigger demotion/promotion,
> or we could reuse the existing one (using tiers only internally).
>
> Based on the memcg interface currently proposed in swap_tier
> (memory.swap.tiers, memory.swap.tiers.effective), I think it aligns well
> with the current direction. It provides a foundation for selectively
> targeting devices in tier order.
Here, instead of a cpuset-like interface, we may want a more zswap-like interface
where you can put a limit on usage, i.e. memory.swap.tier*.max. We can start
by allowing only two values, 0 and max, which would effectively be the
same as what you need.
I will respond to your other points later when I have time.
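To make the "0 or max" idea concrete, here is a minimal sketch of how a write to such a per-tier limit file could be interpreted. It is only a model in Python, not kernel code: the helper names `parse_tier_max` and `tier_allowed` are hypothetical, and the file name pattern is the one suggested above rather than a merged interface.

```python
import math

# Hypothetical model of a per-tier limit file such as memory.swap.tier1.max.
# Only the two initially proposed values are accepted: "0" and "max".
def parse_tier_max(value: str) -> float:
    """Return the limit in bytes: 0 disables the tier, 'max' means unlimited."""
    value = value.strip()
    if value == "0":
        return 0
    if value == "max":
        return math.inf
    # Intermediate limits could be added later without changing the file format.
    raise ValueError(f"unsupported value {value!r}: only '0' and 'max' are accepted")


def tier_allowed(limit: float) -> bool:
    """A cgroup may swap to a tier only if its limit for that tier is non-zero."""
    return limit > 0
```

With only these two values the file behaves like an on/off switch per tier, which is what a per-cgroup tier selection needs, while leaving room for real byte limits later.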
>
> To summarize the discussions so far, the following points align well.
>
> - Per-cgroup swap control, as I suggested.
> - Proactive zswap writeback (Hao's usecase)
> - Swap device target demotion (better still if it can be selective), as you mentioned:
> https://lore.kernel.org/linux-mm/aicZ-5GX9De3MAU7@linux.dev/
> - Virtual Swap on/off in the future, as Nhat mentioned:
> https://lore.kernel.org/linux-mm/20260528212955.1912856-1-nphamcs@gmail.com/
> - The memory.zswap.writeback alternative (no hierarchy model conflict)
> - zswap is the first swap tier.
> - Promotion. (Also better for selective usage)
> - Tier-based swap policy (e.g. round-robin...)
>
> To accelerate this work, I believe we should reach a consensus and
> merge the currently proposed swap_tier interface :)
>
> If the above approach is difficult, I would like to suggest an
> alternative for progress with the memcg interfaces removed:
>
> 1) We could make zswap the first tier and create
> a use case where memory.zswap.writeback internally is handled by tier logic.
>
> 2) Or simply merge the swap_tier infrastructure itself first.
>
> This would allow the swap_tier infrastructure to be merged and discussed
> more easily.
>
> If adopting swap_tier takes longer anyway, this would let us take the next
> steps as experimental features:
>
> - Apply per-cgroup swap as an experimental (debugfs) feature.
> - Apply Hao's use case experimentally or as it is as Yosry suggested.
> (future migration to swap tier)
>
> What do you think?
>
> (FYI: My emails to kernel.org are failing due to internal server issues.)
>
> Thank you
> Youngjun Park
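For reference, the `<size> <demote_prefix><tier_name>` request format floated earlier in the quoted discussion could be parsed roughly as follows. This is a sketch under stated assumptions: the prefix string `demote=` and the function names are hypothetical, the tier order is taken from the example hierarchy (zswap -> tier1 -> tier2 -> tier3), and size suffixes are treated as powers of 1024.

```python
# Tier order from the example hierarchy in the discussion; the names are illustrative.
TIERS = ["zswap", "tier1", "tier2", "tier3"]
DEMOTE_PREFIX = "demote="  # hypothetical prefix, not a proposed or merged keyword
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Parse a size such as '100M' into bytes (suffixes are powers of 1024)."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def parse_demote_request(line: str):
    """Split '<size> <demote_prefix><tier_name>' into (bytes, source, target)."""
    size_text, key = line.split()
    if not key.startswith(DEMOTE_PREFIX):
        raise ValueError(f"missing demote prefix in {key!r}")
    source = key[len(DEMOTE_PREFIX):]
    index = TIERS.index(source)  # raises ValueError for an unknown tier
    if index + 1 >= len(TIERS):
        raise ValueError(f"{source} is the lowest tier, nothing to demote to")
    # Demotion always targets the next lower tier in this simple model.
    return parse_size(size_text), source, TIERS[index + 1]
```

For example, `parse_demote_request("100M demote=tier2")` would describe moving 100 MiB from tier2 to tier3. An explicit prefix keeps demotion requests distinguishable from ordinary reclaim keys, which is the motivation given for that format.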