From: Shakeel Butt <shakeel.butt@linux.dev>
To: YoungJun Park <youngjun.park@lge.com>
Cc: Yosry Ahmed <yosry@kernel.org>, Hao Jia <jiahao.kernel@gmail.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	mhocko@kernel.org, tj@kernel.org, mkoutny@suse.com,
	 roman.gushchin@linux.dev, Nhat Pham <nphamcs@gmail.com>,
	akpm@linux-foundation.org,  chengming.zhou@linux.dev,
	muchun.song@linux.dev, cgroups@vger.kernel.org,
	 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org,  Hao Jia <jiahao1@lixiang.com>,
	chrisl@kernel.org, kasong@tencent.com, baoquan.he@linux.dev,
	 joshua.hahnjy@gmail.com
Subject: [swap tier discussion] Re: [PATCH v3 2/4] mm/zswap: Implement proactive writeback
Date: Fri, 12 Jun 2026 10:02:32 -0700
Message-ID: <aiw2p5ANjsQUCIHA@linux.dev>
In-Reply-To: <aiu06fbV7rWqY0Bm@yjaykim-PowerEdge-T330>

Changed the subject to separate the discussion on swap tiers.

On Fri, Jun 12, 2026 at 04:27:37PM +0900, YoungJun Park wrote:
> On Thu, Jun 11, 2026 at 12:12:40PM -0700, Shakeel Butt wrote:
> > On Thu, Jun 11, 2026 at 05:45:04PM +0000, Yosry Ahmed wrote:
> > > On Tue, Jun 09, 2026 at 01:19:13PM +0900, YoungJun Park wrote:
> > > > On Mon, Jun 08, 2026 at 03:27:07PM -0700, Yosry Ahmed wrote:
> > > > 
> > > > +Chris +Kairui +Baoquan
> > > > 
> > > > Hello
> > > > 
> > > > Thanks for inviting me to the discussion, Shakeel.
> > > > 
> > > > > > > > > Youngjun is working on swap tiers. At the moment he is more interested in
> > > > > > > > > controlling whether a specific swap device is allowed for a memcg. I can imagine
> > > > > > > > > that in the future there will be use-cases that need to demote data from a
> > > > > > > > > higher tier swap to a lower tier swap. What would be the appropriate interface?
> > > > 
> > > > Speaking of my work on swap tiers, I recently submitted a patch and am
> > > > currently considering memcg integration:
> > > > https://lore.kernel.org/linux-mm/20260527062247.3440692-1-youngjun.park@lge.com/
> > > > 
> > > > The future use-cases imagined above seem to align with this
> > > > direction. (BTW, I am currently waiting for reviews/feedback from the memcg
> > > > folks on this patch. Any reviews would be highly appreciated!)
> > > > 
> > > > We could potentially assign a target tier
> > > > for writeback within the existing memory.zswap.writeback interface. 
> > > > 
> > > > For instance, '0' could mean disabled, while non-zero values could represent
> > > > specific tiers, which would maintain backward compatibility with the current
> > > > version. Alternatively, if zswap is treated as the default top tier, 
> > > > the `memory.swap.tiers` interface could potentially replace `memory.zswap.writeback`.
> > > > 
> > > > > Furthermore, this could be expanded so that each swap tier supports
> > > > > user-triggered demotion between swap tiers.
> > > > 
> > > > Based on the current patch's ideas combined with my swap tiers concept:
> > > > 
> > > > Assuming a hierarchy like:
> > > > zswap -> tier1 (SSD swap) -> tier2 (HDD swap) -> tier3 (Network swap)
> > > > 
> > > > We could configure the active tiers via a setting like `memory.swap.tiers`
> > > > (tier2 enabled, tier3 enabled).
> > > > 
> > > > > For example, the concept of `echo "100M zswap_writeback_only" > memory.reclaim`
> > > > > could be extended. A user could run `echo "100M tier2" > memory.reclaim`
> > > > > to explicitly trigger demotion from tier2 to tier3.
> > > > (BTW, if we combine these features, my personal preference for the keyword
> > > > format would be `<size> <demote_prefix><tier_name>`. I think it would be
> > > > better to explicitly indicate that it is a swap demotion by using a specific
> > > > > prefix followed by the tier name.
> > > > > Making the demote prefix a separate key is also possible.)
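
[Illustration, not part of the original message or of any posted patch.] The
`<size> <demote_prefix><tier_name>` keyword format proposed above could be
parsed along the following lines. This is only a sketch under assumptions: the
prefix string `demote=` and the function names are hypothetical, since the
thread does not settle on a concrete prefix, and the K/M/G suffixes are taken
as powers of 1024.

```python
import re

# Hypothetical prefix: the thread only proposes "<demote_prefix><tier_name>"
# and does not name the prefix itself.
DEMOTE_PREFIX = "demote="

# Size suffixes, treated as powers of 1024 (an assumption of this sketch).
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '100M' into a byte count."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if match is None:
        raise ValueError(f"bad size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_reclaim_request(line):
    """Split '<size> [<demote_prefix><tier_name>]' into (bytes, tier).

    The tier is None when no demotion keyword is given, i.e. for a plain
    reclaim request that carries only a size.
    """
    fields = line.split()
    if not fields or len(fields) > 2:
        raise ValueError(f"expected '<size> [keyword]': {line!r}")
    nbytes = parse_size(fields[0])
    tier = None
    if len(fields) == 2:
        keyword = fields[1]
        if not keyword.startswith(DEMOTE_PREFIX):
            raise ValueError(f"unknown keyword: {keyword!r}")
        tier = keyword[len(DEMOTE_PREFIX):]
        if not tier:
            raise ValueError("missing tier name after prefix")
    return nbytes, tier
```

With an explicit prefix, a bare tier name such as `tier2` is rejected rather
than silently treated as a demotion request, which is the point of the
preference stated above.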
> > > 
> > > I am not sure if proactive demotion between swap tiers would be driven
> > > by memory.reclaim, I am guessing a new interface might be more suitable.
> > > But yes, you are right that it's very possible that
> > > 'zswap_writeback_only' with memory.reclaim will become obsolete once
> > > swap tiering matures and starts supporting things like proactive
> > > demotion.
> > > 
> > > Part of me wants to wait until the swap tiering interfaces are figured
> > > out so that we don't end up with redundant interfaces, but I also don't
> > > want to hold Hao's work since it doesn't directly depend on swap
> > > tiering.
> 
> > > Shakeel, how do you want to handle this? I think there's a few options:
> > > 
> > > 1. Add zswap_writeback_only now, and when we have swap tiering demotion
> > > it becomes a redundant interface, like memory.zswap.writeback -- or
> > > maybe we try to deprecate both of them at that point. It's difficult to
> > > remove interfaces tho, but maybe easier to stop supporting
> > > zswap_writeback_only.
> > > 
> > > 2. Add zswap_writeback_only behind an experimental config option, to
> > > unblock development but have a line of sight to dropping support once we
> > > have a swap tiering interface.
> > > 
> > > 3. Wait until we figure out the swap tiering interfaces and then add
> > > the proactive zswap writeback as part of it.
> > > 
> > > WDYT?
> > 
> > > Is Hao's work needed for some followup work/development? The earliest Hao's
> > > work can land is 7.3, so if we aim to figure out swap tiering interfaces in the
> > > next couple of weeks then option 3 is the way to go. If swap tiers take more time
> > > then we can discuss other options as well.
> > > However, I would need help from the zswap folks (Yosry & Nhat) in figuring out the
> > > swap tiers interfaces. Zswap is the current top tier swap in real-world use. I want
> > > zswap users to easily (and hopefully transparently) migrate to swap tiers.
> 
> I am looking forward to the discussion on this interface!
> 
> To help boost the discussion and progress, I would like to share a few of my thoughts.
> > We could either introduce a new interface to trigger demotion/promotion,
> > or we could reuse the existing one (using tiers only internally).
> 
> Based on the memcg interface currently proposed in swap_tier
> (memory.swap.tiers, memory.swap.tiers.effective), I think it aligns well
> with the current direction. It provides a foundation for selectively
> targeting devices in tier order.

Here, instead of a cpuset-like interface, we may want a more zswap-like interface
where you can put a limit on the usage, i.e. memory.swap.tier*.max. We can start
by allowing only two values, 0 and max, which will effectively be the
same as what you need.
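
[Illustration, not part of the original message.] A rough sketch of how a
two-valued memory.swap.tier*.max could reduce to the allow/deny behaviour
described above. The tier names and helper functions here are hypothetical;
only the `0` and `max` values come from the suggestion itself.

```python
# Sentinel for "no limit"; any value other than 0 would do in this sketch.
NO_LIMIT = float("inf")


def parse_tier_max(value):
    """Accept only the two initially proposed values: '0' and 'max'."""
    token = value.strip()
    if token == "max":
        return NO_LIMIT
    if token == "0":
        return 0
    raise ValueError(f"only '0' and 'max' are accepted, got {value!r}")


def allowed_tiers(limits):
    """Return the tier names a cgroup may swap to, in the given order.

    `limits` maps a (hypothetical) tier name to the string written to its
    .max file. A limit of 0 excludes the tier; 'max' leaves it usable.
    """
    return [name for name, raw in limits.items() if parse_tier_max(raw) != 0]
```

For example, `allowed_tiers({"zswap": "max", "tier1": "0", "tier2": "max"})`
selects zswap and tier2 only. Accepting intermediate byte limits later would
extend `parse_tier_max` without changing the meaning of the two existing values.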

I will respond to your other points later when I have time.

> 
> To summarize the discussions so far, the following points align well.
> 
> - Per-cgroup swap control, as I suggested.
> - Proactive zswap writeback (Hao's use case).
> - Demotion targeting a specific swap device (even better if it can be selective), as you mentioned:
>   https://lore.kernel.org/linux-mm/aicZ-5GX9De3MAU7@linux.dev/
> - Virtual Swap on/off in the future, as Nhat mentioned:
>   https://lore.kernel.org/linux-mm/20260528212955.1912856-1-nphamcs@gmail.com/
> - The memory.zswap.writeback alternative (no hierarchy model conflict).
> - zswap is the first swap tier.
> - Promotion (also better if it can be selective).
> - Tier-based swap policy (e.g. round-robin...).
> 
> To accelerate this work, I believe we should reach a consensus and
> merge the currently proposed swap_tier interface :)
> 
> If the above approach is difficult, I would like to suggest an
> alternative that makes progress with the memcg interfaces removed:
> 
> 1) We could make zswap the first tier and create
> a use case where memory.zswap.writeback is handled internally by tier logic.
> 
> 2) Or simply merge the swap_tier infrastructure itself first.
> 
> This would allow the swap_tier infrastructure to be merged and discussed
> more easily.
> 
> If adopting swap_tier takes longer anyway, we could still take the next step
> with experimental features:
> 
> - Apply per-cgroup swap as an experimental (debugfs) feature.
> - Apply Hao's use case experimentally, or as it is, as Yosry suggested
>   (with a future migration to swap tiers).
> 
> What do you think?
> 
> (FYI: My emails to kernel.org are failing due to internal server issues.)
> 
> Thank you 
> Youngjun Park
