Linux Documentation
From: YoungJun Park <youngjun.park@lge.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Yosry Ahmed <yosry@kernel.org>, Hao Jia <jiahao.kernel@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	mhocko@kernel.org, tj@kernel.org, mkoutny@suse.com,
	roman.gushchin@linux.dev, Nhat Pham <nphamcs@gmail.com>,
	akpm@linux-foundation.org, chengming.zhou@linux.dev,
	muchun.song@linux.dev, cgroups@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, Hao Jia <jiahao1@lixiang.com>,
	chrisl@kernel.org, kasong@tencent.com, baoquan.he@linux.dev,
	joshua.hahnjy@gmail.com
Subject: Re: [swap tier discussion] Re: [PATCH v3 2/4] mm/zswap: Implement proactive writeback
Date: Sun, 14 Jun 2026 18:23:03 +0900
Message-ID: <ai5y923elCSZp41j@yjaykim-PowerEdge-T330>
In-Reply-To: <aiw2p5ANjsQUCIHA@linux.dev>

....
> > Based on the memcg interface currently proposed in swap_tier
> > (memory.swap.tiers, memory.swap.tiers.effective), I think it aligns well
> > with the current direction. It provides a foundation for selectively
> > targeting devices in tier order.
> 
> Here, instead of a cpuset-like interface, we may want a more zswap-like interface
> where you can put a limit on the usage, i.e. memory.swap.tier*.max. We can start
> by allowing only two values, i.e. 0 and max, which effectively will be the
> same as what you need.
>

Good idea, and it's certainly feasible. When I considered this a while
ago, the reasons I didn't take this direction were:

1. There's no real-world use case for adjusting the swap tier amount
   (it's either 0 or max). That said, your suggestion to initially allow
   only 0 and max is the key point, and it's making me reconsider.

2. The implementation cost seems high. The current implementation
   handles this at runtime via simple masking.

3. Relationship with swap.max:
   - If we tie it to the current interface, wouldn't limiting the swap
     amount within a selected tier already be possible? I wonder if
     that alone is enough.
   - If we add tier.max, it would need to be a subset of swap.max.
     (Any other complexities here?)

4. vswap enable/disable: vswap doesn't seem to have an amount-control
   aspect, so an on/off semantic would be clearer.
   https://lore.kernel.org/linux-mm/ai5kOOmR1LPTWs1J@yjaykim-PowerEdge-T330/T/#m8831ec057bf9387978d3bd698f51920600e09a04

If we did go with a per-tier max, the internal logic could stay roughly
the same rather than counting via a page counter. Something like:

1. Change only the interface shell: tier.*.max, accepting only 0 or max.
2. Keep the internal logic as is: 0 disables the tier in the mask
   (child memcgs off too), max enables it (child memcgs on too).
3. memory.zswap.max integrates naturally (it's memory."tier_name".max).
4. Extend later if use cases arise.

On balance I still lean toward the current interface, but if a per-tier
max is the better fit for memcg's direction and others feel the same,
I'm happy to switch. I'd like to hear Shakeel's thoughts again, and I'm
curious about others' opinions too.

A few more perspectives on the points below.

> I will respond to your other points later when I have time.

> > 
> > To summarize the discussions so far, the following points align well.
> > 
> > - Per-cgroup swap control, as I suggested.
> > - Proactive zswap writeback (Hao's usecase)
> > - Swap device target demotion (even better if it can be selective), as you mentioned:
> >   https://lore.kernel.org/linux-mm/aicZ-5GX9De3MAU7@linux.dev/
> > - Virtual Swap on/off in the future, as Nhat mentioned:
> >   https://lore.kernel.org/linux-mm/20260528212955.1912856-1-nphamcs@gmail.com/
> > - The memory.zswap.writeback alternative (no hierarchy model conflict)
> > - zswap as the first swap tier.
> > - Promotion (also better with selective usage).
> > - Tier-based swap policy (e.g. round-robin).
> > 
> > To accelerate this work, I believe we should reach a consensus and
> > merge the currently proposed swap_tier interface :)
> > 
> > If the above approach is difficult, I would like to suggest an
> > alternative for progress with the memcg interfaces removed:
> > 
> > 1) We could make zswap the first tier and create
> > a use case where memory.zswap.writeback internally is handled by tier logic.
> > 
> > 2) Or simply merge the swap_tier infrastructure itself first.
> > 
> > This would allow the swap_tier infrastructure to be merged and discussed
> > more easily.
> > 
> > If adopting swap_tier is going to take longer anyway, doing so would let us
> > take the next step with it as an experimental feature.
> > 
> > - Apply per-cgroup swap as an experimental (debugfs) feature.
> > - Apply Hao's use case experimentally or as it is as Yosry suggested.
> > (future migration to swap tier)
> > 
> > What do you think?
> > 
> > (FYI: My emails to kernel.org are failing due to internal server issues.)
> > 
> > Thank you 
> > Youngjun Park

Let me clarify a part I wrote confusingly. Handling
memory.zswap.writeback via tiers is possible, but I don't think the
interface itself would be replaced even if memory.swap.tiers is adopted.

Selecting only zswap in memory.swap.tiers would not just disable
writeback; it would also block regular swap entirely, which differs
slightly from the current semantic. (Per the cgroup v2 docs, disabling
memory.zswap.writeback is subtly different from setting
memory.swap.max to 0, since it still allows pages to be written to the
zswap pool; the setting has no effect if zswap is disabled, and swapping
is allowed unless memory.swap.max is set to 0.)

So the interface itself needs to be retained, and it could be extended
toward selective writeback, e.g. by passing a desired tier into
memory.zswap.writeback so that writeback targets only that tier.
Currently it only controls on/off. Other tiers probably don't need this;
demotion based on the selected tier should be enough.

Thanks,
Youngjun Park
