From: Johannes Weiner <hannes@cmpxchg.org>
To: Chris Li <chrisl@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>,
akpm@linux-foundation.org, tj@kernel.org,
lizefan.x@bytedance.com, cerasuolodomenico@gmail.com,
yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org,
vitaly.wool@konsulko.com, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeelb@google.com,
muchun.song@linux.dev, hughd@google.com, corbet@lwn.net,
konrad.wilk@oracle.com, senozhatsky@chromium.org,
rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
david@ixit.cz, Kairui Song <kasong@tencent.com>,
Minchan Kim <minchan@google.com>,
Zhongkun He <hezhongkun.hzk@bytedance.com>
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
Date: Fri, 8 Dec 2023 22:42:29 -0500
Message-ID: <20231209034229.GA1001962@cmpxchg.org>
In-Reply-To: <CAF8kJuNpnqTM5x1QmQ7h-FaRWVnHBdNGvGvB3txohSOmZhYA-Q@mail.gmail.com>
On Fri, Dec 08, 2023 at 03:55:59PM -0800, Chris Li wrote:
> I can give you three usage cases right now:
> 1) Google's production kernel uses SSD-only swap; it is currently in
> pilot. This is not expressible by memory.zswap.writeback. You can
> set memory.zswap.max = 0 and memory.zswap.writeback = 1, then use an
> SSD-backed swapfile. But the whole thing feels very clunky: when
> what you really want is SSD-only swap, you need to do this whole zswap
> config dance. Google has an internal memory.swapfile feature that
> implements a per-cgroup swap file type: "zswap only", "real swap file
> only", "both", or "none" (the exact keywords might be different). It
> has been running in production for almost 10 years. The need for more
> than a zswap type of per-cgroup control is really there.
We use regular swap on SSD without zswap just fine. Of course it's
expressible.
On dedicated systems, zswap is disabled in sysfs. On shared hosts
where it's determined based on which workload is scheduled, zswap is
generally enabled through sysfs, and individual cgroup access is
controlled via memory.zswap.max - which is what this knob is for.
This is analogous to enabling swap globally, and then opting
individual cgroups in and out with memory.swap.max.
So this usecase is very much already supported, and it's expressed in
a way that's pretty natural for how cgroups express access and lack of
access to certain resources.
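The setup described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: the helper name, the
policy names, and the cgroup path in the usage note are made up. Only
the two cgroup v2 file names and the "0" / "max" values are existing
interfaces.

```python
import os

# Hypothetical policy names, for illustration only; not a kernel interface.
# Each policy maps existing cgroup v2 control files to the value to write.
POLICIES = {
    # Regular swap only: the cgroup gets no zswap, but may use swap.
    "ssd-only": {"memory.zswap.max": "0", "memory.swap.max": "max"},
    # Both zswap and regular swap are allowed.
    "zswap+ssd": {"memory.zswap.max": "max", "memory.swap.max": "max"},
    # No swapping at all for this cgroup.
    "no-swap": {"memory.swap.max": "0"},
}


def apply_swap_policy(cgroup_dir, policy):
    """Write the limits for `policy` into the cgroup at `cgroup_dir`.

    Returns the settings that were written.
    """
    settings = POLICIES[policy]
    for name, value in settings.items():
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(value)
    return settings
```

For example, apply_swap_policy("/sys/fs/cgroup/workload", "ssd-only")
would opt a (hypothetical) "workload" cgroup out of zswap while leaving
regular swap available to it. Writing to a real cgroup needs the
appropriate privileges, and zswap and swap still have to be enabled
globally.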
I don't see how memory.swap.type or memory.swap.tiers would improve
this in any way. On the contrary, it would overlap and conflict with
existing controls to manage swap and zswap on a per-cgroup basis.
> 2) As indicated by this discussion, Tencent has a usage case for SSD
> and hard disk swap as overflow.
> https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> +Kairui
Multiple swap devices for round robin or with different priorities
aren't new, they have been supported for a very, very long time. So
far nobody has proposed to control the exact behavior on a per-cgroup
basis, and I didn't see anybody in this thread asking for it either.
So I don't see how this counts as an obvious and automatic usecase for
memory.swap.tiers.
> 3) Android has some fancy swap ideas led by those patches.
> https://lore.kernel.org/linux-mm/20230710221659.2473460-1-minchan@kernel.org/
> It got shot down due to removal of frontswap. But the usage case and
> product requirement is there.
> +Minchan
This looks like an optimization for zram to bypass the block layer and
hook directly into the swap code. Correct me if I'm wrong, but this
doesn't appear to have anything to do with per-cgroup backend control.
> > zswap.writeback is a more urgent need, and does not prevent swap.tiers
> > if we do decide to implement it.
>
> I respect that urgent need, which is why I acked the V5 patch, on
> the understanding that this zswap.writeback is not carved in stone.
> When a better interface comes along, this interface can be obsoleted.
> Frankly speaking, I would much prefer not introducing a cgroup API
> that will be obsolete soon.
>
> If you think zswap.writeback is not removable when another better
> alternative is available, please voice it now.
>
> If you squash my minimal memory.swap.tiers patch, it will also address
> your urgent need for merging the "zswap.writeback", no?
We can always deprecate ABI if something better comes along.
However, it's quite bold to claim that memory.swap.tiers is the best
way to implement backend control on a per-cgroup basis, and that it'll
definitely be needed in the future. You might see this as a foregone
conclusion, but I very much doubt this.
Even if such a file were to show up, I'm not convinced it should even
include zswap as one of the tiers. Zswap isn't a regular swap backend:
it doesn't show up in /proc/swaps; it can't be a second tier; the way
it interacts with its backend file is very different from how two
swapfiles of different priorities interact with each other; it's
already controllable with memory.zswap.max; etc.
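To make the /proc/swaps point concrete, here is a minimal sketch of a
parser for that file. The function name and the dict keys are my own
choices for illustration. It assumes the usual five-column layout
(Filename, Type, Size, Used, Priority) and sorts by priority, since
higher-priority swap areas are used first.

```python
def parse_proc_swaps(text):
    """Parse the contents of /proc/swaps into a list of dicts.

    Returns one dict per swap area, highest priority first.
    """
    entries = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or malformed line
        # Take the four trailing columns from the end, so anything
        # before them is treated as the file name.
        type_, size, used, prio = fields[-4:]
        entries.append({
            "filename": " ".join(fields[:-4]),
            "type": type_,
            "size_kb": int(size),
            "used_kb": int(used),
            "priority": int(prio),
        })
    entries.sort(key=lambda e: e["priority"], reverse=True)
    return entries
```

Calling parse_proc_swaps(open("/proc/swaps").read()) on a host lists
only the swap partitions and swapfiles. There is no zswap entry to
attach a tier or priority to.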
I'm open to discussing usecases and proposals for more fine-grained
per-cgroup backend control. We've had discussions about per-cgroup
swapfiles in the past. Cgroup parameters for swapon are another
thought. There are several options and many considerations. The
memory.swap.tiers idea is the newest, has probably had the least
amount of discussion among them, and looks the least convincing to me.
Let's work out the requirements first.
The "conflict" with memory.zswap.writeback is a red herring - it's no
more of a conflict than setting memory.swap.tiers to "zswap" or "all"
and then setting memory.zswap.max or memory.swap.max to 0.
So the notion that we have to rush in a minimal version of a MUCH
bigger concept, just to support zswap writeback disabling, is
misguided. We would then have to hope that this format still works as
the concept evolves and real usecases materialize. There is no reason
to take that risk.