From: Johannes Weiner <hannes@cmpxchg.org>
To: Chris Li <chrisl@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Yosry Ahmed <yosry.ahmed@linux.dev>,
Chengming Zhou <chengming.zhou@linux.dev>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
pratmal@google.com, sweettea@google.com, gthelen@google.com,
weixugc@google.com
Subject: Re: [PATCH RFC] mm: ghost swapfile support for zswap
Date: Mon, 24 Nov 2025 12:27:17 -0500
Message-ID: <20251124172717.GA476776@cmpxchg.org>
In-Reply-To: <CACePvbXXDaOY-E-nZ3n44w0StBc=n59+v5V-X2fw-V+roH=Qyw@mail.gmail.com>
On Fri, Nov 21, 2025 at 05:52:09PM -0800, Chris Li wrote:
> On Fri, Nov 21, 2025 at 3:40 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Fri, Nov 21, 2025 at 01:31:43AM -0800, Chris Li wrote:
> > > Currently, zswap requires a backing swapfile. A swap slot used by
> > > zswap cannot be used by the swapfile, which wastes swapfile
> > > space.
> > >
> > > The ghost swapfile is a swapfile that contains only the swapfile
> > > header, for use by zswap. The header indicates the size of the
> > > swapfile. There is no swap data section in the ghost swapfile, so no
> > > swapfile space is wasted. As such, any write to a ghost swapfile will
> > > fail. To prevent accidental reads or writes of the ghost swapfile, the
> > > bdev of its swap_info_struct is set to NULL. The ghost swapfile also
> > > sets the SSD flag, because there is no rotating disk access when using
> > > zswap.
> >
> > Zswap is primarily a compressed cache for real swap on secondary
> > storage. It's indeed quite important that entries currently in zswap
> > don't occupy disk slots; but for a solution to this to be acceptable,
> > it has to work with the primary use case and support disk writeback.
>
> Well, my plan is to support writeback via swap.tiers.

Do you have a link to that proposal?

My understanding of swap tiers was about grouping different swapfiles
and assigning them to cgroups. The issue with writeback is relocating
the data that a swp_entry_t in the page tables refers to - without
having to find and update all the possible page tables. I'm not sure
how swap.tiers solves this problem.

> > This direction is a dead-end. Please take a look at Nhat's swap
> > virtualization patches. They decouple zswap from disk geometry, while
> > still supporting writeback to an actual backend file.
>
> Yes, there are many ways to decouple zswap from disk geometry; my swap
> table + swap.tiers design can do that as well. I have concerns about
> swap virtualization adding another layer of per-swap-entry memory
> overhead, plus the CPU overhead of an extra xarray lookup. I believe
> my approach is technically superior: both faster and cleaner.
> Basically, swap.tiers plus VFS-like swap read/write page ops. I will
> let Nhat clarify the performance and memory overhead side of swap
> virtualization.

I'm happy to discuss it.

But keep in mind that the swap virtualization idea is a collaborative
product of quite a few people with an extensive combined upstream
record. Quite a bit of thought has gone into balancing the static vs.
runtime costs of that proposal. So you'll forgive me if I'm a bit
skeptical of the somewhat grandiose claims of one person who is new
to upstream development.

As to your specific points - we use xarray lookups in the page cache
fast path. It's a bold claim to say this would be too much overhead
during swapins.

Second, it's not clear to me how you want to make writeback efficient
*without* any sort of swap entry redirection. Walking all relevant
page tables is expensive, and you have to be able to find them first.
If you're talking about a redirection array as opposed to a tree -
static sizing of the compressed space is also a no-go. Zswap
utilization varies *widely* between workloads and different workload
combinations. Further, zswap consumes the same fungible resource as
uncompressed memory - there is really no excuse to burden users with
static sizing questions about this pool.