From: Yosry Ahmed <yosry@kernel.org>
To: Nhat Pham <nphamcs@gmail.com>
Cc: kasong@tencent.com, Liam.Howlett@oracle.com,
akpm@linux-foundation.org, apopple@nvidia.com,
axelrasmussen@google.com, baohua@kernel.org,
baolin.wang@linux.alibaba.com, bhe@redhat.com, byungchul@sk.com,
cgroups@vger.kernel.org, chengming.zhou@linux.dev,
chrisl@kernel.org, corbet@lwn.net, david@kernel.org,
dev.jain@arm.com, gourry@gourry.net, hannes@cmpxchg.org,
hughd@google.com, jannh@google.com, joshua.hahnjy@gmail.com,
lance.yang@linux.dev, lenb@kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-pm@vger.kernel.org,
lorenzo.stoakes@oracle.com, matthew.brost@intel.com,
mhocko@suse.com, muchun.song@linux.dev, npache@redhat.com,
pavel@kernel.org, peterx@redhat.com, peterz@infradead.org,
pfalcato@suse.de, rafael@kernel.org, rakie.kim@sk.com,
roman.gushchin@linux.dev, rppt@kernel.org, ryan.roberts@arm.com,
shakeel.butt@linux.dev, shikemeng@huaweicloud.com,
surenb@google.com, tglx@kernel.org, vbabka@suse.cz,
weixugc@google.com, ying.huang@linux.alibaba.com,
yosry.ahmed@linux.dev, yuanchu@google.com,
zhengqi.arch@bytedance.com, ziy@nvidia.com,
kernel-team@meta.com, riel@surriel.com, haowenchao22@gmail.com
Subject: Re: [RFC PATCH 0/5] mm, swap: Virtual Swap Space (Swap Table Edition)
Date: Wed, 3 Jun 2026 18:58:50 +0000
Message-ID: <aiB2sHqxcBAJrTkP@google.com>
In-Reply-To: <CAKEwX=MWX9KkSFAoN4xEMg3b+gZUN9=yd7rirAWG5NOBf26eAg@mail.gmail.com>
> > I assume the main reason here is to avoid the extra overhead if
> > everything uses vswap, which would mainly be the reverse mapping
> > overhead? I guess there's also some simplicity that comes from reusing
> > the swap info infra as a whole, including the swap table.
>
> Yeah it helps a lot that we don't have to rewrite the whole allocator
> and swap entry reference counting logic again :)
I specifically meant using a full swap info thing for the physical swap
device even when it's behind vswap. That seems like overkill, and we
don't need things like the swap entry reference counting. We probably
just need a bitmap and a reverse mapping.

So I am assuming the main reason why we are not doing that (at least for
now) is simplicity?
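
To illustrate the "bitmap and a reverse mapping" idea, here is a rough
userspace sketch. Every name in it (phys_backend, phys_alloc, and so on)
is hypothetical and nothing is taken from the series; it only shows that
slot allocation plus a slot-to-vswap lookup can work without per-entry
reference counts.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical minimal state for a physical device sitting behind vswap. */
struct phys_backend {
	unsigned long *used;	/* one bit per physical slot */
	uint64_t *rmap;		/* physical slot -> owning vswap offset */
	size_t nr_slots;
};

static int phys_init(struct phys_backend *pb, size_t nr_slots)
{
	size_t words = (nr_slots + BITS_PER_WORD - 1) / BITS_PER_WORD;

	pb->used = calloc(words, sizeof(*pb->used));
	pb->rmap = calloc(nr_slots, sizeof(*pb->rmap));
	pb->nr_slots = nr_slots;
	if (!pb->used || !pb->rmap) {
		free(pb->used);
		free(pb->rmap);
		return -1;
	}
	return 0;
}

/* Allocate the first free slot and record which vswap offset owns it. */
static long phys_alloc(struct phys_backend *pb, uint64_t vswap_off)
{
	for (size_t slot = 0; slot < pb->nr_slots; slot++) {
		unsigned long mask = 1UL << (slot % BITS_PER_WORD);
		unsigned long *word = &pb->used[slot / BITS_PER_WORD];

		if (!(*word & mask)) {
			*word |= mask;
			pb->rmap[slot] = vswap_off;
			return (long)slot;
		}
	}
	return -1;	/* device full */
}

/* Release a slot; the stale rmap value is overwritten on the next alloc. */
static void phys_free(struct phys_backend *pb, size_t slot)
{
	assert(slot < pb->nr_slots);
	pb->used[slot / BITS_PER_WORD] &= ~(1UL << (slot % BITS_PER_WORD));
}
```

The linear scan is only for brevity; a real allocator would presumably
want something cluster-aware.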
> >
> > I don't like that the code bifurcates for vswap vs. normal swap entries
> > though. Not sure if this is an issue that can be fixed with proper
> > abstractions to hide it, or if the design needs modifications. I was
> > honestly really hoping we don't end up with this. I was hoping that the
> > physical swap device no longer uses a full swap table and all, and
> > everything goes through vswap.
> >
> > I was hoping that if redirection isn't needed (e.g. zswap is disabled),
> > vswap can directly encode the physical swap slot so that the reverse
> > mapping isn't needed -- so we avoid the overhead without keeping the
> > physical swap device using a fully-fledged swap table.
>
> Can you expand on "vswap can directly encode the physical swap slot"?
> I'm not sure I follow here.
I meant that if redirection is not needed (e.g. zswap is disabled), then
instead of having a vswap device pointing at a physical swap device, we
can just store the data (e.g. physical swap slot) in the vswap device
directly. Then we don't need a full swap info thing and swap table for
the physical swap device.

This directly ties into my question above, about why we have a
fully-fledged swap info thing for the physical swap device when using
vswap.
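
As a sketch of what "directly encode" could look like, the following
userspace snippet packs either a physical slot number or a zswap entry
pointer into a single tagged word. The ventry_* names, the tag values,
and the two-bit tag width are all assumptions made for illustration, not
the layout used by the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tags kept in the low bits of a vswap table entry. */
enum ventry_type {
	VENTRY_EMPTY = 0,
	VENTRY_PHYS  = 1,	/* payload is a physical swap slot number */
	VENTRY_ZSWAP = 2,	/* payload is an aligned zswap entry pointer */
};

#define VENTRY_TAG_BITS	2
#define VENTRY_TAG_MASK	((uintptr_t)((1u << VENTRY_TAG_BITS) - 1))

typedef uintptr_t ventry_t;

static ventry_t ventry_mk_phys(uintptr_t slot)
{
	/* The slot number must survive the shift. */
	assert(slot <= (UINTPTR_MAX >> VENTRY_TAG_BITS));
	return (slot << VENTRY_TAG_BITS) | VENTRY_PHYS;
}

static ventry_t ventry_mk_zswap(void *entry)
{
	/* Relies on the pointer being at least 4-byte aligned. */
	assert(((uintptr_t)entry & VENTRY_TAG_MASK) == 0);
	return (uintptr_t)entry | VENTRY_ZSWAP;
}

static enum ventry_type ventry_kind(ventry_t v)
{
	return (enum ventry_type)(v & VENTRY_TAG_MASK);
}

static uintptr_t ventry_phys_slot(ventry_t v)
{
	return v >> VENTRY_TAG_BITS;
}

static void *ventry_zswap(ventry_t v)
{
	return (void *)(v & ~VENTRY_TAG_MASK);
}
```

With something like this, a lookup that finds VENTRY_PHYS already has the
slot in hand, so no second table has to be consulted on that path.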
> >
> > All that being said, perhaps I am too out of touch with the code to
> > realize it's simply not possible.
> >
> > Honestly, if the main reason we can't have a single swap table for vswap
> > is saving 8 bytes on the reverse mapping, it sounds like a weak-ish
> > argument, even if we can't optimize the reverse mapping away. But maybe
> > I am also out of touch with RAM prices :)
>
> In terms of the space overhead I do agree, FWIW :)
>
> I think the other concern is the indirection overhead with going
> through the xarray for every swap operation, hence the per-CPU vswap
> cluster lookup caching idea:
>
> https://lore.kernel.org/all/20260505153854.1612033-23-nphamcs@gmail.com/
Right, but we should already avoid the xarray with the swap table
design, right? We just have one swap table pointing to another
essentially?
> >
> > I at least hope that, the current design is not painting us into a
> > corner (e.g. through userspace interfaces), and we can still achieve a
> > vswap-for-all implementation in the future (maybe that's what you have
> > in mind already?).
>
> That's still my plan. Operationally speaking, I want to make this
> completely transparent to users, with minimal to no performance
> overhead.
So if CONFIG_VSWAP is set all swap devices are vswap by default, right?
Would it help with testing if it's controlled by a boot param?
>
> The next action item is to optimize for the vswap-on-fast-swapfile case -
> that was Kairui's main concern regarding performance. I spent a lot
> of time perfing and fixing issues for this case in v6. The issues with
> the most egregious effects and simplest fixes (e.g. the vswap-less
> swap-cache-only check) are already fixed in this new design,
> and eventually I will move the rest (lookup caching) and more to here.
So is the end goal to have vswap be the default rather than a special
swap device? It would certainly help to include some details about that.
> >
> > Aside from the swap code, the only sticking point for me is the logic
> > bifurcation in zswap. Why does zswap need to handle vswap vs. not vswap?
> > I thought the point of the design is to use vswap when zswap is used,
> > and otherwise use a normal swap table. In a way, one of the goals is to
> > make zswap a first class swap citizen, but it doesn't seem like we are
> > achieving that?
>
> We already have all the machinery to make zswap completely
> independent. Right now, if you use vswap, you'll skip the zswap's
> internal xarray entirely, and just store a zswap entry in the virtual
> swap cluster's vtable.
>
> I just haven't removed the old code for 2 reasons:
>
> 1. Reduce the delta on this RFC, to ease the burden for reviewers (and
> definitely not because I'm lazy :P)
>
> 2. The only other practical reason is so that we can let users compile
> with !CONFIG_VSWAP and still use zswap on top of the old swapfile
> setup during the transition/experimentation period for now.
>
> But logically and conceptually speaking, there is no reason I can come
> up with to use zswap without vswap. The CPU indirection overhead is
> already partially there (since zswap uses an xarray) and further
> optimized (cluster lookup caching etc.), as well as the space overhead
> (vswap replaces the zswap xarray). I actually wrote a whole paragraph
> about how we should always go for vswap if we're using zswap, but then
> decided to remove it since there's no code for it yet.
>
> If folks like it, what I can do is have CONFIG_ZSWAP depend on
> CONFIG_VSWAP, remove all the non-vswap logic, and call it a day? :)
> Then, on the swap allocation side, if vswap allocation fails and zswap
> writeback is disabled, we can error out early.
Hmm maybe we can keep it around for now and do that after vswap
stabilizes? It ultimately depends on how much complexity we maintain by
allowing both.

I think another problem is 32-bit: technically zswap can be used on
32-bit now, right? So vswap not supporting 32-bit is a problem.

General question (for both zswap and general swap code), would a boot
param make implementation simpler? Right now we seem to key off the swap
device having the "vswap" flag, would it help if it was a runtime
constant?
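
To show the shape of what I am asking about, here is a userspace model
of the boot-param variant. The "vswap=" parameter name, the on/off
values, and all function names are made up for this sketch; the point is
only that callers consult one value fixed at startup rather than a
per-device flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: set once from the command line, then treated as read-only. */
static bool vswap_enabled;

/*
 * Parse a "vswap=on" / "vswap=off" token from a boot-style command line.
 * Deliberately naive: it looks at the first "vswap=" substring only.
 */
static void parse_vswap_param(const char *cmdline)
{
	const char *p;

	assert(cmdline != NULL);
	p = strstr(cmdline, "vswap=");
	if (p)
		vswap_enabled = strncmp(p + strlen("vswap="), "on", 2) == 0;
}

struct swap_dev {
	unsigned long flags;	/* no per-device vswap bit needed in this model */
};

/* Callers ask a global question instead of inspecting the device. */
static bool swap_dev_is_virtual(const struct swap_dev *dev)
{
	(void)dev;
	return vswap_enabled;
}
```

In the kernel, the equivalent would presumably be an early_param()
handler, possibly flipping a static key so the check is patched at
runtime.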