public inbox for cgroups@vger.kernel.org
From: Yosry Ahmed <yosry@kernel.org>
To: Kairui Song <ryncsn@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>,
	Liam.Howlett@oracle.com,  akpm@linux-foundation.org,
	apopple@nvidia.com, axelrasmussen@google.com, baohua@kernel.org,
	 baolin.wang@linux.alibaba.com, bhe@redhat.com, byungchul@sk.com,
	cgroups@vger.kernel.org,  chengming.zhou@linux.dev,
	chrisl@kernel.org, corbet@lwn.net, david@kernel.org,
	 dev.jain@arm.com, gourry@gourry.net, hannes@cmpxchg.org,
	hughd@google.com,  jannh@google.com, joshua.hahnjy@gmail.com,
	lance.yang@linux.dev, lenb@kernel.org,
	 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,  linux-pm@vger.kernel.org,
	lorenzo.stoakes@oracle.com, matthew.brost@intel.com,
	 mhocko@suse.com, muchun.song@linux.dev, npache@redhat.com,
	pavel@kernel.org,  peterx@redhat.com, peterz@infradead.org,
	pfalcato@suse.de, rafael@kernel.org,  rakie.kim@sk.com,
	roman.gushchin@linux.dev, rppt@kernel.org, ryan.roberts@arm.com,
	 shakeel.butt@linux.dev, shikemeng@huaweicloud.com,
	surenb@google.com, tglx@kernel.org,  vbabka@suse.cz,
	weixugc@google.com, ying.huang@linux.alibaba.com,
	 yosry.ahmed@linux.dev, yuanchu@google.com,
	zhengqi.arch@bytedance.com, ziy@nvidia.com,
	 kernel-team@meta.com, riel@surriel.com
Subject: Re: [PATCH v5 00/21] Virtual Swap Space
Date: Wed, 22 Apr 2026 20:27:06 +0000
Message-ID: <aektdlD4npMVThu3@google.com>
In-Reply-To: <CAMgjq7C53WRS5oYxO157mX7JxhfoPoi34k+taiKLrMah-b-iRg@mail.gmail.com>

On Wed, Apr 22, 2026 at 10:18:35AM +0800, Kairui Song wrote:
> On Wed, Apr 22, 2026 at 8:26 AM Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > On Fri, Mar 20, 2026 at 12:27:14PM -0700, Nhat Pham wrote:
> > >
> > > This patch series implements the virtual swap space idea, based on Yosry's
> > > proposals at LSFMMBPF 2023 (see [1], [2], [3]), as well as valuable
> > > inputs from Johannes Weiner. The same idea (with different
> > > implementation details) has been floated by Rik van Riel since at least
> > > 2011 (see [8]).
> >
> > Unfortunately, I haven't been able to keep up with virtual swap and swap
> > table development, as my time is mostly being spent elsewhere these
> > days. I do have a question tho, which might have already been answered
> > or is too naive/stupid -- so apologies in advance.
> 
> Hi Yosry,
> 
> Not a stupid question at all—it's actually spot on. :)
> 
> >
> > Given the recent advancements in the swap table and that most metadata
> > and the swap cache are already being pulled into it, is it possible to
> > use the swap table in the virtual swap layer instead of the xarray?
> >
> > Basically pull the swap table one layer higher, and have it point to
> > either a zswap entry or a physical swap slot (or others in the future)?
> > If my understanding is correct, we kinda get the best of both worlds and
> > reuse the integration already done by the swap table with the swap
> > cache, as well as the lock partitioning.
> >
> > In this world, the clusters would be in the virtual swap space, and we'd
> > create the clusters on-demand as needed.
> >
> > Does this even work or make the least amount of sense (I guess the
> > question is for both Nhat and Kairui)?
> >
> 
> Yes, this absolutely works. In fact, I previously posted a working RFC
> based on this idea. In that series, clusters are dynamically
> allocated, allowing the swap space to be dynamically sized
> (essentially infinite) while reusing all the existing infrastructure:
> https://lore.kernel.org/all/20260220-swap-table-p4-v1-0-104795d19815@tencent.com/

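To make the idea in the quoted question concrete, here is a rough
userspace sketch of a virtual swap table whose entries point at either a
zswap entry or a physical swap slot, with clusters allocated on first
use. Every name and constant in it (vswap_space, vcluster, vswap_lookup,
VCLUSTER_SLOTS, and so on) is hypothetical and made up for illustration;
none of it is taken from the swap table series or from the vswap
patches, and locking is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen only for illustration. */
#define VCLUSTER_SLOTS 512
#define VCLUSTER_MAX   64

/* What a virtual slot currently resolves to. */
enum vbacking { VB_NONE = 0, VB_ZSWAP, VB_PHYS };

struct vswap_entry {
	enum vbacking kind;
	union {
		void *zswap_entry;   /* compressed copy held in memory */
		uint64_t phys_slot;  /* slot in a real swapfile */
	} u;
};

struct vcluster {
	struct vswap_entry slots[VCLUSTER_SLOTS];
};

struct vswap_space {
	struct vcluster *clusters[VCLUSTER_MAX]; /* NULL until first use */
	unsigned int nr_clusters;                /* clusters allocated so far */
};

/* Find the entry for a virtual slot; allocate its cluster if asked to. */
static inline struct vswap_entry *
vswap_lookup(struct vswap_space *vs, uint64_t vslot, int create)
{
	uint64_t ci = vslot / VCLUSTER_SLOTS;

	assert(vs != NULL);
	if (ci >= VCLUSTER_MAX)
		return NULL;
	if (!vs->clusters[ci]) {
		if (!create)
			return NULL;
		/* calloc zeroes the cluster, so every slot starts as VB_NONE. */
		vs->clusters[ci] = calloc(1, sizeof(struct vcluster));
		if (!vs->clusters[ci])
			return NULL;
		vs->nr_clusters++;
	}
	return &vs->clusters[ci]->slots[vslot % VCLUSTER_SLOTS];
}

/* Point a virtual slot at a zswap entry. */
static inline int
vswap_set_zswap(struct vswap_space *vs, uint64_t vslot, void *zentry)
{
	struct vswap_entry *e = vswap_lookup(vs, vslot, 1);

	if (!e)
		return -1;
	e->kind = VB_ZSWAP;
	e->u.zswap_entry = zentry;
	return 0;
}

/* Re-point a virtual slot at a physical swap slot (e.g. after writeback). */
static inline int
vswap_set_phys(struct vswap_space *vs, uint64_t vslot, uint64_t phys_slot)
{
	struct vswap_entry *e = vswap_lookup(vs, vslot, 1);

	if (!e)
		return -1;
	e->kind = VB_PHYS;
	e->u.phys_slot = phys_slot;
	return 0;
}
```

The point of the sketch is only that the virtual slot number stays the
same while its backing changes, and that memory for a cluster is spent
only once some slot in it is used.
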
There are a few aspects that I don't agree with in this RFC, and I think
Nhat and Johannes raised most of them. Mostly that I don't want to
expose ghost swapfiles or similar to userspace.

I think userspace's view of swapfiles should remain the same and reflect
the physical swap slots. The virtual swap layer should be completely
transparent in this case. Userspace shouldn't need to configure it in
any way.

In an ideal world, the only noticeable change from userspace is that
with zswap, compressed pages would stop using slots in the swapfile and
charging the memcg for them -- and that zswap would work even without a
swapfile, by just enabling it. This is admittedly a user-visible
behavioral change, but I am hoping that's a good one that we can live
with.

If there are real concerns about this, we can discuss things like a knob
or config option to keep charging zswap pages as swap slots (ew..) or
only allow zswap with a real swapfile (double ew..). But I am really
hoping we can get away with changing the semantics without doing this.

We can add extra interfaces for virtual swap as needed, e.g. virtual
swapoff that you mentioned to clear the swap cache, or stats about the
virtual swap space (which translates to memory overhead).

There are also a few missing pieces like different memcg charging, but
these were already pointed out, and we can figure them out as we go.

Nhat/Johannes, WDYT? Am I missing something?

> 
> The only missing pieces are a few helpers like folio_realloc_swap()
> and folio_migrate_swap() for lower layer allocation and migration. I
> prototyped this locally and it wasn't difficult to implement.
> Furthermore, this approach works perfectly with YoungJun's tiering
> work with zero conflicts; the dynamic layer can be runtime or
> per-memcg optional.
> 
> To move this forward, I've stripped out the RFC features and memcg
> behavior changes, and recently sent a V3 that focuses purely on the
> infrastructure. It introduces no behavior changes or new features,
> just optimizations.
> 
> It cleans up a lot of allocation and ordering, as well as memcg
> swap lookups. Since some of these problems were also observed in the
> vss discussion, I think this will make things easier for all of us:
> https://lore.kernel.org/all/20260421-swap-table-p4-v3-0-2f23759a76bc@tencent.com/

Yeah I saw that (but didn't really have time to do anything else about
it). Splitting this out is definitely the right thing to do, and the
series looks great from a very high level. Awesome work, as usual :)


Thread overview: 48+ messages
2026-03-20 19:27 [PATCH v5 00/21] Virtual Swap Space Nhat Pham
2026-03-20 19:27 ` [PATCH v5 01/21] mm/swap: decouple swap cache from physical swap infrastructure Nhat Pham
2026-03-20 19:27 ` [PATCH v5 02/21] swap: rearrange the swap header file Nhat Pham
2026-03-20 19:27 ` [PATCH v5 03/21] mm: swap: add an abstract API for locking out swapoff Nhat Pham
2026-03-20 19:27 ` [PATCH v5 04/21] zswap: add new helpers for zswap entry operations Nhat Pham
2026-03-20 19:27 ` [PATCH v5 05/21] mm/swap: add a new function to check if a swap entry is in swap cached Nhat Pham
2026-03-20 19:27 ` [PATCH v5 06/21] mm: swap: add a separate type for physical swap slots Nhat Pham
2026-03-20 19:27 ` [PATCH v5 07/21] mm: create scaffolds for the new virtual swap implementation Nhat Pham
2026-03-20 19:27 ` [PATCH v5 08/21] zswap: prepare zswap for swap virtualization Nhat Pham
2026-03-20 19:27 ` [PATCH v5 09/21] mm: swap: allocate a virtual swap slot for each swapped out page Nhat Pham
2026-03-20 19:27 ` [PATCH v5 10/21] swap: move swap cache to virtual swap descriptor Nhat Pham
2026-03-20 19:27 ` [PATCH v5 11/21] zswap: move zswap entry management to the " Nhat Pham
2026-03-20 19:27 ` [PATCH v5 12/21] swap: implement the swap_cgroup API using virtual swap Nhat Pham
2026-03-20 19:27 ` [PATCH v5 13/21] swap: manage swap entry lifecycle at the virtual swap layer Nhat Pham
2026-03-20 19:27 ` [PATCH v5 14/21] mm: swap: decouple virtual swap slot from backing store Nhat Pham
2026-03-20 19:27 ` [PATCH v5 15/21] zswap: do not start zswap shrinker if there is no physical swap slots Nhat Pham
2026-03-20 19:27 ` [PATCH v5 16/21] swap: do not unnecesarily pin readahead swap entries Nhat Pham
2026-03-20 19:27 ` [PATCH v5 17/21] swapfile: remove zeromap bitmap Nhat Pham
2026-03-20 19:27 ` [PATCH v5 18/21] memcg: swap: only charge physical swap slots Nhat Pham
2026-03-20 19:27 ` [PATCH v5 19/21] swap: simplify swapoff using virtual swap Nhat Pham
2026-03-20 19:27 ` [PATCH v5 20/21] swapfile: replace the swap map with bitmaps Nhat Pham
2026-03-20 19:27 ` [PATCH v5 21/21] vswap: batch contiguous vswap free calls Nhat Pham
2026-03-21 18:22 ` [PATCH v5 00/21] Virtual Swap Space Andrew Morton
2026-03-22  2:18   ` Roman Gushchin
2026-03-23 10:08 ` Kairui Song
2026-03-23 15:32   ` Nhat Pham
2026-03-23 16:40     ` Kairui Song
2026-03-23 20:05       ` Nhat Pham
2026-04-14 17:23         ` Nhat Pham
2026-04-14 17:32           ` Nhat Pham
2026-04-16 18:46           ` Kairui Song
2026-03-25 18:53     ` YoungJun Park
2026-04-12  1:03       ` Nhat Pham
2026-04-14  3:09         ` YoungJun Park
2026-04-20 16:02   ` Nhat Pham
2026-03-24 13:19 ` Askar Safin
2026-03-24 17:23   ` Nhat Pham
2026-03-25  2:35     ` Askar Safin
2026-03-25 18:36 ` YoungJun Park
2026-04-12  1:40   ` Nhat Pham
2026-04-14  2:50     ` YoungJun Park
2026-04-14  3:28       ` Kairui Song
2026-04-14 16:35         ` Nhat Pham
2026-04-14  7:52     ` Christoph Hellwig
2026-04-22  0:26 ` Yosry Ahmed
2026-04-22  2:18   ` Kairui Song
2026-04-22 20:27     ` Yosry Ahmed [this message]
2026-04-23  6:16       ` Kairui Song
