From: Johannes Weiner <hannes@cmpxchg.org>
To: Chris Li <chrisl@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>,
akpm@linux-foundation.org, hughd@google.com,
yosry.ahmed@linux.dev, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, len.brown@intel.com,
chengming.zhou@linux.dev, kasong@tencent.com,
huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
shikemeng@huaweicloud.com, viro@zeniv.linux.org.uk,
baohua@kernel.org, bhe@redhat.com, osalvador@suse.de,
christophe.leroy@csgroup.eu, pavel@kernel.org,
linux-mm@kvack.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-pm@vger.kernel.org, peterx@redhat.com, riel@surriel.com,
joshua.hahnjy@gmail.com, npache@redhat.com, gourry@gourry.net,
axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
rafael@kernel.org, jannh@google.com, pfalcato@suse.de,
zhengqi.arch@bytedance.com
Subject: Re: [PATCH v3 00/20] Virtual Swap Space
Date: Mon, 9 Feb 2026 21:36:22 -0500 [thread overview]
Message-ID: <aYqZppn4yDbTP2_q@cmpxchg.org> (raw)
In-Reply-To: <CACePvbXsngZmn0OrJZjvMhhHnL5FazxYX7ShEpbU9RwHSJaUuA@mail.gmail.com>
Hi Chris,
On Mon, Feb 09, 2026 at 04:20:21AM -0800, Chris Li wrote:
> On Sun, Feb 8, 2026 at 4:15 PM Nhat Pham <nphamcs@gmail.com> wrote:
> >
> > My sincerest apologies - it seems like the cover letter (and just the
> > cover letter) fails to be sent out, for some reason. I'm trying to figure
> > out what happened - it works when I send the entire patch series to
> > myself...
> >
> > Anyway, resending this (in-reply-to patch 1 of the series):
>
> For the record I did receive your original V3 cover letter from the
> linux-mm mailing list.
>
> > Changelog:
> > * RFC v2 -> v3:
> > * Implement a cluster-based allocation algorithm for virtual swap
> > slots, inspired by Kairui Song and Chris Li's implementation, as
> > well as Johannes Weiner's suggestions. This eliminates the lock
> > contention issues on the virtual swap layer.
> > * Re-use swap table for the reverse mapping.
> > * Remove CONFIG_VIRTUAL_SWAP.
> > * Reduce the size of the swap descriptor from 48 bytes to 24
>
> Is the per swap slot entry overhead 24 bytes in your implementation?
> The current swap overhead is 3 static + 8 dynamic; your 24 dynamic is
> a big jump. You can argue that 8->24 is not a big jump, but it is an
> unnecessary price compared to the alternative, which is 8 dynamic +
> 4 (optional redirect).
No, this is not the net overhead.
The descriptor consolidates and eliminates several other data
structures.
Here is a more detailed breakdown:
> > The size of the virtual swap descriptor is 24 bytes. Note that this is
> > not all "new" overhead, as the swap descriptor will replace:
> > * the swap_cgroup arrays (one per swap type) in the old design, which
> > is a massive source of static memory overhead. With the new design,
> > it is only allocated for used clusters.
> > * the swap tables, which holds the swap cache and workingset shadows.
> > * the zeromap bitmap, which is a bitmap of physical swap slots to
> > indicate whether the swapped out page is zero-filled or not.
> > * huge chunk of the swap_map. The swap_map is now replaced by 2 bitmaps,
> > one for allocated slots, and one for bad slots, representing 3 possible
> > states of a slot on the swapfile: allocated, free, and bad.
> > * the zswap tree.
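
The two-bitmap replacement of the swap_map can be sketched in a few
lines. This is an illustrative Python model of the state encoding
described above, not kernel code; the function name is made up for the
example.

```python
# Illustrative model (not kernel code) of encoding the three slot
# states -- allocated, free, bad -- in two bitmaps, one bit per slot.
def slot_state(allocated_bit: int, bad_bit: int) -> str:
    """Decode one slot's state from its two per-slot bits."""
    if bad_bit:
        return "bad"        # bad slots are never handed out
    return "allocated" if allocated_bit else "free"

# A slot is free exactly when neither bit is set.
assert slot_state(0, 0) == "free"
assert slot_state(1, 0) == "allocated"
assert slot_state(0, 1) == "bad"
```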
> >
> > So, in terms of additional memory overhead:
> > * For zswap entries, the added memory overhead is rather minimal. The
> > new indirection pointer neatly replaces the existing zswap tree.
> > We really only incur less than one word of overhead for the swap
> > count blow-up (since we no longer use swap continuation) and the swap type.
> > * For physical swap entries, the new design imposes fewer than 3 words
> > of memory overhead per entry. However, as noted above, this overhead is
> > only incurred for actively used swap entries, whereas in the current
> > design the overhead is static (including, for example, the swap cgroup
> > array).
> >
> > The primary victim of this overhead will be zram users. However, as
> > zswap now no longer takes up disk space, zram users can consider
> > switching to zswap (which, as a bonus, has a lot of useful features
> > out of the box, such as cgroup tracking, dynamic zswap pool sizing,
> > LRU-ordering writeback, etc.).
> >
> > For a more concrete example, suppose we have a 32 GB swapfile (i.e.
> > 8,388,608 swap entries), and we use zswap.
> >
> > 0% usage, or 0 entries: 0.00 MB
> > * Old design total overhead: 25.00 MB
> > * Vswap total overhead: 0.00 MB
> >
> > 25% usage, or 2,097,152 entries:
> > * Old design total overhead: 57.00 MB
> > * Vswap total overhead: 48.25 MB
> >
> > 50% usage, or 4,194,304 entries:
> > * Old design total overhead: 89.00 MB
> > * Vswap total overhead: 96.50 MB
> >
> > 75% usage, or 6,291,456 entries:
> > * Old design total overhead: 121.00 MB
> > * Vswap total overhead: 144.75 MB
> >
> > 100% usage, or 8,388,608 entries:
> > * Old design total overhead: 153.00 MB
> > * Vswap total overhead: 193.00 MB
> >
> > So even in the worst-case scenario for virtual swap, i.e. when we
> > somehow have an oracle to correctly size the swapfile for the zswap
> > pool to 32 GB, the added overhead is only 40 MB, which is a mere
> > 0.12% of the total swapfile :)
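
For readers who want to re-derive the table above: the quoted figures
are consistent with 25 MB of static overhead (swap_map + swap_cgroup +
zeromap) plus 16 bytes per used entry in the old design, and 24 bytes
plus roughly one bit per used entry for vswap. Those per-entry
constants are inferred from the numbers in this letter, not taken from
the patches, so treat them as assumptions. A quick Python cross-check:

```python
# Rough cross-check of the zswap overhead table; per-entry constants
# are inferred from the quoted figures (an assumption, see above).
MB = 1024 * 1024
SLOTS = 8 * MB                          # 32 GB swapfile / 4 KB pages

def old_zswap_mb(used):
    static = SLOTS * (1 + 2) + SLOTS // 8   # swap_map + swap_cgroup + zeromap
    dynamic = used * 16                     # swap table + zswap tree (assumed)
    return (static + dynamic) / MB

def vswap_zswap_mb(used):
    return (used * 24 + used // 8) / MB     # descriptor + ~1 bit/entry (assumed)

for frac in (0, 25, 50, 75, 100):
    used = SLOTS * frac // 100
    print(f"{frac:3d}%: old {old_zswap_mb(used):6.2f} MB, "
          f"vswap {vswap_zswap_mb(used):6.2f} MB")
```

This reproduces the quoted endpoints (25.00 MB vs 0.00 MB at 0% usage,
153.00 MB vs 193.00 MB at 100%), and 193 - 153 = 40 MB is indeed about
0.12% of 32 GB.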
> >
> > In practice, the overhead will be closer to the 50-75% usage case, as
> > systems tend to leave swap headroom for pathological events or sudden
> > spikes in memory requirements. The added overhead in these cases is
> > practically negligible. And in deployments where swapfiles for zswap
> > were previously sparsely used, switching over to virtual swap will
> > actually reduce memory overhead.
> >
> > Doing the same math for the disk swap, which is the worst case for
> > virtual swap in terms of swap backends:
> >
> > 0% usage, or 0 entries: 0.00 MB
> > * Old design total overhead: 25.00 MB
> > * Vswap total overhead: 2.00 MB
> >
> > 25% usage, or 2,097,152 entries:
> > * Old design total overhead: 41.00 MB
> > * Vswap total overhead: 66.25 MB
> >
> > 50% usage, or 4,194,304 entries:
> > * Old design total overhead: 57.00 MB
> > * Vswap total overhead: 130.50 MB
> >
> > 75% usage, or 6,291,456 entries:
> > * Old design total overhead: 73.00 MB
> > * Vswap total overhead: 194.75 MB
> >
> > 100% usage, or 8,388,608 entries:
> > * Old design total overhead: 89.00 MB
> > * Vswap total overhead: 259.00 MB
> >
> > The added overhead is 170 MB, which is 0.5% of the total swapfile
> > size, again in the worst case when we have a sizing oracle.
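
The disk-swap table can be cross-checked the same way. The figures are
consistent with the old design costing 25 MB static plus 8 bytes per
used entry (swap table only, no zswap tree), and vswap costing 2 MB
static (the two bitmaps) plus 32 bytes per used entry (descriptor plus
reverse map) and roughly one bit; again, these constants are inferred
from the quoted numbers, not stated in the patches.

```python
# Rough cross-check of the disk-swap overhead table; per-entry
# constants are inferred from the quoted figures (an assumption).
MB = 1024 * 1024
SLOTS = 8 * MB                              # 32 GB swapfile / 4 KB pages

def old_disk_mb(used):
    static = SLOTS * (1 + 2) + SLOTS // 8   # 25 MB, as in the zswap case
    return (static + used * 8) / MB         # swap table only (assumed)

def vswap_disk_mb(used):
    static = 2 * (SLOTS // 8)               # allocated + bad bitmaps: 2 MB
    dynamic = used * 32 + used // 8         # descriptor + reverse map + ~1 bit
    return (static + dynamic) / MB

delta = vswap_disk_mb(SLOTS) - old_disk_mb(SLOTS)
print(f"worst-case delta: {delta:.0f} MB "
      f"({100 * delta / (32 * 1024):.2f}% of 32 GB)")
```

This matches the quoted endpoints (25.00 MB vs 2.00 MB at 0% usage,
89.00 MB vs 259.00 MB at 100%) and the ~0.5% worst-case figure.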