From: Johannes Weiner <hannes@cmpxchg.org>
To: Usama Arif <usamaarif642@gmail.com>
Cc: Nhat Pham <nphamcs@gmail.com>,
linux-mm@kvack.org, akpm@linux-foundation.org, hughd@google.com,
yosry.ahmed@linux.dev, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, len.brown@intel.com,
chengming.zhou@linux.dev, kasong@tencent.com, chrisl@kernel.org,
huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu,
pavel@kernel.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-pm@vger.kernel.org
Subject: Re: [RFC PATCH 00/14] Virtual Swap Space
Date: Tue, 8 Apr 2025 11:45:47 -0400
Message-ID: <20250408154547.GC816@cmpxchg.org>
In-Reply-To: <983965b6-2262-4f72-a672-39085dcdaa3c@gmail.com>

On Tue, Apr 08, 2025 at 02:04:06PM +0100, Usama Arif wrote:
>
>
> On 08/04/2025 00:42, Nhat Pham wrote:
> >
> > V. Benchmarking
> >
> > As a proof of concept, I ran the prototype through some simple
> > benchmarks:
> >
> > 1. usemem: 16 threads, 2G each, memory.max = 16G
> >
> > I benchmarked the following usemem commands:
> >
> > time usemem --init-time -w -O -s 10 -n 16 2g
> >
> > Baseline:
> > real: 33.96s
> > user: 25.31s
> > sys: 341.09s
> > average throughput: 111295.45 KB/s
> > average free time: 2079258.68 usecs
> >
> > New Design:
> > real: 35.87s
> > user: 25.15s
> > sys: 373.01s
> > average throughput: 106965.46 KB/s
> > average free time: 3192465.62 usecs
> >
> > To root cause this regression, I ran perf on the usemem program, as
> > well as on the following stress-ng program:
> >
> > perf record -ag -e cycles -G perf_cg -- ./stress-ng/stress-ng --pageswap $(nproc) --pageswap-ops 100000
> >
> > and observed the (predicted) increase in lock contention on swap cache
> > accesses. This regression is alleviated if I put together the
> > following hack: limit the virtual swap space to a sufficient size for
> > the benchmark, range partition the swap-related data structures (swap
> > cache, zswap tree, etc.) based on the limit, and distribute the
> > allocation of virtual swap slots among these partitions (on a per-CPU
> > basis):
> >
> > real: 34.94s
> > user: 25.28s
> > sys: 360.25s
> > average throughput: 108181.15 KB/s
> > average free time: 2680890.24 usecs
> >
> > As mentioned above, I will implement proper dynamic swap range
> > partitioning in a follow up work.
> >
> > 2. Kernel building: zswap enabled, 52 workers (one per processor),
> > memory.max = 3G.
> >
> > Baseline:
> > real: 183.55s
> > user: 5119.01s
> > sys: 655.16s
> >
> > New Design:
> > real: mean: 184.5s
> > user: mean: 5117.4s
> > sys: mean: 695.23s
> >
> > New Design (Static Partition)
> > real: 183.95s
> > user: 5119.29s
> > sys: 664.24s
> >
>
> Hi Nhat,
>
> Thanks for the patches! I have glanced over a couple of them, but this was the main question that came to my mind.
>
> Just wanted to check if you had a look at the memory regression during these benchmarks?
>
> Also what is sizeof(swp_desc)? Maybe we can calculate the memory overhead as sizeof(swp_desc) * swap size/PAGE_SIZE?
>
> For a 64G swap that is filled with private anon pages, the overhead in MB might be (sizeof(swp_desc) in bytes * 16M) - 16M (zerobitmap) - 16M*8 (swap map)?
>
> This looks like a sizeable memory regression?

One thing to keep in mind is that the swap descriptor is currently
blatantly explicit, and many conversions and optimizations have not
been done yet. There are some tradeoffs made here regarding code
reviewability, but I agree it makes it hard to see what this would
look like fully realized.

I think what's really missing is an analysis of what the goal is and
what the overhead will be then.

The swapin path currently consults the swapcache, then the zeromap,
then zswap, and finally the backend. The external swap_cgroup array is
consulted to determine who to charge for the new page.
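
As a rough sketch of the lookup order just described: the enum, struct, and function names below are hypothetical and are not kernel symbols. In the kernel each of these answers comes from a separate data structure; the struct here only bundles them so the ordering can be shown in a few lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the place a swapped-out page is found. */
enum swapin_source {
	SWAPIN_FROM_SWAPCACHE,
	SWAPIN_FROM_ZEROMAP,
	SWAPIN_FROM_ZSWAP,
	SWAPIN_FROM_BACKEND,
};

/*
 * Hypothetical stand-in for the answers the separate structures give for
 * one swap entry. Each field models a lookup in a different structure.
 */
struct legacy_lookup {
	bool in_swapcache;	 /* swapcache hit */
	bool zero_filled;	 /* zeromap bit set */
	bool in_zswap;		 /* zswap hit */
	unsigned short memcg_id; /* from the external swap_cgroup array */
};

/* Walk the lookups in the order described above; the first hit wins. */
enum swapin_source legacy_swapin_source(const struct legacy_lookup *l)
{
	assert(l != NULL);
	if (l->in_swapcache)
		return SWAPIN_FROM_SWAPCACHE;
	if (l->zero_filled)
		return SWAPIN_FROM_ZEROMAP;
	if (l->in_zswap)
		return SWAPIN_FROM_ZSWAP;
	return SWAPIN_FROM_BACKEND;
}
```

The cgroup id is carried separately because, in this model, it comes from yet another structure regardless of where the data is found.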

With vswap, the descriptor is looked up and resolves to a type,
location, cgroup ownership, and a refcount. This means it replaces the
swapcache, the zeromap, the cgroup map, and largely the swap_map.
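
A minimal sketch of that idea, assuming a descriptor that carries exactly the four pieces of state named above. The layout, field widths, and names are illustrative assumptions; this is not the swp_desc definition from the patch series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backing types a descriptor could resolve to. */
enum vswap_backing {
	VSWAP_IN_MEMORY,	/* page still cached in memory */
	VSWAP_ZERO,		/* zero-filled page, no data stored */
	VSWAP_ZSWAP,		/* compressed in zswap */
	VSWAP_DISK,		/* slot on a physical swap device */
};

/* Hypothetical descriptor: one record per virtual swap slot. */
struct vswap_desc_sketch {
	enum vswap_backing type;
	unsigned long location;	 /* meaning depends on type */
	unsigned short memcg_id; /* cgroup ownership */
	unsigned int refcount;
};

/*
 * One lookup returns type, location, owner, and refcount together.
 * A flat array is used only to keep the sketch short.
 */
const struct vswap_desc_sketch *
vswap_lookup(const struct vswap_desc_sketch *table, size_t nr, size_t index)
{
	assert(table != NULL);
	return index < nr ? &table[index] : NULL;
}
```

The point of the sketch is only that a single record answers what previously took several independent lookups.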

Nhat was not quite sure yet whether the remaining swap_map can shrink
to a single bit per entry, or needs two bits to also represent bad
slots. In any case, it's a large reduction in static swap space
overhead, and eliminates the tricky swap count continuation code.
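
To make the overhead question concrete, the arithmetic can be sketched as below, using bytes throughout. The per-entry sizes are assumptions stated in the comments (1 byte of swap_map, 1 bit of zeromap, 2 bytes of swap_cgroup id per entry); the swapcache's own memory is not counted; the descriptor size is left as a parameter because sizeof(swp_desc) is not given in this message; and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of page-sized swap entries in a swap area of swap_bytes. */
uint64_t swap_entries(uint64_t swap_bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return swap_bytes / page_size;
}

/*
 * Static per-device overhead of the structures the descriptor would
 * replace. Assumed per-entry costs: swap_map 1 byte, zeromap 1 bit,
 * swap_cgroup 2 bytes.
 */
uint64_t legacy_static_bytes(uint64_t entries)
{
	uint64_t swap_map = entries;
	uint64_t zeromap = entries / 8;
	uint64_t swap_cgroup = entries * 2;

	return swap_map + zeromap + swap_cgroup;
}

/* Static cost of a reduced swap_map at 1 or 2 bits per entry. */
uint64_t residual_map_bytes(uint64_t entries, unsigned int bits_per_entry)
{
	return entries * bits_per_entry / 8;
}

/* Descriptor cost, assuming one descriptor per slot actually in use. */
uint64_t descriptor_bytes(uint64_t entries_in_use, uint64_t desc_size)
{
	return entries_in_use * desc_size;
}
```

For the 64G example with 4K pages, swap_entries() gives 16M entries, legacy_static_bytes() comes to 50 MiB under these assumptions, and residual_map_bytes() comes to 2 MiB or 4 MiB for one or two bits per entry. The descriptor side scales with the number of slots in use and with sizeof(swp_desc), which is the number still needed to complete the comparison.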