From: Matthew Wilcox <willy@infradead.org>
To: Chris Li <chrisl@kernel.org>
Cc: Karim Manaouil <kmanaouil.dev@gmail.com>, Jan Kara <jack@suse.cz>,
Chuanhua Han <hanchuanhua@oppo.com>,
linux-mm <linux-mm@kvack.org>,
lsf-pc@lists.linux-foundation.org, ryan.roberts@arm.com,
21cnbao@gmail.com, david@redhat.com
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
Date: Wed, 29 May 2024 04:57:33 +0100
Message-ID: <ZlanrUntADvnJWUY@casper.infradead.org>
In-Reply-To: <CANeU7QnoKUSdMjOGNFWFueH4LG+mj8J0Ezp_KetHhHUr2_pC_w@mail.gmail.com>
On Tue, May 21, 2024 at 01:40:56PM -0700, Chris Li wrote:
> > Filesystems already implemented a lot of solutions for fragmentation
> > avoidance that are more appropriate for slow storage media.
>
> Swap and file systems have very different requirements and usage
> patterns and IO patterns.
Should they, though? Filesystems noticed that handling pages in LRU
order was inefficient and so they stopped doing that (see the removal
of aops->writepage in favour of ->writepages, along with where each is
called from). Maybe it's time for swap to start doing writes in the order
of virtual addresses within a VMA, instead of LRU order.
Indeed, if we're open to radical ideas, the LRU sucks. A physical scan
is 40x faster:
https://lore.kernel.org/linux-mm/ZTc7SHQ4RbPkD3eZ@casper.infradead.org/
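To make the ordering suggestion above a little more concrete, here is a toy sketch (not kernel code) comparing how many separate contiguous write runs you get when the same set of pages is written out in LRU order versus in virtual-address order within each VMA. The `Page` record, its `vma` and `lru_pos` fields, and the sample addresses are all hypothetical and exist only for illustration; it assumes 4KiB pages.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4KiB base pages


@dataclass(frozen=True)
class Page:
    vma: int      # hypothetical VMA identifier
    vaddr: int    # virtual address of the page
    lru_pos: int  # hypothetical LRU position (lower = reclaimed first)


def lru_order(pages):
    """Order pages the way an LRU walk would hand them to swap."""
    return sorted(pages, key=lambda p: p.lru_pos)


def vaddr_order(pages):
    """Order pages by virtual address within each VMA."""
    return sorted(pages, key=lambda p: (p.vma, p.vaddr))


def count_runs(pages):
    """Count runs of pages that are virtually contiguous within one VMA."""
    runs = 0
    prev = None
    for p in pages:
        contiguous = (
            prev is not None
            and p.vma == prev.vma
            and p.vaddr == prev.vaddr + PAGE_SIZE
        )
        if not contiguous:
            runs += 1
        prev = p
    return runs


# Made-up sample: three pages from VMA 1, two from VMA 2, interleaved on the LRU.
pages = [
    Page(vma=1, vaddr=0x1000, lru_pos=3),
    Page(vma=2, vaddr=0x9000, lru_pos=0),
    Page(vma=1, vaddr=0x3000, lru_pos=1),
    Page(vma=1, vaddr=0x2000, lru_pos=4),
    Page(vma=2, vaddr=0xA000, lru_pos=2),
]

print("LRU order runs:  ", count_runs(lru_order(pages)))
print("vaddr order runs:", count_runs(vaddr_order(pages)))
```

With this made-up input the LRU ordering produces four runs and the address ordering produces two. Real workloads will differ; the sketch only illustrates why address order gives the allocator a better chance of laying out one extent per run.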
> One challenging aspect is that the current swap back end has a very
> low per swap entry memory overhead. It is about 1 byte (swap_map), 2
> bytes (swap cgroup), 8 bytes (swap cache pointer). The inode struct is
> more than 64 bytes per file. That is a big jump if you map a swap
> entry to a file. If you map more than one swap entry to a file, then
> you need to track the mapping of file offset to swap entry, and the
> reverse lookup of swap entry to a file with offset. Whichever way you
> cut it, it will significantly increase the per swap entry memory
> overhead.
Not necessarily, no. If your workload uses a lot of order-2, order-4
and order-9 folios, then the current scheme is using 11 bytes per page,
so 44 bytes per order-2 folio, 176 per order-4 folio and 5632 per
order-9 folio. That's a lot of bytes we can use for an extent-based
scheme.
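For anyone who wants to check the arithmetic, a small sketch using only the per-entry sizes quoted above (1 byte swap_map, 2 bytes swap cgroup, 8 bytes swap cache pointer). The function name is made up for illustration.

```python
# Per-page swap metadata sizes as quoted in this thread (bytes).
SWAP_MAP_BYTES = 1
SWAP_CGROUP_BYTES = 2
SWAP_CACHE_PTR_BYTES = 8
PER_PAGE_BYTES = SWAP_MAP_BYTES + SWAP_CGROUP_BYTES + SWAP_CACHE_PTR_BYTES


def per_folio_bytes(order):
    """Metadata spent on one folio when each of its 2**order pages
    carries its own per-page swap entry overhead."""
    return PER_PAGE_BYTES * (1 << order)


for order in (2, 4, 9):
    print(f"order-{order}: {1 << order} pages, {per_folio_bytes(order)} bytes")
```

That budget (44, 176 and 5632 bytes respectively) is what an extent-based record per folio would have to beat.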
Also, why would you compare the size of an inode to the size of a
swap entry? inode is ~equivalent to an anon_vma, not to a swap entry.