From: Matthew Wilcox <willy@infradead.org>
To: Chris Li <chrisl@kernel.org>
Cc: Karim Manaouil <kmanaouil.dev@gmail.com>, Jan Kara <jack@suse.cz>,
	Chuanhua Han <hanchuanhua@oppo.com>,
	linux-mm <linux-mm@kvack.org>,
	lsf-pc@lists.linux-foundation.org, ryan.roberts@arm.com,
	21cnbao@gmail.com, david@redhat.com
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
Date: Wed, 29 May 2024 04:57:33 +0100
Message-ID: <ZlanrUntADvnJWUY@casper.infradead.org>
In-Reply-To: <CANeU7QnoKUSdMjOGNFWFueH4LG+mj8J0Ezp_KetHhHUr2_pC_w@mail.gmail.com>

On Tue, May 21, 2024 at 01:40:56PM -0700, Chris Li wrote:
> > Filesystems already implemented a lot of solutions for fragmentation
> > avoidance that are more appropriate for slow storage media.
> 
> Swap and file systems have very different requirements and usage
> patterns and IO patterns.

Should they, though?  Filesystems noticed that handling pages in LRU
order was inefficient and so they stopped doing that (see the removal
of aops->writepage in favour of ->writepages, along with where each is
called from).  Maybe it's time for swap to start doing writes in the order
of virtual addresses within a VMA, instead of LRU order.
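
To make the ordering concrete, here is a rough userspace sketch (not
kernel code).  It assumes each page picked for swap-out is described by a
hypothetical (vma_id, vaddr) pair and a 4KiB base page size; the function
name is made up for illustration.  Pages arrive in LRU order and come out
as contiguous runs in virtual address order within each VMA:

```python
from itertools import groupby

PAGE_SIZE = 4096  # assumed base page size


def batch_by_virtual_address(pages):
    """Turn (vma_id, vaddr) pairs into contiguous runs per VMA.

    `pages` may be in any order (e.g. LRU order).  The result is a list
    of (vma_id, start_vaddr, nr_pages) tuples sorted by VMA, then address.
    """
    runs = []
    ordered = sorted(set(pages))  # dedupe, then sort by (vma_id, vaddr)
    for vma_id, group in groupby(ordered, key=lambda p: p[0]):
        start = prev = None
        for _, vaddr in group:
            if start is None:
                start = prev = vaddr
            elif vaddr == prev + PAGE_SIZE:
                prev = vaddr  # extends the current run
            else:
                # Gap in the address range: close the run, start a new one.
                runs.append((vma_id, start, (prev - start) // PAGE_SIZE + 1))
                start = prev = vaddr
        runs.append((vma_id, start, (prev - start) // PAGE_SIZE + 1))
    return runs


lru_order = [(1, 0x3000), (2, 0x1000), (1, 0x1000), (1, 0x2000), (1, 0x8000)]
for run in batch_by_virtual_address(lru_order):
    print(run)
```

Each run could then be submitted as one larger write instead of one write
per page, which is roughly what ->writepages allows filesystems to do.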

Indeed, if we're open to radical ideas, the LRU sucks.  A physical scan
is 40x faster:
https://lore.kernel.org/linux-mm/ZTc7SHQ4RbPkD3eZ@casper.infradead.org/

> One challenging aspect is that the current swap back end has a very
> low per swap entry memory overhead. It is about 1 byte (swap_map), 2
> bytes (swap cgroup), 8 bytes (swap cache pointer). The inode struct is
> more than 64 bytes per file. That is a big jump if you map a swap
> entry to a file. If you map more than one swap entry to a file, then
> you need to track the mapping of file offset to swap entry, and the
> reverse lookup of swap entry to a file with offset. Whichever way you
> cut it, it will significantly increase the per swap entry memory
> overhead.

Not necessarily, no.  If your workload uses a lot of order-2, order-4
and order-9 folios, then the current scheme is using 11 bytes per page,
so 44 bytes per order-2 folio, 176 per order-4 folio and 5632 per
order-9 folio.  That's a lot of bytes we can use for an extent-based
scheme.
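
The arithmetic, spelled out as a small sketch.  It takes the 1 + 2 + 8
byte per-entry figures quoted above at face value; the constant and
function names are just for illustration:

```python
# Per swap entry: swap_map (1) + swap cgroup (2) + swap cache pointer (8)
BYTES_PER_ENTRY = 1 + 2 + 8


def per_folio_overhead(order):
    """Bytes of swap metadata for one folio of the given order."""
    nr_pages = 1 << order  # an order-n folio spans 2^n base pages
    return BYTES_PER_ENTRY * nr_pages


for order in (2, 4, 9):
    print(f"order-{order}: {per_folio_overhead(order)} bytes")
```

This prints 44, 176 and 5632 bytes for order-2, order-4 and order-9
respectively.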

Also, why would you compare the size of an inode to the size of a swap
entry?  An inode is ~equivalent to an anon_vma, not to a swap entry.

