Re: [LSF/MM/BPF TOPIC] Large folios, swap and fscache

linux-fsdevel.vger.kernel.org archive mirror

From: Matthew Wilcox <willy@infradead.org>
To: David Howells <dhowells@redhat.com>
Cc: lsf-pc@lists.linux-foundation.org, netfs@lists.linux.dev,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] Large folios, swap and fscache
Date: Fri, 2 Feb 2024 19:22:11 +0000
Message-ID: <Zb1A44esSQVJOezg@casper.infradead.org>
In-Reply-To: <2761655.1706889464@warthog.procyon.org.uk>

On Fri, Feb 02, 2024 at 03:57:44PM +0000, David Howells wrote:
> Matthew Wilcox <willy@infradead.org> wrote:
> 
> > So my modest proposal is that we completely rearchitect how we handle
> > swap.  Instead of putting swp entries in the page tables (and in shmem's
> > case in the page cache), we turn swap into an (object, offset) lookup
> > (just like a filesystem).  That means that each anon_vma becomes its
> > own swap object and each shmem inode becomes its own swap object.
> > The swap system can then borrow techniques from whichever filesystem
> > it likes to do (object, offset, length) -> n x (device, block) mappings.
> 
> That's basically what I'm suggesting, I think, but offloading the mechanics
> down to a filesystem.  That would be fine with me.  bcachefs is a {key,val}
> store right?

Hmm.  That's not a bad idea.  So instead of having a swapfile, we
could create a swap directory on an existing filesystem.  Or if we
want to partition the drive and have a swap partition we just
mkfs.favourite that and tell it that root is the swap directory.
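As a rough illustration of the (object, offset, length) -> n x (device, block) lookup that the filesystem holding the swap directory would be doing, here is a minimal sketch. It assumes each swap object keeps a sorted, non-overlapping extent list measured in blocks; `SwapObject`, `Extent`, and `map_range` are hypothetical names, not kernel or filesystem interfaces.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Extent:
    offset: int   # first block of this extent within the swap object
    length: int   # number of blocks covered
    device: int   # hypothetical device identifier
    block: int    # first block on that device


@dataclass
class SwapObject:
    # Extents are kept sorted by offset and are assumed not to overlap.
    extents: list = field(default_factory=list)

    def add_extent(self, ext: Extent) -> None:
        self.extents.append(ext)
        self.extents.sort(key=lambda e: e.offset)

    def map_range(self, offset: int, length: int) -> list:
        """Map (offset, length) in blocks to (device, block, count) runs.

        Raises KeyError at the first block that falls in a hole.
        """
        starts = [e.offset for e in self.extents]
        runs = []
        pos, end = offset, offset + length
        while pos < end:
            i = bisect_right(starts, pos) - 1   # last extent starting at or before pos
            if i < 0 or pos >= self.extents[i].offset + self.extents[i].length:
                raise KeyError(pos)
            ext = self.extents[i]
            count = min(end, ext.offset + ext.length) - pos
            runs.append((ext.device, ext.block + (pos - ext.offset), count))
            pos += count
        return runs
```

With extents at object blocks 0-7 (device 0, starting at block 1000) and 8-11 (device 1, starting at block 50), `map_range(6, 4)` returns `[(0, 1006, 2), (1, 50, 2)]`: one logical range, two physical runs.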

I think this means we do away with the swap cache?  If the page has been
brought back in, we'd be able to find it in the anon_vma's page cache
rather than having to search the global swap cache.
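The point about dropping the swap cache can be modelled as a per-object lookup: a page that has already been brought back in is found by offset in the owning object's own cache, and only a miss goes to the backing store. This is a toy model under stated assumptions: a dict stands in for the per-object page cache, and `AnonObject`, `swap_in`, and `read_from_backing` are hypothetical names.

```python
class AnonObject:
    """Toy stand-in for an anon_vma (or shmem inode) acting as a swap object."""

    def __init__(self, read_from_backing):
        self.page_cache = {}                    # offset -> page contents
        self.read_from_backing = read_from_backing
        self.backing_reads = 0                  # number of cache misses

    def swap_in(self, offset):
        # If the page was already brought back in, it is still in this
        # object's own cache; no global table keyed by swap entry is needed.
        page = self.page_cache.get(offset)
        if page is None:
            page = self.read_from_backing(offset)
            self.backing_reads += 1
            self.page_cache[offset] = page
        return page

    def evict(self, offset):
        # Drop the in-memory copy; the data stays in the backing store.
        self.page_cache.pop(offset, None)
```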

> > I think my proposal above works for you?  For each file you want to cache,
> > create a swap object, and then tell swap when you want to read/write to
> > the local swap object.  What you do need is to persist the objects over
> > a power cycle.  That shouldn't be too hard ... after all, filesystems
> > manage to do it.
> 
> Sure - but there is an integrity constraint that doesn't exist with swap.
> 
> There is also an additional feature of fscache: unless the cache entry is
> locked in the cache (e.g. we're doing disconnected operation), we can throw
> away an object from fscache and recycle it if we need space.  In fact, this is
> the way OpenAFS works: every write transaction done on a file/dir on the
> server is done atomically and is given a monotonically increasing data version
> number that is then used as part of the index key in the cache.  So old
> versions of the data get recycled as the cache needs to make space.
> 
> Which also means that if swap needs more space, it can just kick stuff out of
> fscache if it is not locked in.

Ah, more requirements ;-)
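The requirements above (the server's data version number as part of the index key, locked entries that must survive, and swap being able to reclaim space from fscache) can be sketched roughly as follows. Assumptions: space is counted in abstract units, eviction takes the least recently used unlocked entry first, and `VersionedCache`, `lookup`, `insert`, and `reclaim` are hypothetical names; this is not how fscache or OpenAFS are implemented internally.

```python
from collections import OrderedDict


class VersionedCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        # (file_key, data_version) -> (size, locked); least recently used first.
        self.entries = OrderedDict()

    def lookup(self, file_key, data_version):
        key = (file_key, data_version)
        if key not in self.entries:
            return False          # miss: no entry for this exact data version
        self.entries.move_to_end(key)
        return True

    def reclaim(self, needed):
        """Evict unlocked entries, LRU first, until `needed` units are free.

        Returns the units freed. Locked entries (e.g. pinned for disconnected
        operation) are skipped. Swap could call this when it needs space.
        """
        freed = 0
        for key in list(self.entries):
            if self.capacity - self.used >= needed:
                break
            size, locked = self.entries[key]
            if locked:
                continue
            del self.entries[key]
            self.used -= size
            freed += size
        return freed

    def insert(self, file_key, data_version, size, locked=False):
        key = (file_key, data_version)
        old = self.entries.pop(key, None)
        if old is not None:
            self.used -= old[0]
        if self.capacity - self.used < size:
            self.reclaim(size)
        if self.capacity - self.used < size:
            raise RuntimeError("cache is full of locked entries")
        self.entries[key] = (size, locked)
        self.used += size
```

Because a new data version produces a new key, a stale version is never returned by `lookup`; it simply ages out and gets recycled when space is reclaimed.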

> > All we need to do is figure out how to name the lookup (I don't think we
> > need to use strings to name the swap object, but obviously we could).  Maybe
> > it's just a stream of bytes.
> 
> A binary blob would probably be better.
> 
> I would use a separate index to map higher level organisations, such as
> cell+volume in afs or the server address + share name in cifs to an index
> number that can be used in the cache.
> 
> Further, I could do with a way to invalidate all objects matching a particular
> subkey.

That seems to map to a directory hierarchy?
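One reading of the indexing discussion: a small index maps a higher-level organisation (an afs cell+volume, or a cifs server address + share name) to a compact integer, each object key is an opaque binary blob prefixed by that integer, and invalidating all objects matching a subkey becomes removing everything under a prefix, much like removing a directory. The class and method names and the fixed 8-byte big-endian prefix are illustrative assumptions, not fscache's actual format.

```python
import struct


class CacheIndex:
    def __init__(self):
        self.volumes = {}   # organisation tuple -> compact integer id
        self.objects = {}   # binary key -> cached data

    def volume_id(self, *organisation):
        # Assign the next integer to each new organisation on first use.
        return self.volumes.setdefault(organisation, len(self.volumes))

    @staticmethod
    def object_key(vol_id, blob):
        # Fixed-width prefix, so the prefix for id 1 never matches id 10.
        return struct.pack(">Q", vol_id) + blob

    def store(self, vol_id, blob, data):
        self.objects[self.object_key(vol_id, blob)] = data

    def invalidate_prefix(self, prefix):
        """Drop every object whose key starts with `prefix`; return the count."""
        doomed = [k for k in self.objects if k.startswith(prefix)]
        for k in doomed:
            del self.objects[k]
        return len(doomed)
```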

So, named swap objects for fscache; anonymous ones for anon memory?

Thread overview: 13+ messages
2024-02-02  9:09 [LSF/MM/BPF TOPIC] Large folios, swap and fscache David Howells
2024-02-02 14:29 ` Matthew Wilcox
2024-02-22 19:02   ` Luis Chamberlain
2024-02-22 19:16     ` Yosry Ahmed
2024-02-22 22:26     ` Chris Li
2024-02-29 19:31   ` Chris Li
2024-02-02 15:57 ` David Howells
2024-02-02 19:22   ` Matthew Wilcox [this message]
2024-02-03  5:13 ` Gao Xiang
2024-02-04 23:45 ` Dave Chinner
2024-02-22 22:45 ` Chris Li
2024-02-23  3:00   ` Andreas Dilger
2024-02-23  3:46     ` Chris Li
