From: Matthew Wilcox <willy@infradead.org>
To: linux-fsdevel@vger.kernel.org
Cc: Dave Chinner <david@fromorbit.com>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Christoph Hellwig <hch@lst.de>,
Chuck Lever <chuck.lever@oracle.com>, Jan Kara <jack@suse.cz>
Subject: VFS caching of file extents
Date: Wed, 28 Aug 2024 20:34:00 +0100
Message-ID: <Zs97qHI-wA1a53Mm@casper.infradead.org>

Today it is the responsibility of each filesystem to maintain the mapping
from file logical addresses to disk blocks (*). There are various ways
to query that information, eg calling get_block() or using iomap.

What if we pull that information up into the VFS? Filesystems obviously
_control_ that information, so need to be able to invalidate entries.
And we wouldn't want to store all extents in the VFS all the time, so
would need to have a way to call into the filesystem to populate ranges
of files. We'd need to decide how to lock/protect that information
-- a per-file lock? A per-extent lock? No locking, just a seqcount?

We need a COW bit in the extent which tells the user that this extent
is fine for reading through, but if there's a write to be done then the
filesystem needs to be asked to create a new extent.

There are a few problems I think this can solve. One is efficient
implementation of NFS READPLUS. Another is the callback from iomap
to the filesystem when doing buffered writeback. A third is having a
common implementation of FIEMAP. I've heard rumours that FUSE would like
something like this, and maybe there are other users that would crop up.

Anyway, this is as far as my thinking has got on this topic for now.
Maybe there's a good idea here, maybe it's all a huge overengineered mess
waiting to happen. I'm sure other people know this area of filesystems
better than I do.

(*) For block device filesystems. Obviously network filesystems and
synthetic filesystems don't care and can stop reading now. Umm, unless
maybe they _want_ to use it, eg maybe there's a sharded thing going on and
the fs wants to store information about each shard in the extent cache?