From: Josef Bacik <josef@toxicpanda.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	Chuck Lever <chuck.lever@oracle.com>, Jan Kara <jack@suse.cz>
Subject: Re: VFS caching of file extents
Date: Wed, 28 Aug 2024 16:30:26 -0400
Message-ID: <20240828203026.GA2974106@perftesting>
In-Reply-To: <Zs97qHI-wA1a53Mm@casper.infradead.org>

On Wed, Aug 28, 2024 at 08:34:00PM +0100, Matthew Wilcox wrote:
> Today it is the responsibility of each filesystem to maintain the mapping
> from file logical addresses to disk blocks (*).  There are various ways
> to query that information, eg calling get_block() or using iomap.
> 
> What if we pull that information up into the VFS?  Filesystems obviously
> _control_ that information, so need to be able to invalidate entries.
> And we wouldn't want to store all extents in the VFS all the time, so
> would need to have a way to call into the filesystem to populate ranges
> of files.  We'd need to decide how to lock/protect that information
> -- a per-file lock?  A per-extent lock?  No locking, just a seqcount?
> We need a COW bit in the extent which tells the user that this extent
> is fine for reading through, but if there's a write to be done then the
> filesystem needs to be asked to create a new extent.
> 

At least for btrfs we store a lot of things in our extent map, so I'm not sure
everybody wants to pay the overhead of all the information we keep cached in
these entries.

We also protect all of that with an extent lock, and again I'm not sure
everybody wants to adopt our extent locking.  If we pushed the locking
responsibility into the file system then hooray, but that makes the generic
implementation more complex.
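
To make the overhead and locking question concrete, here is a rough
user-space sketch of about the smallest generic cached extent I can imagine.
Every name in it (xc_extent, xc_cache, xc_lookup, xc_demo_populate and so on)
is made up for illustration; it is not an existing kernel interface and it is
not what btrfs's extent map looks like.  It deliberately has no locking at
all, since that is exactly the open question, and the populate callback is a
toy that flags every other 4k block as COW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */
#define XC_FLAG_COW	(1u << 0)	/* fine to read through, writes must ask the fs */
#define XC_MAX		16		/* fixed-size toy cache, no eviction */

struct xc_extent {
	uint64_t logical;	/* file offset in bytes */
	uint64_t physical;	/* disk address in bytes */
	uint64_t len;		/* length in bytes */
	unsigned int flags;
	bool valid;
};

/* Filesystem-supplied callback that fills in the extent covering pos. */
typedef bool (*xc_populate_fn)(uint64_t pos, struct xc_extent *out);

struct xc_cache {
	struct xc_extent slots[XC_MAX];
	xc_populate_fn populate;
};

/* Return the cached extent covering pos, or NULL on a miss. */
static struct xc_extent *xc_find(struct xc_cache *c, uint64_t pos)
{
	assert(c != NULL);
	for (size_t i = 0; i < XC_MAX; i++) {
		struct xc_extent *e = &c->slots[i];

		if (e->valid && pos >= e->logical && pos - e->logical < e->len)
			return e;
	}
	return NULL;
}

/* Look up pos, calling into the filesystem to populate on a miss. */
static struct xc_extent *xc_lookup(struct xc_cache *c, uint64_t pos)
{
	struct xc_extent *e = xc_find(c, pos);
	struct xc_extent tmp = { 0 };

	if (e)
		return e;
	if (!c->populate || !c->populate(pos, &tmp))
		return NULL;
	for (size_t i = 0; i < XC_MAX; i++) {
		if (!c->slots[i].valid) {
			tmp.valid = true;
			c->slots[i] = tmp;
			return &c->slots[i];
		}
	}
	return NULL;	/* cache full; a real design would evict */
}

/* A write may reuse the cached mapping only if the extent is not COW. */
static bool xc_can_write_in_place(struct xc_cache *c, uint64_t pos)
{
	struct xc_extent *e = xc_lookup(c, pos);

	return e && !(e->flags & XC_FLAG_COW);
}

/* Drop every cached extent overlapping [start, start + len). */
static int xc_invalidate(struct xc_cache *c, uint64_t start, uint64_t len)
{
	int dropped = 0;

	assert(c != NULL);
	for (size_t i = 0; i < XC_MAX; i++) {
		struct xc_extent *e = &c->slots[i];

		if (e->valid && e->logical < start + len &&
		    start < e->logical + e->len) {
			e->valid = false;
			dropped++;
		}
	}
	return dropped;
}

/* Toy populate callback: 4k extents, odd-numbered blocks are COW. */
static bool xc_demo_populate(uint64_t pos, struct xc_extent *out)
{
	out->logical = pos & ~(uint64_t)4095;
	out->physical = (1u << 20) + out->logical;
	out->len = 4096;
	out->flags = ((out->logical / 4096) & 1) ? XC_FLAG_COW : 0;
	return true;
}
```

Even this toy needs answers for who serializes xc_lookup() against
xc_invalidate(), and it carries none of the extra state we keep per extent,
which is the part I'd expect to grow once real file systems start using it.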

> There are a few problems I think this can solve.  One is efficient
> implementation of NFS READPLUS.  Another is the callback from iomap
> to the filesystem when doing buffered writeback.  A third is having a
> common implementation of FIEMAP.  I've heard rumours that FUSE would like
> something like this, and maybe there are other users that would crop up.
> 

For us, we actually stopped using our in-memory cache for FIEMAP because it
ended up being way slower and kind of a pain to work with, given all the
different ways we update the cache as I/O happens.  Our FIEMAP implementation
just reads the extents on disk because it's easier and cleaner to walk through
the btree than the cache.

> Anyway, this is as far as my thinking has got on this topic for now.
> Maybe there's a good idea here, maybe it's all a huge overengineered mess
> waiting to happen.  I'm sure other people know this area of filesystems
> better than I do.

Maybe it's fine for simpler file systems, and it could probably be argued that
btrfs is a bit over-engineered in this case, but I worry it'll turn into one of
those "this seemed like a good idea at the time, but after we added all the
features everybody needed, we ended up with something way more complex"
scenarios.  Thanks,

Josef