From: Josef Bacik <josef@toxicpanda.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, amir73il@gmail.com,
miklos@szeredi.hu, joannelkoong@gmail.com, bschubert@ddn.com,
willy@infradead.org
Subject: Re: [PATCH v2 06/11] fuse: use iomap for writeback cache buffered writes
Date: Thu, 29 Aug 2024 10:23:56 -0400 [thread overview]
Message-ID: <20240829142356.GA3067112@perftesting> (raw)
In-Reply-To: <Zs/wI17fs4qHoFOF@dread.disaster.area>
On Thu, Aug 29, 2024 at 01:50:59PM +1000, Dave Chinner wrote:
> On Wed, Aug 28, 2024 at 05:13:56PM -0400, Josef Bacik wrote:
> > We're currently using the old ->write_begin()/->write_end() method of
> > doing buffered writes. This isn't a huge deal for fuse since we
> > basically just want to copy the pages and move on, but the iomap
> > infrastructure gives us access to having huge folios. Rework the
> > buffered write path when we have writeback cache to use the iomap
> > buffered write code, the ->get_folio() callback now handles the work
> > that we did in ->write_begin(), the rest of the work is handled inside
> > of iomap so we don't need a replacement for ->write_end.
> >
> > This does bring BLOCK as a dependency, as the buffered write part of
> > iomap requires CONFIG_BLOCK. This could be shed if we reworked the file
> > write iter portion of the buffered write path was separated out to not
> > need BLOCK.
> >
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> > fs/fuse/Kconfig | 2 +
> > fs/fuse/file.c | 154 +++++++++++++++++++++---------------------------
> > 2 files changed, 68 insertions(+), 88 deletions(-)
> >
> > diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
> > index 8674dbfbe59d..8a799324d7bd 100644
> > --- a/fs/fuse/Kconfig
> > +++ b/fs/fuse/Kconfig
> > @@ -1,7 +1,9 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> > config FUSE_FS
> > tristate "FUSE (Filesystem in Userspace) support"
> > + depends on BLOCK
> > select FS_POSIX_ACL
> > + select FS_IOMAP
> > help
> > With FUSE it is possible to implement a fully functional filesystem
> > in a userspace program.
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index ab531a4694b3..af91043b44d7 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -21,6 +21,7 @@
> > #include <linux/filelock.h>
> > #include <linux/splice.h>
> > #include <linux/task_io_accounting_ops.h>
> > +#include <linux/iomap.h>
> >
> > static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
> > unsigned int open_flags, int opcode,
> > @@ -1420,6 +1421,63 @@ static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive)
> > }
> > }
> >
> > +static struct folio *fuse_iomap_get_folio(struct iomap_iter *iter,
> > + loff_t pos, unsigned len)
> > +{
> > + struct file *file = (struct file *)iter->private;
> > + struct inode *inode = iter->inode;
> > + struct folio *folio;
> > + loff_t fsize;
> > +
> > + folio = iomap_get_folio(iter, pos, len);
> > + if (IS_ERR(folio))
> > + return folio;
> > +
> > + fuse_wait_on_folio_writeback(inode, folio);
> > +
> > + if (folio_test_uptodate(folio))
> > + return folio;
> > +
> > + /*
> > + * If we're going to write past EOF then avoid the read, but zero the
> > + * whole thing and mark it uptodate so that if we get a short write we
> > + * don't try to re-read this page, we just carry on.
> > + */
> > + fsize = i_size_read(inode);
> > + if (fsize <= folio_pos(folio)) {
> > + folio_zero_range(folio, 0, folio_size(folio));
>
> The comment doesn't match what this does - the folio is not marked
> uptodate at all.
I'll update the comment, it gets marked uptodate in __iomap_write_end() once the
write is complete.
>
> > + } else {
> > + int err = fuse_do_readpage(file, &folio->page);
>
> readpage on a large folio? does that work?
I haven't done the work to enable large folios yet, this is just the prep stuff.
Supporting large folios is going to take a fair bit of work, so I'm getting the
ball rolling with this prep series.
>
> > + if (err) {
> > + folio_unlock(folio);
> > + folio_put(folio);
> > + return ERR_PTR(err);
> > + }
> > + }
>
> Also, why do this here when __iomap_write_begin() will do all the
> sub-folio zeroing and read IO on the folio?
>
I looked long and hard at iomap because I thought it would, but it turns out it
won't work for fuse. I could be totally wrong, but looking at iomap it will
allocate an ifs because it assumes this is sub-folio blocksize, but we aren't,
and don't really want to incur that pain. Additionally it does
iomap_read_folio_sync() to read in the folio, which just does a bio, which
obviously doesn't work on fuse. Again totally expecting to be told I'm stupid
in some way that I missed, but it seemed like iomap won't do what we need it to
do here, and it's simple enough to handle the zeroing here for ourselves.
> > +
> > + return folio;
> > +}
> > +
> > +static const struct iomap_folio_ops fuse_iomap_folio_ops = {
> > + .get_folio = fuse_iomap_get_folio,
> > +};
> > +
> > +static int fuse_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length,
> > + unsigned flags, struct iomap *iomap,
> > + struct iomap *srcmap)
> > +{
> > + iomap->type = IOMAP_DELALLOC;
> > + iomap->addr = IOMAP_NULL_ADDR;
> > + iomap->offset = pos;
> > + iomap->length = length;
> > + iomap->folio_ops = &fuse_iomap_folio_ops;
> > + return 0;
> > +}
>
> What's the reason for using IOMAP_DELALLOC for these mappings? I'm
> not saying it is wrong, I just don't know enough about fuse to
> understand is this is valid or not because there are no iomap-based
> writeback hooks being added here....
At the time it was "oh we're doing what equates to delalloc, clearly this should
be marked as delalloc". Now that I have been in this code longer I realize this
is supposed to be "what does this range look like now", so I suppose the "right"
thing to do here is use IOMAP_HOLE? Thanks,
Josef
next prev parent reply other threads:[~2024-08-29 14:23 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-28 21:13 [PATCH v2 00/11] fuse: convert to using folios and iomap Josef Bacik
2024-08-28 21:13 ` [PATCH v2 01/11] fuse: convert readahead to use folios Josef Bacik
2024-08-28 21:13 ` [PATCH v2 02/11] fuse: convert fuse_send_write_pages " Josef Bacik
2024-08-28 21:13 ` [PATCH v2 03/11] fuse: convert fuse_fill_write_pages " Josef Bacik
2024-08-28 21:13 ` [PATCH v2 04/11] fuse: convert fuse_page_mkwrite " Josef Bacik
2024-08-28 21:13 ` [PATCH v2 05/11] fuse: use kiocb_modified in buffered write path Josef Bacik
2024-09-10 21:51 ` Bernd Schubert
2024-08-28 21:13 ` [PATCH v2 06/11] fuse: use iomap for writeback cache buffered writes Josef Bacik
2024-08-29 3:50 ` Dave Chinner
2024-08-29 14:23 ` Josef Bacik [this message]
2024-08-29 23:30 ` Bernd Schubert
2024-08-28 21:13 ` [PATCH v2 07/11] fuse: convert fuse_do_readpage to use folios Josef Bacik
2024-08-28 21:13 ` [PATCH v2 08/11] fuse: convert fuse_writepage_need_send to take a folio Josef Bacik
2024-08-28 21:13 ` [PATCH v2 09/11] fuse: use the folio based vmstat helpers Josef Bacik
2024-08-28 21:14 ` [PATCH v2 10/11] fuse: convert fuse_retrieve to use folios Josef Bacik
2024-08-28 21:14 ` [PATCH v2 11/11] fuse: convert fuse_notify_store " Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240829142356.GA3067112@perftesting \
--to=josef@toxicpanda.com \
--cc=amir73il@gmail.com \
--cc=bschubert@ddn.com \
--cc=david@fromorbit.com \
--cc=joannelkoong@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).