From: Dave Chinner <david@fromorbit.com>
To: John Garry <john.g.garry@oracle.com>
Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>,
linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Ritesh Harjani <ritesh.list@gmail.com>,
linux-kernel@vger.kernel.org,
"Darrick J . Wong" <djwong@kernel.org>,
linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, dchinner@redhat.com
Subject: Re: [RFC 1/7] iomap: Don't fall back to buffered write if the write is atomic
Date: Sat, 2 Dec 2023 09:07:34 +1100 [thread overview]
Message-ID: <ZWpZJicSjW2XqMmp@dread.disaster.area> (raw)
In-Reply-To: <85d1b27c-f4ef-43dd-8eed-f497817ab86d@oracle.com>
On Fri, Dec 01, 2023 at 10:42:57AM +0000, John Garry wrote:
> On 30/11/2023 21:10, Dave Chinner wrote:
> > On Thu, Nov 30, 2023 at 07:23:09PM +0530, Ojaswin Mujoo wrote:
> > > Currently, iomap only supports atomic writes for direct IOs and there is
> > > no guarantees that a buffered IO will be atomic. Hence, if the user has
> > > explicitly requested the direct write to be atomic and there's a
> > > failure, return -EIO instead of falling back to buffered IO.
> > >
> > > Signed-off-by: Ojaswin Mujoo<ojaswin@linux.ibm.com>
> > > ---
> > > fs/iomap/direct-io.c | 8 +++++++-
> > > 1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index 6ef25e26f1a1..3e7cd9bc8f4d 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -662,7 +662,13 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> > > if (ret != -EAGAIN) {
> > > trace_iomap_dio_invalidate_fail(inode, iomi.pos,
> > > iomi.len);
> > > - ret = -ENOTBLK;
> > > + /*
> > > + * if this write was supposed to be atomic,
> > > + * return the err rather than trying to fall
> > > + * back to buffered IO.
> > > + */
> > > + if (!atomic_write)
> > > + ret = -ENOTBLK;
> > This belongs in the caller when it receives an -ENOTBLK from
> > iomap_dio_rw(). The iomap code is saying "this IO cannot be done
> > with direct IO" by returning this value, and then the caller can
> > make the determination of whether to run a buffered IO or not.
> >
> > For example, a filesystem might still be able to perform an atomic
> > IO via a COW-based buffered IO slow path. Sure, ext4 can't do this,
> > but the above patch would prevent filesystems that could from being
> > able to implement such a fallback....
>
> Sure, and I think that we need a better story for supporting buffered IO for
> atomic writes.
>
> Currently we have:
> - man pages tell us RWF_ATOMIC is only supported for direct IO
> - statx gives atomic write unit min/max, not explicitly telling us it's for
> direct IO
> - RWF_ATOMIC is ignored for !O_DIRECT
>
> So I am thinking of expanding statx support to enable querying of atomic
> write capabilities for buffered IO and direct IO separately.
You're over complicating this way too much by trying to restrict the
functionality down to just what you want to implement right now.
RWF_ATOMIC is no different to RWF_NOWAIT. The API doesn't decide
what can be supported - the filesystems themselves decide what part
of the API they can support and implement those pieces.
TO go back to RWF_NOWAIT, for a long time we (XFS) only supported
RWF_NOWAIT on DIO, and buffered reads and writes were given
-EOPNOTSUPP by the filesystem. Then other filesystems started
supporting DIO with RWF_NOWAIT. Then buffered read support was added
to the page cache and XFS, and as other filesystems were converted
they removed the RWF_NOWAIT exclusion check from their read IO
paths.
We are now in the same place with buffered write support for
RWF_NOWAIT. XFS, the page cache and iomap allow buffered writes w/
RWF_NOWAIT, but ext4, btrfs and f2fs still all return -EOPNOTSUPP
because they don't support non-blocking buffered writes yet.
This is the same model we should be applying with RWF_ATOMIC - we
know that over time we'll be able to expand support for atomic
writes across both direct and buffered IO, so we should not be
restricting the API or infrastructure to only allow RWF_ATOMIC w/
DIO. Just have the filesystems reject RWF_ATOMIC w/ -EOPNOTSUPP if
they don't support it, and for those that do it is conditional on
whther the filesystem supports it for the given type of IO being
done.
Seriously - an application can easily probe for RWF_ATOMIC support
without needing information to be directly exposed in statx() - just
open a O_TMPFILE, issue the type of RWF_ATOMIC IO you require to be
supported, and if it returns -EOPNOTSUPP then it you can't use
RWF_ATOMIC optimisations in the application....
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2023-12-01 22:07 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-30 13:53 [RFC 0/7] ext4: Allocator changes for atomic write support with DIO Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 1/7] iomap: Don't fall back to buffered write if the write is atomic Ojaswin Mujoo
2023-11-30 21:10 ` Dave Chinner
2023-12-01 10:42 ` John Garry
2023-12-01 13:27 ` Matthew Wilcox
2023-12-01 19:06 ` John Garry
2023-12-01 22:07 ` Dave Chinner [this message]
2023-12-04 9:02 ` John Garry
2023-12-04 18:17 ` Darrick J. Wong
2023-12-04 18:34 ` John Garry
2023-12-07 12:43 ` John Garry
2023-11-30 13:53 ` [RFC 2/7] ext4: Factor out size and start prediction from ext4_mb_normalize_request() Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 3/7] ext4: add aligned allocation support in mballoc Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 4/7] ext4: allow inode preallocation for aligned alloc Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 5/7] block: export blkdev_atomic_write_valid() and refactor api Ojaswin Mujoo
2023-12-01 10:47 ` John Garry
2023-12-11 10:57 ` Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 6/7] ext4: Add aligned allocation support for atomic direct io Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 7/7] ext4: Support atomic write for statx Ojaswin Mujoo
2023-12-04 10:36 ` [RFC 0/7] ext4: Allocator changes for atomic write support with DIO John Garry
2023-12-04 13:38 ` Ojaswin Mujoo
2023-12-04 14:44 ` John Garry
2023-12-11 10:54 ` Ojaswin Mujoo
2023-12-12 7:46 ` John Garry
2023-12-12 13:10 ` Christoph Hellwig
2023-12-12 15:16 ` Theodore Ts'o
2023-12-12 15:19 ` Christoph Hellwig
2023-12-12 16:10 ` John Garry
2023-12-13 5:59 ` Ojaswin Mujoo
2023-12-13 9:17 ` John Garry
2023-12-13 6:42 ` Ojaswin Mujoo
2023-12-13 9:20 ` John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZWpZJicSjW2XqMmp@dread.disaster.area \
--to=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=djwong@kernel.org \
--cc=john.g.garry@oracle.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=ojaswin@linux.ibm.com \
--cc=ritesh.list@gmail.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox