From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
linux-ext4@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Ritesh Harjani <ritesh.list@gmail.com>,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
dchinner@redhat.com
Subject: Re: [RFC 1/7] iomap: Don't fall back to buffered write if the write is atomic
Date: Mon, 4 Dec 2023 10:17:23 -0800 [thread overview]
Message-ID: <20231204181723.GW361584@frogsfrogsfrogs> (raw)
In-Reply-To: <2aced048-4d4b-4a48-9a45-049f73763697@oracle.com>
On Mon, Dec 04, 2023 at 09:02:56AM +0000, John Garry wrote:
> On 01/12/2023 22:07, Dave Chinner wrote:
> > > Sure, and I think that we need a better story for supporting buffered IO for
> > > atomic writes.
> > >
> > > Currently we have:
> > > - man pages tell us RWF_ATOMIC is only supported for direct IO
> > > - statx gives atomic write unit min/max, not explicitly telling us it's for
> > > direct IO
> > > - RWF_ATOMIC is ignored for !O_DIRECT
> > >
> > > So I am thinking of expanding statx support to enable querying of atomic
> > > write capabilities for buffered IO and direct IO separately.
> > You're over complicating this way too much by trying to restrict the
> > functionality down to just what you want to implement right now.
> >
> > RWF_ATOMIC is no different to RWF_NOWAIT. The API doesn't decide
> > what can be supported - the filesystems themselves decide what part
> > of the API they can support and implement those pieces.
>
> Sure, but for RWF_ATOMIC we still have the associated statx call to tell us
> whether atomic writes are supported for a file and the specific range
> capability.
>
> >
> > TO go back to RWF_NOWAIT, for a long time we (XFS) only supported
> > RWF_NOWAIT on DIO, and buffered reads and writes were given
> > -EOPNOTSUPP by the filesystem. Then other filesystems started
> > supporting DIO with RWF_NOWAIT. Then buffered read support was added
> > to the page cache and XFS, and as other filesystems were converted
> > they removed the RWF_NOWAIT exclusion check from their read IO
> > paths.
> >
> > We are now in the same place with buffered write support for
> > RWF_NOWAIT. XFS, the page cache and iomap allow buffered writes w/
> > RWF_NOWAIT, but ext4, btrfs and f2fs still all return -EOPNOTSUPP
> > because they don't support non-blocking buffered writes yet.
> >
> > This is the same model we should be applying with RWF_ATOMIC - we
> > know that over time we'll be able to expand support for atomic
> > writes across both direct and buffered IO, so we should not be
> > restricting the API or infrastructure to only allow RWF_ATOMIC w/
> > DIO.
>
> Agreed.
>
> > Just have the filesystems reject RWF_ATOMIC w/ -EOPNOTSUPP if
> > they don't support it,
>
> Yes, I was going to add this regardless.
>
> > and for those that do it is conditional on
> > whther the filesystem supports it for the given type of IO being
> > done.
> >
> > Seriously - an application can easily probe for RWF_ATOMIC support
> > without needing information to be directly exposed in statx() - just
> > open a O_TMPFILE, issue the type of RWF_ATOMIC IO you require to be
> > supported, and if it returns -EOPNOTSUPP then it you can't use
> > RWF_ATOMIC optimisations in the application....
>
> ok, if that is the done thing.
>
> So I can't imagine that atomic write unit range will be different for direct
> IO and buffered IO (ignoring for a moment Christoph's idea for CoW always
> for no HW offload) when supported. But it seems that we may have a scenario
> where statx tells is that atomic writes are supported for a file, and a DIO
> write succeeds and a buffered IO write may return -EOPNOTSUPP. If that's
> acceptable then I'll work towards that.
>
> If we could just run statx on a file descriptor here then that would be
> simpler...
statx(fd, "", AT_EMPTY_PATH, ...); ?
--D
> Thanks,
> John
>
>
>
next prev parent reply other threads:[~2023-12-04 18:17 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-30 13:53 [RFC 0/7] ext4: Allocator changes for atomic write support with DIO Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 1/7] iomap: Don't fall back to buffered write if the write is atomic Ojaswin Mujoo
2023-11-30 21:10 ` Dave Chinner
2023-12-01 10:42 ` John Garry
2023-12-01 13:27 ` Matthew Wilcox
2023-12-01 19:06 ` John Garry
2023-12-01 22:07 ` Dave Chinner
2023-12-04 9:02 ` John Garry
2023-12-04 18:17 ` Darrick J. Wong [this message]
2023-12-04 18:34 ` John Garry
2023-12-07 12:43 ` John Garry
2023-11-30 13:53 ` [RFC 2/7] ext4: Factor out size and start prediction from ext4_mb_normalize_request() Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 3/7] ext4: add aligned allocation support in mballoc Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 4/7] ext4: allow inode preallocation for aligned alloc Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 5/7] block: export blkdev_atomic_write_valid() and refactor api Ojaswin Mujoo
2023-12-01 10:47 ` John Garry
2023-12-11 10:57 ` Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 6/7] ext4: Add aligned allocation support for atomic direct io Ojaswin Mujoo
2023-11-30 13:53 ` [RFC 7/7] ext4: Support atomic write for statx Ojaswin Mujoo
2023-12-04 10:36 ` [RFC 0/7] ext4: Allocator changes for atomic write support with DIO John Garry
2023-12-04 13:38 ` Ojaswin Mujoo
2023-12-04 14:44 ` John Garry
2023-12-11 10:54 ` Ojaswin Mujoo
2023-12-12 7:46 ` John Garry
2023-12-12 13:10 ` Christoph Hellwig
2023-12-12 15:16 ` Theodore Ts'o
2023-12-12 15:19 ` Christoph Hellwig
2023-12-12 16:10 ` John Garry
2023-12-13 5:59 ` Ojaswin Mujoo
2023-12-13 9:17 ` John Garry
2023-12-13 6:42 ` Ojaswin Mujoo
2023-12-13 9:20 ` John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231204181723.GW361584@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=john.g.garry@oracle.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=ojaswin@linux.ibm.com \
--cc=ritesh.list@gmail.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox