From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: brauner@kernel.org, cem@kernel.org, dchinner@redhat.com,
hch@lst.de, ritesh.list@gmail.com, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
martin.petersen@oracle.com
Subject: Re: [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support
Date: Thu, 12 Dec 2024 12:40:07 -0800 [thread overview]
Message-ID: <20241212204007.GL6678@frogsfrogsfrogs> (raw)
In-Reply-To: <4d34e14f-6596-483b-86e8-d4b7e44acd9a@oracle.com>
On Thu, Dec 12, 2024 at 10:40:15AM +0000, John Garry wrote:
> On 11/12/2024 23:47, Darrick J. Wong wrote:
> > On Tue, Dec 10, 2024 at 12:57:32PM +0000, John Garry wrote:
> > > For atomic writes support, it is required to only ever submit a single bio
> > > (for an atomic write).
> > >
> > > Furthermore, currently the atomic write unit min and max limit is fixed at
> > > the FS block size.
> > >
> > > For lifting the atomic write unit max limit, it may occur that an atomic
> > > write spans mixed unwritten and mapped extents. For this case, due to the
> > > iterative nature of iomap, multiple bios would be produced, which is
> > > intolerable.
> > >
> > > Add a function to zero unwritten extents in a certain range, which may be
> > > used to ensure that unwritten extents are zeroed prior to issuing of an
> > > atomic write.
> >
> > I still dislike this. IMO block untorn writes _is_ a niche feature for
> > programs that perform IO in large blocks. Any program that wants a
> > general "apply all these updates or none of them" interface should use
> > XFS_IOC_EXCHANGE_RANGE since it has no awu_max restrictions, can handle
> > discontiguous update ranges, doesn't require block alignment, etc.
>
> That is not a problem which I am trying to solve. Indeed atomic writes are
> for fixed blocks of data and not atomic file updates.
Agreed.
> However, I still think that we should be able to atomic write mixed extents,
> even though it is a pain to implement. To that end, I could be convinced
> again that we don't require it...
Well... if you /did/ add a few entries to include/uapi/linux/fs.h for
ways that an untorn write can fail, then we could define the programming
interface as so:
"If you receive -EBADMAP, then call fallocate(FALLOC_FL_MAKE_OVERWRITE)
to force all the mappings to pure overwrites."
...since there have been a few people who have asked about that ability
so that they can write+fdatasync without so much overhead from file
metadata updates.
> >
> > Instead here we are adding a bunch of complexity, and not even all that
> > well:
> >
> > > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > > ---
> > > fs/iomap/direct-io.c | 76 +++++++++++++++++++++++++++++++++++++++++++
> > > include/linux/iomap.h | 3 ++
> > > 2 files changed, 79 insertions(+)
> > >
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index 23fdad16e6a8..18c888f0c11f 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -805,6 +805,82 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> > > }
> > > EXPORT_SYMBOL_GPL(iomap_dio_rw);
> > > +static loff_t
> > > +iomap_dio_zero_unwritten_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> > > +{
> > > + const struct iomap *iomap = &iter->iomap;
> > > + loff_t length = iomap_length(iter);
> > > + loff_t pos = iter->pos;
> > > +
> > > + if (iomap->type == IOMAP_UNWRITTEN) {
> > > + int ret;
> > > +
> > > + dio->flags |= IOMAP_DIO_UNWRITTEN;
> > > + ret = iomap_dio_zero(iter, dio, pos, length);
> >
> > Shouldn't this be detecting the particular case that the mapping for the
> > kiocb is in mixed state and only zeroing in that case? This just
> > targets every unwritten extent, even if the unwritten extent covered the
> > entire range that is being written.
>
> Right, so I did touch on this in the final comment in patch 4/7 commit log.
>
> Why I did it this way? I did not think that it made much difference, since
> this zeroing would be generally a one-off and did not merit even more
> complexity to implement.
The trouble is, if you fallocate the whole file and then write an
aligned 64k block, this will write zeroes to the block, update the
mapping, and only then issue the untorn write. Sure that's a one time
performance hit, but probably not a welcome one.
> > It doesn't handle COW, it doesn't
>
> Do we want to atomic write COW?
I don't see why not -- if there's a single COW mapping for the whole
untorn write, then the data gets written to the media in an untorn
fashion, and the remap is a single transaction.
> > handle holes, etc.
>
> I did test hole, and it seemed to work. However I noticed that for a hole
> region we get IOMAP_UNWRITTEN type, like:
Oh right, that's ->iomap_begin allocating an unwritten extent into the
hole, because you're not allowed to specify a hole for the destination
of a write. I withdraw that part of the comment.
> # xfs_bmap -vvp /root/mnt/file
> /root/mnt/file:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..127]: 192..319 0 (192..319) 128 000000
> 1: [128..255]: hole 128
> 2: [256..383]: 448..575 0 (448..575) 128 000000
> FLAG Values:
> 0100000 Shared extent
> 0010000 Unwritten preallocated extent
> 0001000 Doesn't begin on stripe unit
> 0000100 Doesn't end on stripe unit
> 0000010 Doesn't begin on stripe width
> 0000001 Doesn't end on stripe width
> #
> #
>
> For an atomic wrote of length 65536 @ offset 65536.
>
> Any hint on how to create a iomap->type == IOMAP_HOLE?
>
> > Also, can you make a version of blkdev_issue_zeroout that returns the
> > bio so the caller can issue them asynchronously instead of opencoding
> > the bio_alloc loop in iomap_dev_zero?
>
> ok, fine
>
> >
> > > + if (ret)
> > > + return ret;
> > > + }
> > > +
> > > + dio->size += length;
> > > +
> > > + return length;
> > > +}
> > > +
> > > +ssize_t
> > > +iomap_dio_zero_unwritten(struct kiocb *iocb, struct iov_iter *iter,
> > > + const struct iomap_ops *ops, const struct iomap_dio_ops *dops)
> > > +{
> > > + struct inode *inode = file_inode(iocb->ki_filp);
> > > + struct iomap_dio *dio;
> > > + ssize_t ret;
> > > + struct iomap_iter iomi = {
> > > + .inode = inode,
> > > + .pos = iocb->ki_pos,
> > > + .len = iov_iter_count(iter),
> > > + .flags = IOMAP_WRITE,
> >
> > IOMAP_WRITE | IOMAP_DIRECT, no?
>
> yes, I'll fix that.
>
> And I should also set IOMAP_DIO_WRITE for dio->flags.
<nod>
--D
> Thanks,
> John
>
next prev parent reply other threads:[~2024-12-12 20:40 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-10 12:57 [PATCH v2 0/7] large atomic writes for xfs John Garry
2024-12-10 12:57 ` [PATCH v2 1/7] iomap: Increase iomap_dio_zero() size limit John Garry
2024-12-10 12:57 ` [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support John Garry
2024-12-11 23:47 ` Darrick J. Wong
2024-12-12 10:40 ` John Garry
2024-12-12 20:40 ` Darrick J. Wong [this message]
2024-12-13 10:43 ` John Garry
2024-12-13 14:47 ` Christoph Hellwig
2024-12-14 0:56 ` Darrick J. Wong
2024-12-17 7:08 ` Christoph Hellwig
2024-12-18 11:15 ` John Garry
2025-01-08 1:26 ` Darrick J. Wong
2025-01-08 11:39 ` John Garry
2025-01-08 17:42 ` Darrick J. Wong
2025-01-09 7:54 ` Christoph Hellwig
2025-01-10 11:59 ` John Garry
2024-12-10 12:57 ` [PATCH v2 3/7] iomap: Lift blocksize restriction on atomic writes John Garry
2024-12-10 12:57 ` [PATCH v2 4/7] xfs: Add extent zeroing support for " John Garry
2024-12-10 12:57 ` [PATCH v2 5/7] xfs: Switch atomic write size check in xfs_file_write_iter() John Garry
2024-12-10 12:57 ` [PATCH v2 6/7] xfs: Add RT atomic write unit max to xfs_mount John Garry
2024-12-10 12:57 ` [PATCH v2 7/7] xfs: Update xfs_get_atomic_write_attr() for large atomic writes John Garry
2024-12-13 14:38 ` [PATCH v2 0/7] large atomic writes for xfs Christoph Hellwig
2024-12-13 17:15 ` John Garry
2024-12-13 17:22 ` Christoph Hellwig
2024-12-13 17:43 ` John Garry
2024-12-14 0:42 ` Darrick J. Wong
2024-12-16 8:40 ` John Garry
2024-12-17 7:11 ` Christoph Hellwig
2024-12-17 8:23 ` John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241212204007.GL6678@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=dchinner@redhat.com \
--cc=hch@lst.de \
--cc=john.g.garry@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ritesh.list@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox