From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: brauner@kernel.org, cem@kernel.org, dchinner@redhat.com,
hch@lst.de, ritesh.list@gmail.com, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
martin.petersen@oracle.com
Subject: Re: [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support
Date: Wed, 11 Dec 2024 15:47:48 -0800 [thread overview]
Message-ID: <20241211234748.GB6678@frogsfrogsfrogs> (raw)
In-Reply-To: <20241210125737.786928-3-john.g.garry@oracle.com>
On Tue, Dec 10, 2024 at 12:57:32PM +0000, John Garry wrote:
> For atomic writes support, it is required to only ever submit a single bio
> (for an atomic write).
>
> Furthermore, currently the atomic write unit min and max limit is fixed at
> the FS block size.
>
> For lifting the atomic write unit max limit, it may occur that an atomic
> write spans mixed unwritten and mapped extents. For this case, due to the
> iterative nature of iomap, multiple bios would be produced, which is
> intolerable.
>
> Add a function to zero unwritten extents in a certain range, which may be
> used to ensure that unwritten extents are zeroed prior to issuing of an
> atomic write.
I still dislike this. IMO block untorn writes _is_ a niche feature for
programs that perform IO in large blocks. Any program that wants a
general "apply all these updates or none of them" interface should use
XFS_IOC_EXCHANGE_RANGE since it has no awu_max restrictions, can handle
discontiguous update ranges, doesn't require block alignment, etc.
Instead here we are adding a bunch of complexity, and not even all that
well:
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> fs/iomap/direct-io.c | 76 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/iomap.h | 3 ++
> 2 files changed, 79 insertions(+)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 23fdad16e6a8..18c888f0c11f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -805,6 +805,82 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> }
> EXPORT_SYMBOL_GPL(iomap_dio_rw);
>
> +static loff_t
> +iomap_dio_zero_unwritten_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> +{
> + const struct iomap *iomap = &iter->iomap;
> + loff_t length = iomap_length(iter);
> + loff_t pos = iter->pos;
> +
> + if (iomap->type == IOMAP_UNWRITTEN) {
> + int ret;
> +
> + dio->flags |= IOMAP_DIO_UNWRITTEN;
> + ret = iomap_dio_zero(iter, dio, pos, length);
Shouldn't this be detecting the particular case that the mapping for the
kiocb is in mixed state and only zeroing in that case? This just
targets every unwritten extent, even if the unwritten extent covered the
entire range that is being written. It doesn't handle COW, it doesn't
handle holes, etc.
Also, can you make a version of blkdev_issue_zeroout that returns the
bio so the caller can issue them asynchronously instead of opencoding
the bio_alloc loop in iomap_dev_zero?
> + if (ret)
> + return ret;
> + }
> +
> + dio->size += length;
> +
> + return length;
> +}
> +
> +ssize_t
> +iomap_dio_zero_unwritten(struct kiocb *iocb, struct iov_iter *iter,
> + const struct iomap_ops *ops, const struct iomap_dio_ops *dops)
> +{
> + struct inode *inode = file_inode(iocb->ki_filp);
> + struct iomap_dio *dio;
> + ssize_t ret;
> + struct iomap_iter iomi = {
> + .inode = inode,
> + .pos = iocb->ki_pos,
> + .len = iov_iter_count(iter),
> + .flags = IOMAP_WRITE,
IOMAP_WRITE | IOMAP_DIRECT, no?
--D
> + };
> +
> + dio = kzalloc(sizeof(*dio), GFP_KERNEL);
> + if (!dio)
> + return -ENOMEM;
> +
> + dio->iocb = iocb;
> + atomic_set(&dio->ref, 1);
> + dio->i_size = i_size_read(inode);
> + dio->dops = dops;
> + dio->submit.waiter = current;
> + dio->wait_for_completion = true;
> +
> + inode_dio_begin(inode);
> +
> + while ((ret = iomap_iter(&iomi, ops)) > 0)
> + iomi.processed = iomap_dio_zero_unwritten_iter(&iomi, dio);
> +
> + if (ret < 0)
> + iomap_dio_set_error(dio, ret);
> +
> + if (!atomic_dec_and_test(&dio->ref)) {
> + for (;;) {
> + set_current_state(TASK_UNINTERRUPTIBLE);
> + if (!READ_ONCE(dio->submit.waiter))
> + break;
> +
> + blk_io_schedule();
> + }
> + __set_current_state(TASK_RUNNING);
> + }
> +
> + if (dops && dops->end_io)
> + ret = dops->end_io(iocb, dio->size, ret, dio->flags);
> +
> + kfree(dio);
> +
> + inode_dio_end(file_inode(iocb->ki_filp));
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iomap_dio_zero_unwritten);
> +
> static int __init iomap_dio_init(void)
> {
> zero_page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 5675af6b740c..c2d44b9e446d 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -440,6 +440,9 @@ ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
> unsigned int dio_flags, void *private, size_t done_before);
> +ssize_t iomap_dio_zero_unwritten(struct kiocb *iocb, struct iov_iter *iter,
> + const struct iomap_ops *ops, const struct iomap_dio_ops *dops);
> +
> ssize_t iomap_dio_complete(struct iomap_dio *dio);
> void iomap_dio_bio_end_io(struct bio *bio);
>
> --
> 2.31.1
>
>
next prev parent reply other threads:[~2024-12-11 23:47 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-10 12:57 [PATCH v2 0/7] large atomic writes for xfs John Garry
2024-12-10 12:57 ` [PATCH v2 1/7] iomap: Increase iomap_dio_zero() size limit John Garry
2024-12-10 12:57 ` [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support John Garry
2024-12-11 23:47 ` Darrick J. Wong [this message]
2024-12-12 10:40 ` John Garry
2024-12-12 20:40 ` Darrick J. Wong
2024-12-13 10:43 ` John Garry
2024-12-13 14:47 ` Christoph Hellwig
2024-12-14 0:56 ` Darrick J. Wong
2024-12-17 7:08 ` Christoph Hellwig
2024-12-18 11:15 ` John Garry
2025-01-08 1:26 ` Darrick J. Wong
2025-01-08 11:39 ` John Garry
2025-01-08 17:42 ` Darrick J. Wong
2025-01-09 7:54 ` Christoph Hellwig
2025-01-10 11:59 ` John Garry
2024-12-10 12:57 ` [PATCH v2 3/7] iomap: Lift blocksize restriction on atomic writes John Garry
2024-12-10 12:57 ` [PATCH v2 4/7] xfs: Add extent zeroing support for " John Garry
2024-12-10 12:57 ` [PATCH v2 5/7] xfs: Switch atomic write size check in xfs_file_write_iter() John Garry
2024-12-10 12:57 ` [PATCH v2 6/7] xfs: Add RT atomic write unit max to xfs_mount John Garry
2024-12-10 12:57 ` [PATCH v2 7/7] xfs: Update xfs_get_atomic_write_attr() for large atomic writes John Garry
2024-12-13 14:38 ` [PATCH v2 0/7] large atomic writes for xfs Christoph Hellwig
2024-12-13 17:15 ` John Garry
2024-12-13 17:22 ` Christoph Hellwig
2024-12-13 17:43 ` John Garry
2024-12-14 0:42 ` Darrick J. Wong
2024-12-16 8:40 ` John Garry
2024-12-17 7:11 ` Christoph Hellwig
2024-12-17 8:23 ` John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241211234748.GB6678@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=dchinner@redhat.com \
--cc=hch@lst.de \
--cc=john.g.garry@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ritesh.list@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox