From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: John Garry <john.g.garry@oracle.com>,
brauner@kernel.org, djwong@kernel.org, cem@kernel.org,
dchinner@redhat.com, hch@lst.de
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, ojaswin@linux.ibm.com,
martin.petersen@oracle.com, tytso@mit.edu,
linux-ext4@vger.kernel.org, John Garry <john.g.garry@oracle.com>
Subject: Re: [PATCH v6 01/13] iomap: inline iomap_dio_bio_opflags()
Date: Sun, 16 Mar 2025 19:10:06 +0530
Message-ID: <87cyeh5c21.fsf@gmail.com>
In-Reply-To: <20250313171310.1886394-2-john.g.garry@oracle.com>
John Garry <john.g.garry@oracle.com> writes:
> It is neater to build blk_opf_t fully in one place, so inline
> iomap_dio_bio_opflags() in iomap_dio_bio_iter().
>
> Also tidy up the logic in dealing with IOMAP_DIO_CALLER_COMP, and in general
> separate the logic in dealing with flags associated with reads and writes.
>
Indeed, it cleans things up and separates the logic required for
IOMAP_DIO_WRITE vs. reads.

The change looks good to me. Please feel free to add:

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Originally-from: Christoph Hellwig <hch@lst.de>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> Should I change author?
> fs/iomap/direct-io.c | 112 +++++++++++++++++++------------------------
> 1 file changed, 49 insertions(+), 63 deletions(-)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 5299f70428ef..8c1bec473586 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -312,27 +312,20 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
> }
>
> /*
> - * Figure out the bio's operation flags from the dio request, the
> - * mapping, and whether or not we want FUA. Note that we can end up
> - * clearing the WRITE_THROUGH flag in the dio request.
> + * Use a FUA write if we need datasync semantics and this is a pure data I/O
> + * that doesn't require any metadata updates (including after I/O completion
> + * such as unwritten extent conversion) and the underlying device either
> + * doesn't have a volatile write cache or supports FUA.
> + * This allows us to avoid cache flushes on I/O completion.
> */
> -static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> - const struct iomap *iomap, bool use_fua, bool atomic_hw)
> +static inline bool iomap_dio_can_use_fua(const struct iomap *iomap,
> + struct iomap_dio *dio)
> {
> - blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
> -
> - if (!(dio->flags & IOMAP_DIO_WRITE))
> - return REQ_OP_READ;
> -
> - opflags |= REQ_OP_WRITE;
> - if (use_fua)
> - opflags |= REQ_FUA;
> - else
> - dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
> - if (atomic_hw)
> - opflags |= REQ_ATOMIC;
> -
> - return opflags;
> + if (iomap->flags & (IOMAP_F_SHARED | IOMAP_F_DIRTY))
> + return false;
> + if (!(dio->flags & IOMAP_DIO_WRITE_THROUGH))
> + return false;
> + return !bdev_write_cache(iomap->bdev) || bdev_fua(iomap->bdev);
> }
>
> static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> @@ -340,52 +333,59 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> const struct iomap *iomap = &iter->iomap;
> struct inode *inode = iter->inode;
> unsigned int fs_block_size = i_blocksize(inode), pad;
> - bool atomic_hw = iter->flags & IOMAP_ATOMIC_HW;
> const loff_t length = iomap_length(iter);
> loff_t pos = iter->pos;
> - blk_opf_t bio_opf;
> + blk_opf_t bio_opf = REQ_SYNC | REQ_IDLE;
> struct bio *bio;
> bool need_zeroout = false;
> - bool use_fua = false;
> int nr_pages, ret = 0;
> u64 copied = 0;
> size_t orig_count;
>
> - if (atomic_hw && length != iter->len)
> - return -EINVAL;
> -
> if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
> !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
> return -EINVAL;
>
> - if (iomap->type == IOMAP_UNWRITTEN) {
> - dio->flags |= IOMAP_DIO_UNWRITTEN;
> - need_zeroout = true;
> - }
> + if (dio->flags & IOMAP_DIO_WRITE) {
> + bio_opf |= REQ_OP_WRITE;
> +
> + if (iter->flags & IOMAP_ATOMIC_HW) {
> + if (length != iter->len)
> + return -EINVAL;
> + bio_opf |= REQ_ATOMIC;
> + }
> +
> + if (iomap->type == IOMAP_UNWRITTEN) {
> + dio->flags |= IOMAP_DIO_UNWRITTEN;
> + need_zeroout = true;
> + }
>
> - if (iomap->flags & IOMAP_F_SHARED)
> - dio->flags |= IOMAP_DIO_COW;
> + if (iomap->flags & IOMAP_F_SHARED)
> + dio->flags |= IOMAP_DIO_COW;
> +
> + if (iomap->flags & IOMAP_F_NEW) {
> + need_zeroout = true;
> + } else if (iomap->type == IOMAP_MAPPED) {
> + if (iomap_dio_can_use_fua(iomap, dio))
> + bio_opf |= REQ_FUA;
> + else
> + dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
> + }
>
> - if (iomap->flags & IOMAP_F_NEW) {
> - need_zeroout = true;
> - } else if (iomap->type == IOMAP_MAPPED) {
> /*
> - * Use a FUA write if we need datasync semantics, this is a pure
> - * data IO that doesn't require any metadata updates (including
> - * after IO completion such as unwritten extent conversion) and
> - * the underlying device either supports FUA or doesn't have
> - * a volatile write cache. This allows us to avoid cache flushes
> - * on IO completion. If we can't use writethrough and need to
> - * sync, disable in-task completions as dio completion will
> - * need to call generic_write_sync() which will do a blocking
> - * fsync / cache flush call.
> + * We can only do deferred completion for pure overwrites that
> + * don't require additional I/O at completion time.
> + *
> + * This rules out writes that need zeroing or extent conversion,
> + * extend the file size, or issue metadata I/O or cache flushes
> + * during completion processing.
> */
> - if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
> - (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
> - (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
> - use_fua = true;
> - else if (dio->flags & IOMAP_DIO_NEED_SYNC)
> + if (need_zeroout || (pos >= i_size_read(inode)) ||
> + ((dio->flags & IOMAP_DIO_NEED_SYNC) &&
> + !(bio_opf & REQ_FUA)))
> dio->flags &= ~IOMAP_DIO_CALLER_COMP;
> + } else {
> + bio_opf |= REQ_OP_READ;
> }
>
> /*
> @@ -399,18 +399,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> if (!iov_iter_count(dio->submit.iter))
> goto out;
>
> - /*
> - * We can only do deferred completion for pure overwrites that
> - * don't require additional IO at completion. This rules out
> - * writes that need zeroing or extent conversion, extend
> - * the file size, or issue journal IO or cache flushes
> - * during completion processing.
> - */
> - if (need_zeroout ||
> - ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
> - ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
> - dio->flags &= ~IOMAP_DIO_CALLER_COMP;
> -
> /*
> * The rules for polled IO completions follow the guidelines as the
> * ones we set for inline and deferred completions. If none of those
> @@ -428,8 +416,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> goto out;
> }
>
> - bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
> -
> nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
> do {
> size_t n;
> @@ -461,7 +447,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> }
>
> n = bio->bi_iter.bi_size;
> - if (WARN_ON_ONCE(atomic_hw && n != length)) {
> + if (WARN_ON_ONCE((bio_opf & REQ_ATOMIC) && n != length)) {
> /*
> * This bio should have covered the complete length,
> * which it doesn't, so error. We may need to zero out
> --
> 2.31.1
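
For readers following along, here is a rough user-space sketch of the
write-path decision flow as it reads after this patch: the FUA check is
made first, and the deferred-completion (CALLER_COMP) decision then keys
off whether FUA ended up in the op flags. Everything named model_*,
DIO_*, MAP_* and OP_* is a hypothetical stand-in with arbitrary values,
not the kernel definitions. The atomic-write check, REQ_SYNC/REQ_IDLE,
and the IOMAP_DIO_UNWRITTEN/IOMAP_DIO_COW bookkeeping are left out to
keep it short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the dio, iomap and bio flags in the patch. */
#define DIO_WRITE         (1u << 0)
#define DIO_WRITE_THROUGH (1u << 1)
#define DIO_NEED_SYNC     (1u << 2)
#define DIO_CALLER_COMP   (1u << 3)

#define MAP_F_NEW    (1u << 0)
#define MAP_F_DIRTY  (1u << 1)
#define MAP_F_SHARED (1u << 2)

#define OP_READ  (1u << 0)
#define OP_WRITE (1u << 1)
#define OP_FUA   (1u << 2)

enum model_type { MODEL_MAPPED, MODEL_UNWRITTEN };

struct model_map {
	enum model_type type;
	unsigned int flags;
	bool dev_write_cache;	/* device has a volatile write cache */
	bool dev_fua;		/* device supports FUA writes */
};

/* Mirrors the shape of iomap_dio_can_use_fua() in the quoted diff. */
static bool model_can_use_fua(const struct model_map *map,
			      unsigned int dio_flags)
{
	if (map->flags & (MAP_F_SHARED | MAP_F_DIRTY))
		return false;
	if (!(dio_flags & DIO_WRITE_THROUGH))
		return false;
	return !map->dev_write_cache || map->dev_fua;
}

/*
 * Build the op flags in one place and adjust *dio_flags as a side effect.
 * extends_eof stands in for the "pos >= i_size_read(inode)" test.
 */
static unsigned int model_build_opf(const struct model_map *map,
				    unsigned int *dio_flags, bool extends_eof)
{
	bool need_zeroout = false;
	unsigned int opf = 0;

	if (!(*dio_flags & DIO_WRITE))
		return opf | OP_READ;

	opf |= OP_WRITE;

	if (map->type == MODEL_UNWRITTEN)
		need_zeroout = true;

	if (map->flags & MAP_F_NEW) {
		need_zeroout = true;
	} else if (map->type == MODEL_MAPPED) {
		if (model_can_use_fua(map, *dio_flags))
			opf |= OP_FUA;
		else
			*dio_flags &= ~DIO_WRITE_THROUGH;
	}

	/* Deferred completion is only kept for pure overwrites. */
	if (need_zeroout || extends_eof ||
	    ((*dio_flags & DIO_NEED_SYNC) && !(opf & OP_FUA)))
		*dio_flags &= ~DIO_CALLER_COMP;

	return opf;
}
```

In this model, a sync overwrite of a mapped extent on a FUA-capable
device keeps DIO_CALLER_COMP and gets OP_FUA, while the same write on a
device with a volatile cache and no FUA loses both DIO_WRITE_THROUGH and
DIO_CALLER_COMP. Reads return before any of the write-only flag handling
runs.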