linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/3] further iomap large atomic writes changes
@ 2025-03-20 12:02 John Garry
  2025-03-20 12:02 ` [PATCH 1/3] iomap: inline iomap_dio_bio_opflags() John Garry
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: John Garry @ 2025-03-20 12:02 UTC (permalink / raw)
  To: brauner, djwong, hch
  Cc: linux-fsdevel, dchinner, linux-xfs, linux-kernel, ojaswin,
	ritesh.list, martin.petersen, tytso, linux-ext4, John Garry

These iomap changes are spun off from the XFS large atomic writes series at
https://lore.kernel.org/linux-xfs/86a64256-497a-453b-bbba-a5ac6b4cb056@oracle.com/T/#ma99c763221de9d49ea2ccfca9ff9b8d71c8b2677

The XFS parts there are not ready yet, but it is worth having the iomap
changes queued in advance.

Some much earlier changes from that same series were already queued in the
vfs tree, and these patches rework those changes - specifically the
first patch in this series does.

The other most significant change is the patch that reworks how the bio
flags are set in the DIO path.

The baseline is c7be0d72d551 (vfs/vfs-6.15.iomap) Merge patch series
"iomap preliminaries for large atomic write for xfs with CoW"

John Garry (3):
  iomap: inline iomap_dio_bio_opflags()
  iomap: comment on atomic write checks in iomap_dio_bio_iter()
  iomap: rework IOMAP atomic flags

 .../filesystems/iomap/operations.rst          |  35 ++---
 fs/ext4/inode.c                               |   6 +-
 fs/iomap/direct-io.c                          | 125 ++++++++----------
 fs/iomap/trace.h                              |   2 +-
 fs/xfs/xfs_iomap.c                            |   4 +
 include/linux/iomap.h                         |  12 +-
 6 files changed, 91 insertions(+), 93 deletions(-)

-- 
2.31.1



* [PATCH 1/3] iomap: inline iomap_dio_bio_opflags()
  2025-03-20 12:02 [PATCH 0/3] further iomap large atomic writes changes John Garry
@ 2025-03-20 12:02 ` John Garry
  2025-03-20 12:02 ` [PATCH 2/3] iomap: comment on atomic write checks in iomap_dio_bio_iter() John Garry
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: John Garry @ 2025-03-20 12:02 UTC (permalink / raw)
  To: brauner, djwong, hch
  Cc: linux-fsdevel, dchinner, linux-xfs, linux-kernel, ojaswin,
	ritesh.list, martin.petersen, tytso, linux-ext4, John Garry

It is neater to build blk_opf_t fully in one place, so inline
iomap_dio_bio_opflags() in iomap_dio_bio_iter().

Also tidy up the logic for dealing with IOMAP_DIO_CALLER_COMP, and in
general separate the logic for dealing with flags associated with reads and
writes.

Originally-from: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
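Note for readers (not part of the commit): the decision flow that this
patch gathers into iomap_dio_bio_iter() can be modelled in user space
roughly as below. This is only a sketch under stated assumptions. The
mock_* names, the struct layouts and the MOCK_REQ_* bit values are all
hypothetical stand-ins and are not kernel definitions. The atomic write
check and the IOMAP_DIO_UNWRITTEN / IOMAP_DIO_COW bookkeeping are left out
to keep it short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's REQ_* bits; values are made up. */
#define MOCK_REQ_OP_READ	0u
#define MOCK_REQ_OP_WRITE	(1u << 0)
#define MOCK_REQ_SYNC		(1u << 1)
#define MOCK_REQ_IDLE		(1u << 2)
#define MOCK_REQ_FUA		(1u << 3)

/* Models the dio->flags bits that the patch reads or clears. */
struct mock_dio {
	bool is_write;		/* IOMAP_DIO_WRITE */
	bool write_through;	/* IOMAP_DIO_WRITE_THROUGH */
	bool need_sync;		/* IOMAP_DIO_NEED_SYNC */
	bool caller_comp;	/* IOMAP_DIO_CALLER_COMP */
};

/* Models the mapping and block device properties the patch consults. */
struct mock_mapping {
	bool mapped;		/* iomap->type == IOMAP_MAPPED */
	bool unwritten;		/* iomap->type == IOMAP_UNWRITTEN */
	bool is_new;		/* IOMAP_F_NEW */
	bool shared_or_dirty;	/* IOMAP_F_SHARED or IOMAP_F_DIRTY */
	bool dev_write_cache;	/* bdev_write_cache() */
	bool dev_fua;		/* bdev_fua() */
};

/* Same three tests as iomap_dio_can_use_fua() in the patch. */
bool mock_can_use_fua(const struct mock_mapping *map,
		      const struct mock_dio *dio)
{
	if (map->shared_or_dirty)
		return false;
	if (!dio->write_through)
		return false;
	return !map->dev_write_cache || map->dev_fua;
}

/* Build the op flags in one place; may clear write_through and caller_comp. */
uint32_t mock_build_opf(const struct mock_mapping *map, struct mock_dio *dio,
			bool extends_eof)
{
	uint32_t opf = MOCK_REQ_SYNC | MOCK_REQ_IDLE;
	bool need_zeroout = false;

	if (!dio->is_write)
		return opf | MOCK_REQ_OP_READ;

	opf |= MOCK_REQ_OP_WRITE;

	if (map->unwritten)
		need_zeroout = true;

	if (map->is_new) {
		need_zeroout = true;
	} else if (map->mapped) {
		if (mock_can_use_fua(map, dio))
			opf |= MOCK_REQ_FUA;
		else
			dio->write_through = false;
	}

	/* Deferred completion is only kept for pure overwrites. */
	if (need_zeroout || extends_eof ||
	    (dio->need_sync && !(opf & MOCK_REQ_FUA)))
		dio->caller_comp = false;

	return opf;
}
```

With a mapped extent on a device that has a volatile write cache and FUA
support, a write-through dsync write keeps caller completion and gains the
FUA bit. Without FUA support the same write drops both write-through and
caller completion, since completion then needs a blocking cache flush.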
 fs/iomap/direct-io.c | 112 +++++++++++++++++++------------------------
 1 file changed, 49 insertions(+), 63 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5299f70428ef..8c1bec473586 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -312,27 +312,20 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 }
 
 /*
- * Figure out the bio's operation flags from the dio request, the
- * mapping, and whether or not we want FUA.  Note that we can end up
- * clearing the WRITE_THROUGH flag in the dio request.
+ * Use a FUA write if we need datasync semantics and this is a pure data I/O
+ * that doesn't require any metadata updates (including after I/O completion
+ * such as unwritten extent conversion) and the underlying device either
+ * doesn't have a volatile write cache or supports FUA.
+ * This allows us to avoid cache flushes on I/O completion.
  */
-static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
-		const struct iomap *iomap, bool use_fua, bool atomic_hw)
+static inline bool iomap_dio_can_use_fua(const struct iomap *iomap,
+		struct iomap_dio *dio)
 {
-	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
-
-	if (!(dio->flags & IOMAP_DIO_WRITE))
-		return REQ_OP_READ;
-
-	opflags |= REQ_OP_WRITE;
-	if (use_fua)
-		opflags |= REQ_FUA;
-	else
-		dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
-	if (atomic_hw)
-		opflags |= REQ_ATOMIC;
-
-	return opflags;
+	if (iomap->flags & (IOMAP_F_SHARED | IOMAP_F_DIRTY))
+		return false;
+	if (!(dio->flags & IOMAP_DIO_WRITE_THROUGH))
+		return false;
+	return !bdev_write_cache(iomap->bdev) || bdev_fua(iomap->bdev);
 }
 
 static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
@@ -340,52 +333,59 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	const struct iomap *iomap = &iter->iomap;
 	struct inode *inode = iter->inode;
 	unsigned int fs_block_size = i_blocksize(inode), pad;
-	bool atomic_hw = iter->flags & IOMAP_ATOMIC_HW;
 	const loff_t length = iomap_length(iter);
 	loff_t pos = iter->pos;
-	blk_opf_t bio_opf;
+	blk_opf_t bio_opf = REQ_SYNC | REQ_IDLE;
 	struct bio *bio;
 	bool need_zeroout = false;
-	bool use_fua = false;
 	int nr_pages, ret = 0;
 	u64 copied = 0;
 	size_t orig_count;
 
-	if (atomic_hw && length != iter->len)
-		return -EINVAL;
-
 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
 	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
 		return -EINVAL;
 
-	if (iomap->type == IOMAP_UNWRITTEN) {
-		dio->flags |= IOMAP_DIO_UNWRITTEN;
-		need_zeroout = true;
-	}
+	if (dio->flags & IOMAP_DIO_WRITE) {
+		bio_opf |= REQ_OP_WRITE;
+
+		if (iter->flags & IOMAP_ATOMIC_HW) {
+			if (length != iter->len)
+				return -EINVAL;
+			bio_opf |= REQ_ATOMIC;
+		}
+
+		if (iomap->type == IOMAP_UNWRITTEN) {
+			dio->flags |= IOMAP_DIO_UNWRITTEN;
+			need_zeroout = true;
+		}
 
-	if (iomap->flags & IOMAP_F_SHARED)
-		dio->flags |= IOMAP_DIO_COW;
+		if (iomap->flags & IOMAP_F_SHARED)
+			dio->flags |= IOMAP_DIO_COW;
+
+		if (iomap->flags & IOMAP_F_NEW) {
+			need_zeroout = true;
+		} else if (iomap->type == IOMAP_MAPPED) {
+			if (iomap_dio_can_use_fua(iomap, dio))
+				bio_opf |= REQ_FUA;
+			else
+				dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+		}
 
-	if (iomap->flags & IOMAP_F_NEW) {
-		need_zeroout = true;
-	} else if (iomap->type == IOMAP_MAPPED) {
 		/*
-		 * Use a FUA write if we need datasync semantics, this is a pure
-		 * data IO that doesn't require any metadata updates (including
-		 * after IO completion such as unwritten extent conversion) and
-		 * the underlying device either supports FUA or doesn't have
-		 * a volatile write cache. This allows us to avoid cache flushes
-		 * on IO completion. If we can't use writethrough and need to
-		 * sync, disable in-task completions as dio completion will
-		 * need to call generic_write_sync() which will do a blocking
-		 * fsync / cache flush call.
+		 * We can only do deferred completion for pure overwrites that
+		 * don't require additional I/O at completion time.
+		 *
+		 * This rules out writes that need zeroing or extent conversion,
+		 * extend the file size, or issue metadata I/O or cache flushes
+		 * during completion processing.
 		 */
-		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
-		    (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
-		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
-			use_fua = true;
-		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+		if (need_zeroout || (pos >= i_size_read(inode)) ||
+		    ((dio->flags & IOMAP_DIO_NEED_SYNC) &&
+		     !(bio_opf & REQ_FUA)))
 			dio->flags &= ~IOMAP_DIO_CALLER_COMP;
+	} else {
+		bio_opf |= REQ_OP_READ;
 	}
 
 	/*
@@ -399,18 +399,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	if (!iov_iter_count(dio->submit.iter))
 		goto out;
 
-	/*
-	 * We can only do deferred completion for pure overwrites that
-	 * don't require additional IO at completion. This rules out
-	 * writes that need zeroing or extent conversion, extend
-	 * the file size, or issue journal IO or cache flushes
-	 * during completion processing.
-	 */
-	if (need_zeroout ||
-	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
-	    ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
-		dio->flags &= ~IOMAP_DIO_CALLER_COMP;
-
 	/*
 	 * The rules for polled IO completions follow the guidelines as the
 	 * ones we set for inline and deferred completions. If none of those
@@ -428,8 +416,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 			goto out;
 	}
 
-	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
-
 	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
 	do {
 		size_t n;
@@ -461,7 +447,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		}
 
 		n = bio->bi_iter.bi_size;
-		if (WARN_ON_ONCE(atomic_hw && n != length)) {
+		if (WARN_ON_ONCE((bio_opf & REQ_ATOMIC) && n != length)) {
 			/*
 			 * This bio should have covered the complete length,
 			 * which it doesn't, so error. We may need to zero out
-- 
2.31.1



* [PATCH 2/3] iomap: comment on atomic write checks in iomap_dio_bio_iter()
  2025-03-20 12:02 [PATCH 0/3] further iomap large atomic writes changes John Garry
  2025-03-20 12:02 ` [PATCH 1/3] iomap: inline iomap_dio_bio_opflags() John Garry
@ 2025-03-20 12:02 ` John Garry
  2025-03-20 14:09   ` Christoph Hellwig
  2025-03-20 19:32   ` Ritesh Harjani
  2025-03-20 12:02 ` [PATCH 3/3] iomap: rework IOMAP atomic flags John Garry
  2025-03-20 14:16 ` [PATCH 0/3] further iomap large atomic writes changes Christian Brauner
  3 siblings, 2 replies; 15+ messages in thread
From: John Garry @ 2025-03-20 12:02 UTC (permalink / raw)
  To: brauner, djwong, hch
  Cc: linux-fsdevel, dchinner, linux-xfs, linux-kernel, ojaswin,
	ritesh.list, martin.petersen, tytso, linux-ext4, John Garry

Add a comment explaining why the mapping length is checked for atomic
writes in iomap_dio_bio_iter().

Also clarify the comment for the bio size check.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
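Note for readers (not part of the commit): the two checks that these
comments describe can be sketched in user space as below. The mock_* names
are hypothetical. The -EINVAL return for the first check is taken from the
code above; using the same value for the second check is an assumption
made for illustration, as the hunk shown here only says that the case is
treated as an error.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Before building the bio: the mapping must cover the whole write,
 * otherwise the write cannot be submitted as a single bio.
 */
int mock_atomic_mapping_check(uint64_t mapping_len, uint64_t write_len)
{
	return mapping_len == write_len ? 0 : -EINVAL;
}

/*
 * After building the bio: an atomic write bio must carry the complete
 * length. The error code here is assumed, see the note above.
 */
int mock_atomic_bio_check(uint64_t bio_size, uint64_t mapping_len)
{
	return bio_size == mapping_len ? 0 : -EINVAL;
}
```

For example, a 16 KiB atomic write that lands on a mapping covering only
the first 8 KiB fails the first check, and a 16 KiB mapping for which only
12 KiB could be added to the bio fails the second.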
 fs/iomap/direct-io.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 8c1bec473586..b9f59ca43c15 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -350,6 +350,11 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		bio_opf |= REQ_OP_WRITE;
 
 		if (iter->flags & IOMAP_ATOMIC_HW) {
+			/*
+			 * Ensure that the mapping covers the full write
+			 * length, otherwise it won't be submitted as a single
+			 * bio, which is required to use hardware atomics.
+			 */
 			if (length != iter->len)
 				return -EINVAL;
 			bio_opf |= REQ_ATOMIC;
@@ -449,7 +454,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		n = bio->bi_iter.bi_size;
 		if (WARN_ON_ONCE((bio_opf & REQ_ATOMIC) && n != length)) {
 			/*
-			 * This bio should have covered the complete length,
+			 * An atomic write bio must cover the complete length,
 			 * which it doesn't, so error. We may need to zero out
 			 * the tail (complete FS block), similar to when
 			 * bio_iov_iter_get_pages() returns an error, above.
-- 
2.31.1



* [PATCH 3/3] iomap: rework IOMAP atomic flags
  2025-03-20 12:02 [PATCH 0/3] further iomap large atomic writes changes John Garry
  2025-03-20 12:02 ` [PATCH 1/3] iomap: inline iomap_dio_bio_opflags() John Garry
  2025-03-20 12:02 ` [PATCH 2/3] iomap: comment on atomic write checks in iomap_dio_bio_iter() John Garry
@ 2025-03-20 12:02 ` John Garry
  2025-03-20 14:10   ` Christoph Hellwig
                     ` (2 more replies)
  2025-03-20 14:16 ` [PATCH 0/3] further iomap large atomic writes changes Christian Brauner
  3 siblings, 3 replies; 15+ messages in thread
From: John Garry @ 2025-03-20 12:02 UTC (permalink / raw)
  To: brauner, djwong, hch
  Cc: linux-fsdevel, dchinner, linux-xfs, linux-kernel, ojaswin,
	ritesh.list, martin.petersen, tytso, linux-ext4, John Garry

Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag
is that the FS ->iomap_begin callback could check if this flag is set to
decide whether to do a SW (FS-based) atomic write. But the FS can set
which ->iomap_begin callback it wants when deciding to do a FS-based
atomic write.

Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as
the block driver can use SW-methods to emulate an atomic write. So change
back to IOMAP_ATOMIC.

The ->iomap_begin callback still needs to indicate to the iomap core that
REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that.

These changes were suggested by Christoph Hellwig and Dave Chinner.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
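Note for readers (not part of the commit): the handoff between the
request flag, the mapping flag and the bio flag can be sketched in user
space as below. This is a simplified model under stated assumptions. All
mock_* names are hypothetical, MOCK_REQ_ATOMIC has a made-up value, and
the software-based ->iomap_begin variant stands in for a filesystem
fallback that is not part of this series.

```c
#include <stdint.h>

/* Bit positions follow the patch; the MOCK_ names are illustrative only. */
#define MOCK_IOMAP_ATOMIC	(1u << 9) /* request: torn-write protection */
#define MOCK_IOMAP_F_ATOMIC_BIO	(1u << 8) /* mapping: issue an atomic bio */
#define MOCK_REQ_ATOMIC		(1u << 0) /* made-up value for the bio flag */

/* Stand-in for ->iomap_begin: takes request flags, returns mapping flags. */
typedef uint32_t (*mock_iomap_begin_fn)(uint32_t req_flags);

/* HW-offload path, as ext4 and XFS do in this patch. */
uint32_t mock_hw_iomap_begin(uint32_t req_flags)
{
	return (req_flags & MOCK_IOMAP_ATOMIC) ? MOCK_IOMAP_F_ATOMIC_BIO : 0;
}

/* Hypothetical FS-based path: atomicity is handled by the filesystem. */
uint32_t mock_sw_iomap_begin(uint32_t req_flags)
{
	(void)req_flags;
	return 0;
}

/* Core side: only the mapping flag decides whether REQ_ATOMIC is set. */
uint32_t mock_core_bio_opf(mock_iomap_begin_fn begin, uint32_t req_flags)
{
	uint32_t mapping_flags = begin(req_flags);
	uint32_t opf = 0;

	if (mapping_flags & MOCK_IOMAP_F_ATOMIC_BIO)
		opf |= MOCK_REQ_ATOMIC;
	return opf;
}
```

The point of the model is that the core no longer looks at the request
flag when building the bio. The filesystem picks the callback, and the
callback states through the mapping flag whether an atomic bio is wanted.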
 .../filesystems/iomap/operations.rst          | 35 ++++++++++---------
 fs/ext4/inode.c                               |  6 +++-
 fs/iomap/direct-io.c                          |  8 ++---
 fs/iomap/trace.h                              |  2 +-
 fs/xfs/xfs_iomap.c                            |  4 +++
 include/linux/iomap.h                         | 12 +++----
 6 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index b08a79d11d9f..3b628e370d88 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -514,29 +514,32 @@ IOMAP_WRITE`` with any combination of the following enhancements:
    if the mapping is unwritten and the filesystem cannot handle zeroing
    the unaligned regions without exposing stale contents.
 
- * ``IOMAP_ATOMIC_HW``: This write is being issued with torn-write
-   protection based on HW-offload support.
-   Only a single bio can be created for the write, and the write must
-   not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be
-   set.
+ * ``IOMAP_ATOMIC``: This write is being issued with torn-write
+   protection.
+   Torn-write protection may be provided based on HW-offload or by a
+   software mechanism provided by the filesystem.
+
+   For HW-offload based support, only a single bio can be created for the
+   write, and the write must not be split into multiple I/O requests, i.e.
+   flag REQ_ATOMIC must be set.
    The file range to write must be aligned to satisfy the requirements
    of both the filesystem and the underlying block device's atomic
    commit capabilities.
    If filesystem metadata updates are required (e.g. unwritten extent
-   conversion or copy on write), all updates for the entire file range
+   conversion or copy-on-write), all updates for the entire file range
    must be committed atomically as well.
-   Only one space mapping is allowed per untorn write.
-   Untorn writes may be longer than a single file block. In all cases,
+   Untorn-writes may be longer than a single file block. In all cases,
    the mapping start disk block must have at least the same alignment as
    the write offset.
-
- * ``IOMAP_ATOMIC_SW``: This write is being issued with torn-write
-   protection via a software mechanism provided by the filesystem.
-   All the disk block alignment and single bio restrictions which apply
-   to IOMAP_ATOMIC_HW do not apply here.
-   SW-based untorn writes would typically be used as a fallback when
-   HW-based untorn writes may not be issued, e.g. the range of the write
-   covers multiple extents, meaning that it is not possible to issue
+   The filesystems must set IOMAP_F_ATOMIC_BIO to inform iomap core of an
+   untorn-write based on HW-offload.
+
+   For untorn-writes based on a software mechanism provided by the
+   filesystem, all the disk block alignment and single bio restrictions
+   which apply for HW-offload based untorn-writes do not apply.
+   The mechanism would typically be used as a fallback for when
+   HW-offload based untorn-writes may not be issued, e.g. the range of the
+   write covers multiple extents, meaning that it is not possible to issue
    a single bio.
    All filesystem metadata updates for the entire file range must be
    committed atomically as well.
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ba2f1e3db7c7..d04d8a7f12e7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3290,6 +3290,10 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
 	if (map->m_flags & EXT4_MAP_NEW)
 		iomap->flags |= IOMAP_F_NEW;
 
+	/* HW-offload atomics are always used */
+	if (flags & IOMAP_ATOMIC)
+		iomap->flags |= IOMAP_F_ATOMIC_BIO;
+
 	if (flags & IOMAP_DAX)
 		iomap->dax_dev = EXT4_SB(inode->i_sb)->s_daxdev;
 	else
@@ -3467,7 +3471,7 @@ static inline bool ext4_want_directio_fallback(unsigned flags, ssize_t written)
 		return false;
 
 	/* atomic writes are all-or-nothing */
-	if (flags & IOMAP_ATOMIC_HW)
+	if (flags & IOMAP_ATOMIC)
 		return false;
 
 	/* can only try again if we wrote nothing */
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b9f59ca43c15..6ac7a1534f7c 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -349,7 +349,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	if (dio->flags & IOMAP_DIO_WRITE) {
 		bio_opf |= REQ_OP_WRITE;
 
-		if (iter->flags & IOMAP_ATOMIC_HW) {
+		if (iomap->flags & IOMAP_F_ATOMIC_BIO) {
 			/*
 			 * Ensure that the mapping covers the full write
 			 * length, otherwise it won't be submitted as a single
@@ -677,10 +677,8 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			iomi.flags |= IOMAP_OVERWRITE_ONLY;
 		}
 
-		if (dio_flags & IOMAP_DIO_ATOMIC_SW)
-			iomi.flags |= IOMAP_ATOMIC_SW;
-		else if (iocb->ki_flags & IOCB_ATOMIC)
-			iomi.flags |= IOMAP_ATOMIC_HW;
+		if (iocb->ki_flags & IOCB_ATOMIC)
+			iomi.flags |= IOMAP_ATOMIC;
 
 		/* for data sync or sync, we need sync completion processing */
 		if (iocb_is_dsync(iocb)) {
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 69af89044ebd..9eab2c8ac3c5 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -99,7 +99,7 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
 	{ IOMAP_FAULT,		"FAULT" }, \
 	{ IOMAP_DIRECT,		"DIRECT" }, \
 	{ IOMAP_NOWAIT,		"NOWAIT" }, \
-	{ IOMAP_ATOMIC_HW,	"ATOMIC_HW" }
+	{ IOMAP_ATOMIC,		"ATOMIC" }
 
 #define IOMAP_F_FLAGS_STRINGS \
 	{ IOMAP_F_NEW,		"NEW" }, \
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 5dd0922fe2d1..ee40dc509413 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -828,6 +828,10 @@ xfs_direct_write_iomap_begin(
 	if (offset + length > i_size_read(inode))
 		iomap_flags |= IOMAP_F_DIRTY;
 
+	/* HW-offload atomics are always used in this path */
+	if (flags & IOMAP_ATOMIC)
+		iomap_flags |= IOMAP_F_ATOMIC_BIO;
+
 	/*
 	 * COW writes may allocate delalloc space or convert unwritten COW
 	 * extents, so we need to make sure to take the lock exclusively here.
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 9cd93530013c..02fe001feebb 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -60,6 +60,9 @@ struct vm_fault;
  * IOMAP_F_ANON_WRITE indicates that (write) I/O does not have a target block
  * assigned to it yet and the file system will do that in the bio submission
  * handler, splitting the I/O as needed.
+ *
+ * IOMAP_F_ATOMIC_BIO indicates that (write) I/O will be issued as an atomic
+ * bio, i.e. set REQ_ATOMIC.
  */
 #define IOMAP_F_NEW		(1U << 0)
 #define IOMAP_F_DIRTY		(1U << 1)
@@ -73,6 +76,7 @@ struct vm_fault;
 #define IOMAP_F_XATTR		(1U << 5)
 #define IOMAP_F_BOUNDARY	(1U << 6)
 #define IOMAP_F_ANON_WRITE	(1U << 7)
+#define IOMAP_F_ATOMIC_BIO	(1U << 8)
 
 /*
  * Flags set by the core iomap code during operations:
@@ -189,9 +193,8 @@ struct iomap_folio_ops {
 #else
 #define IOMAP_DAX		0
 #endif /* CONFIG_FS_DAX */
-#define IOMAP_ATOMIC_HW		(1 << 9) /* HW-based torn-write protection */
+#define IOMAP_ATOMIC		(1 << 9) /* torn-write protection */
 #define IOMAP_DONTCACHE		(1 << 10)
-#define IOMAP_ATOMIC_SW		(1 << 11)/* SW-based torn-write protection */
 
 struct iomap_ops {
 	/*
@@ -503,11 +506,6 @@ struct iomap_dio_ops {
  */
 #define IOMAP_DIO_PARTIAL		(1 << 2)
 
-/*
- * Use software-based torn-write protection.
- */
-#define IOMAP_DIO_ATOMIC_SW		(1 << 3)
-
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags, void *private, size_t done_before);
-- 
2.31.1



* Re: [PATCH 2/3] iomap: comment on atomic write checks in iomap_dio_bio_iter()
  2025-03-20 12:02 ` [PATCH 2/3] iomap: comment on atomic write checks in iomap_dio_bio_iter() John Garry
@ 2025-03-20 14:09   ` Christoph Hellwig
  2025-03-20 19:32   ` Ritesh Harjani
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-03-20 14:09 UTC (permalink / raw)
  To: John Garry
  Cc: brauner, djwong, hch, linux-fsdevel, dchinner, linux-xfs,
	linux-kernel, ojaswin, ritesh.list, martin.petersen, tytso,
	linux-ext4

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 3/3] iomap: rework IOMAP atomic flags
  2025-03-20 12:02 ` [PATCH 3/3] iomap: rework IOMAP atomic flags John Garry
@ 2025-03-20 14:10   ` Christoph Hellwig
  2025-03-20 19:29   ` Ritesh Harjani
  2025-03-22 19:47   ` Ritesh Harjani
  2 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-03-20 14:10 UTC (permalink / raw)
  To: John Garry
  Cc: brauner, djwong, hch, linux-fsdevel, dchinner, linux-xfs,
	linux-kernel, ojaswin, ritesh.list, martin.petersen, tytso,
	linux-ext4

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 0/3] further iomap large atomic writes changes
  2025-03-20 12:02 [PATCH 0/3] further iomap large atomic writes changes John Garry
                   ` (2 preceding siblings ...)
  2025-03-20 12:02 ` [PATCH 3/3] iomap: rework IOMAP atomic flags John Garry
@ 2025-03-20 14:16 ` Christian Brauner
  3 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2025-03-20 14:16 UTC (permalink / raw)
  To: djwong, hch, John Garry
  Cc: Christian Brauner, linux-fsdevel, dchinner, linux-xfs,
	linux-kernel, ojaswin, ritesh.list, martin.petersen, tytso,
	linux-ext4

On Thu, 20 Mar 2025 12:02:47 +0000, John Garry wrote:
> These iomap changes are spun-off the XFS large atomic writes series at
> https://lore.kernel.org/linux-xfs/86a64256-497a-453b-bbba-a5ac6b4cb056@oracle.com/T/#ma99c763221de9d49ea2ccfca9ff9b8d71c8b2677
> 
> The XFS parts there are not ready yet, but it is worth having the iomap
> changes queued in advance.
> 
> Some much earlier changes from that same series were already queued in the
> vfs tree, and these patches rework those changes - specifically the
> first patch in this series does.
> 
> [...]

Applied to the vfs-6.15.iomap branch of the vfs/vfs.git tree.
Patches in the vfs-6.15.iomap branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.15.iomap

[1/3] iomap: inline iomap_dio_bio_opflags()
      https://git.kernel.org/vfs/vfs/c/d279c80e0bac
[2/3] iomap: comment on atomic write checks in iomap_dio_bio_iter()
      https://git.kernel.org/vfs/vfs/c/aacd436e40b0
[3/3] iomap: rework IOMAP atomic flags
      https://git.kernel.org/vfs/vfs/c/370a6de7651b


* Re: [PATCH 3/3] iomap: rework IOMAP atomic flags
  2025-03-20 12:02 ` [PATCH 3/3] iomap: rework IOMAP atomic flags John Garry
  2025-03-20 14:10   ` Christoph Hellwig
@ 2025-03-20 19:29   ` Ritesh Harjani
  2025-03-22 19:47   ` Ritesh Harjani
  2 siblings, 0 replies; 15+ messages in thread
From: Ritesh Harjani @ 2025-03-20 19:29 UTC (permalink / raw)
  To: John Garry, brauner, djwong, hch
  Cc: linux-fsdevel, dchinner, linux-xfs, linux-kernel, ojaswin,
	martin.petersen, tytso, linux-ext4, John Garry

John Garry <john.g.garry@oracle.com> writes:

> Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag
> is that the FS ->iomap_begin callback could check if this flag is set to
> decide whether to do a SW (FS-based) atomic write. But the FS can set
> which ->iomap_begin callback it wants when deciding to do a FS-based
> atomic write.
>
> Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as
> the block driver can use SW-methods to emulate an atomic write. So change
> back to IOMAP_ATOMIC.
>
> The ->iomap_begin callback needs though to indicate to iomap core that
> REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that.
>
> These changes were suggested by Christoph Hellwig and Dave Chinner.

Looks good to me. Thanks for updating the iomap design document as well.
Feel free to add:

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>


> [...]


* Re: [PATCH 2/3] iomap: comment on atomic write checks in iomap_dio_bio_iter()
  2025-03-20 12:02 ` [PATCH 2/3] iomap: comment on atomic write checks in iomap_dio_bio_iter() John Garry
  2025-03-20 14:09   ` Christoph Hellwig
@ 2025-03-20 19:32   ` Ritesh Harjani
  1 sibling, 0 replies; 15+ messages in thread
From: Ritesh Harjani @ 2025-03-20 19:32 UTC (permalink / raw)
  To: John Garry, brauner, djwong, hch
  Cc: linux-fsdevel, dchinner, linux-xfs, linux-kernel, ojaswin,
	martin.petersen, tytso, linux-ext4, John Garry

John Garry <john.g.garry@oracle.com> writes:

> Help explain the code.
>
> Also clarify the comment for bio size check.

Looks good to me. Feel free to add:

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>



>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/iomap/direct-io.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 8c1bec473586..b9f59ca43c15 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -350,6 +350,11 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
>  		bio_opf |= REQ_OP_WRITE;
>  
>  		if (iter->flags & IOMAP_ATOMIC_HW) {
> +			/*
> +			 * Ensure that the mapping covers the full write
> +			 * length, otherwise it won't be submitted as a single
> +			 * bio, which is required to use hardware atomics.
> +			 */
>  			if (length != iter->len)
>  				return -EINVAL;
>  			bio_opf |= REQ_ATOMIC;
> @@ -449,7 +454,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
>  		n = bio->bi_iter.bi_size;
>  		if (WARN_ON_ONCE((bio_opf & REQ_ATOMIC) && n != length)) {
>  			/*
> -			 * This bio should have covered the complete length,
> +			 * An atomic write bio must cover the complete length,
>  			 * which it doesn't, so error. We may need to zero out
>  			 * the tail (complete FS block), similar to when
>  			 * bio_iov_iter_get_pages() returns an error, above.
> -- 
> 2.31.1
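
The check that the new comment documents can be modelled outside the kernel in a few lines. This is only a sketch: atomic_mapping_check() and its parameters are made-up names, standing in for the comparison of the mapping length against iter->len in iomap_dio_bio_iter().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the check in iomap_dio_bio_iter().
 * mapping_len models the length covered by the current mapping and
 * write_len models the full length of the write (iter->len in the patch).
 */
static int atomic_mapping_check(size_t mapping_len, size_t write_len,
				bool atomic)
{
	/*
	 * An atomic write must go out as a single bio, so one mapping has
	 * to cover the whole write. Otherwise the write is rejected.
	 */
	if (atomic && mapping_len != write_len)
		return -EINVAL;

	return 0;
}
```

With these assumptions, an 8 KiB atomic write backed by a 4 KiB mapping is rejected with -EINVAL, while the same mapping is acceptable for a non-atomic write, which may simply be split across several bios.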


* Re: [PATCH 3/3] iomap: rework IOMAP atomic flags
  2025-03-20 12:02 ` [PATCH 3/3] iomap: rework IOMAP atomic flags John Garry
  2025-03-20 14:10   ` Christoph Hellwig
  2025-03-20 19:29   ` Ritesh Harjani
@ 2025-03-22 19:47   ` Ritesh Harjani
  2025-03-23  6:38     ` Christoph Hellwig
  2 siblings, 1 reply; 15+ messages in thread
From: Ritesh Harjani @ 2025-03-22 19:47 UTC (permalink / raw)
  To: John Garry, brauner, djwong, hch
  Cc: linux-fsdevel, dchinner, linux-xfs, linux-kernel, ojaswin,
	martin.petersen, tytso, linux-ext4, John Garry

John Garry <john.g.garry@oracle.com> writes:

> Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag
> is that the FS ->iomap_begin callback could check if this flag is set to
> decide whether to do a SW (FS-based) atomic write. But the FS can set
> which ->iomap_begin callback it wants when deciding to do a FS-based
> atomic write.
>
> Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as
> the block driver can use SW-methods to emulate an atomic write. So change
> back to IOMAP_ATOMIC.
>
> The ->iomap_begin callback needs though to indicate to iomap core that
> REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that.
>
> These changes were suggested by Christoph Hellwig and Dave Chinner.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  .../filesystems/iomap/operations.rst          | 35 ++++++++++---------
>  fs/ext4/inode.c                               |  6 +++-
>  fs/iomap/direct-io.c                          |  8 ++---
>  fs/iomap/trace.h                              |  2 +-
>  fs/xfs/xfs_iomap.c                            |  4 +++
>  include/linux/iomap.h                         | 12 +++----
>  6 files changed, 37 insertions(+), 30 deletions(-)
>
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index b08a79d11d9f..3b628e370d88 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -514,29 +514,32 @@ IOMAP_WRITE`` with any combination of the following enhancements:
>     if the mapping is unwritten and the filesystem cannot handle zeroing
>     the unaligned regions without exposing stale contents.
>  
> - * ``IOMAP_ATOMIC_HW``: This write is being issued with torn-write
> -   protection based on HW-offload support.
> -   Only a single bio can be created for the write, and the write must
> -   not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be
> -   set.
> + * ``IOMAP_ATOMIC``: This write is being issued with torn-write
> +   protection.
> +   Torn-write protection may be provided based on HW-offload or by a
> +   software mechanism provided by the filesystem.
> +
> +   For HW-offload based support, only a single bio can be created for the
> +   write, and the write must not be split into multiple I/O requests, i.e.
> +   flag REQ_ATOMIC must be set.
>     The file range to write must be aligned to satisfy the requirements
>     of both the filesystem and the underlying block device's atomic
>     commit capabilities.
>     If filesystem metadata updates are required (e.g. unwritten extent
> -   conversion or copy on write), all updates for the entire file range
> +   conversion or copy-on-write), all updates for the entire file range
>     must be committed atomically as well.
> -   Only one space mapping is allowed per untorn write.
> -   Untorn writes may be longer than a single file block. In all cases,
> +   Untorn-writes may be longer than a single file block. In all cases,
>     the mapping start disk block must have at least the same alignment as
>     the write offset.
> -
> - * ``IOMAP_ATOMIC_SW``: This write is being issued with torn-write
> -   protection via a software mechanism provided by the filesystem.
> -   All the disk block alignment and single bio restrictions which apply
> -   to IOMAP_ATOMIC_HW do not apply here.
> -   SW-based untorn writes would typically be used as a fallback when
> -   HW-based untorn writes may not be issued, e.g. the range of the write
> -   covers multiple extents, meaning that it is not possible to issue
> +   The filesystems must set IOMAP_F_ATOMIC_BIO to inform iomap core of an
> +   untorn-write based on HW-offload.
> +
> +   For untorn-writes based on a software mechanism provided by the
> +   filesystem, all the disk block alignment and single bio restrictions
> +   which apply for HW-offload based untorn-writes do not apply.
> +   The mechanism would typically be used as a fallback for when
> +   HW-offload based untorn-writes may not be issued, e.g. the range of the
> +   write covers multiple extents, meaning that it is not possible to issue
>     a single bio.
>     All filesystem metadata updates for the entire file range must be
>     committed atomically as well.
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index ba2f1e3db7c7..d04d8a7f12e7 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3290,6 +3290,10 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
>  	if (map->m_flags & EXT4_MAP_NEW)
>  		iomap->flags |= IOMAP_F_NEW;
>  
> +	/* HW-offload atomics are always used */
> +	if (flags & IOMAP_ATOMIC)
> +		iomap->flags |= IOMAP_F_ATOMIC_BIO;
> +
>  	if (flags & IOMAP_DAX)
>  		iomap->dax_dev = EXT4_SB(inode->i_sb)->s_daxdev;
>  	else
> @@ -3467,7 +3471,7 @@ static inline bool ext4_want_directio_fallback(unsigned flags, ssize_t written)
>  		return false;
>  
>  	/* atomic writes are all-or-nothing */
> -	if (flags & IOMAP_ATOMIC_HW)
> +	if (flags & IOMAP_ATOMIC)
>  		return false;
>  
>  	/* can only try again if we wrote nothing */
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index b9f59ca43c15..6ac7a1534f7c 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -349,7 +349,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
>  	if (dio->flags & IOMAP_DIO_WRITE) {
>  		bio_opf |= REQ_OP_WRITE;
>  
> -		if (iter->flags & IOMAP_ATOMIC_HW) {
> +		if (iomap->flags & IOMAP_F_ATOMIC_BIO) {
>  			/*
>  			 * Ensure that the mapping covers the full write
>  			 * length, otherwise it won't be submitted as a single
> @@ -677,10 +677,8 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  			iomi.flags |= IOMAP_OVERWRITE_ONLY;
>  		}
>  
> -		if (dio_flags & IOMAP_DIO_ATOMIC_SW)
> -			iomi.flags |= IOMAP_ATOMIC_SW;
> -		else if (iocb->ki_flags & IOCB_ATOMIC)
> -			iomi.flags |= IOMAP_ATOMIC_HW;
> +		if (iocb->ki_flags & IOCB_ATOMIC)
> +			iomi.flags |= IOMAP_ATOMIC;
>  
>  		/* for data sync or sync, we need sync completion processing */
>  		if (iocb_is_dsync(iocb)) {
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index 69af89044ebd..9eab2c8ac3c5 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -99,7 +99,7 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
>  	{ IOMAP_FAULT,		"FAULT" }, \
>  	{ IOMAP_DIRECT,		"DIRECT" }, \
>  	{ IOMAP_NOWAIT,		"NOWAIT" }, \
> -	{ IOMAP_ATOMIC_HW,	"ATOMIC_HW" }
> +	{ IOMAP_ATOMIC,		"ATOMIC" }
>  
>  #define IOMAP_F_FLAGS_STRINGS \
>  	{ IOMAP_F_NEW,		"NEW" }, \
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 5dd0922fe2d1..ee40dc509413 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -828,6 +828,10 @@ xfs_direct_write_iomap_begin(
>  	if (offset + length > i_size_read(inode))
>  		iomap_flags |= IOMAP_F_DIRTY;
>  
> +	/* HW-offload atomics are always used in this path */
> +	if (flags & IOMAP_ATOMIC)
> +		iomap_flags |= IOMAP_F_ATOMIC_BIO;
> +
>  	/*
>  	 * COW writes may allocate delalloc space or convert unwritten COW
>  	 * extents, so we need to make sure to take the lock exclusively here.
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 9cd93530013c..02fe001feebb 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -60,6 +60,9 @@ struct vm_fault;
>   * IOMAP_F_ANON_WRITE indicates that (write) I/O does not have a target block
>   * assigned to it yet and the file system will do that in the bio submission
>   * handler, splitting the I/O as needed.
> + *
> + * IOMAP_F_ATOMIC_BIO indicates that (write) I/O will be issued as an atomic
> + * bio, i.e. set REQ_ATOMIC.
>   */
>  #define IOMAP_F_NEW		(1U << 0)
>  #define IOMAP_F_DIRTY		(1U << 1)
> @@ -73,6 +76,7 @@ struct vm_fault;
>  #define IOMAP_F_XATTR		(1U << 5)
>  #define IOMAP_F_BOUNDARY	(1U << 6)
>  #define IOMAP_F_ANON_WRITE	(1U << 7)
> +#define IOMAP_F_ATOMIC_BIO	(1U << 8)


Oops, sorry, I am not sure how I missed this during review.
(1U << 8) is already taken by the IOMAP_F_SIZE_CHANGED flag, so it is
wrong to use the same value for IOMAP_F_ATOMIC_BIO, since both are set
in iomap->flags.

IOMAP_F_SIZE_CHANGED is only set in the buffered-io path, i.e.
iomap_write_iter(), so this doesn't break anything as of now. It will
become a problem once atomic write support gets added to buffered-io.
Either way, this needs to be fixed.

<snip from include/linux/iomap.h>
#define IOMAP_F_ATOMIC_BIO	(1U << 8)

/*
 * Flags set by the core iomap code during operations:
 *
 * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
 * has changed as the result of this write operation.
 *
 * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file
 * range it covers needs to be remapped by the high level before the operation
 * can proceed.
 */
#define IOMAP_F_SIZE_CHANGED	(1U << 8)
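
To make the aliasing concrete, here is a minimal userspace sketch (not kernel code). Only the two #define values are taken from the header; the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in the posted patch: both names resolve to bit 8. */
#define IOMAP_F_ATOMIC_BIO	(1U << 8)
#define IOMAP_F_SIZE_CHANGED	(1U << 8)

/* Hypothetical helper: how a reader of iomap->flags tests for atomic. */
static bool wants_atomic_bio(uint16_t flags)
{
	return flags & IOMAP_F_ATOMIC_BIO;
}

/* Hypothetical helper: how a reader tests for a size change. */
static bool size_changed(uint16_t flags)
{
	return flags & IOMAP_F_SIZE_CHANGED;
}
```

Because both macros expand to the same bit, setting either flag makes both helpers return true, so a reader of iomap->flags cannot tell the two conditions apart.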



So, I guess we can shift IOMAP_F_SIZE_CHANGED and IOMAP_F_STALE by
1 bit. So it will all look like.. 


#define IOMAP_F_ATOMIC_BIO	(1U << 8)

/*
 * Flags set by the core iomap code during operations:
 *
 * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size
 * has changed as the result of this write operation.
 *
 * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file
 * range it covers needs to be remapped by the high level before the operation
 * can proceed.
 */

#define IOMAP_F_SIZE_CHANGED	(1U << 9)
#define IOMAP_F_STALE		(1U << 10)

...
/*
 * Flags from 0x1000 up are for file system specific usage:
 */
#define IOMAP_F_PRIVATE		(1U << 12)


Thoughts?


-ritesh


>  
>  /*
>   * Flags set by the core iomap code during operations:
> @@ -189,9 +193,8 @@ struct iomap_folio_ops {
>  #else
>  #define IOMAP_DAX		0
>  #endif /* CONFIG_FS_DAX */
> -#define IOMAP_ATOMIC_HW		(1 << 9) /* HW-based torn-write protection */
> +#define IOMAP_ATOMIC		(1 << 9) /* torn-write protection */
>  #define IOMAP_DONTCACHE		(1 << 10)
> -#define IOMAP_ATOMIC_SW		(1 << 11)/* SW-based torn-write protection */
>  
>  struct iomap_ops {
>  	/*
> @@ -503,11 +506,6 @@ struct iomap_dio_ops {
>   */
>  #define IOMAP_DIO_PARTIAL		(1 << 2)
>  
> -/*
> - * Use software-based torn-write protection.
> - */
> -#define IOMAP_DIO_ATOMIC_SW		(1 << 3)
> -
>  ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
>  		unsigned int dio_flags, void *private, size_t done_before);
> -- 
> 2.31.1


* Re: [PATCH 3/3] iomap: rework IOMAP atomic flags
  2025-03-22 19:47   ` Ritesh Harjani
@ 2025-03-23  6:38     ` Christoph Hellwig
  2025-03-23 13:07       ` John Garry
  2025-03-23 13:42       ` Ritesh Harjani
  0 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-03-23  6:38 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: John Garry, brauner, djwong, hch, linux-fsdevel, dchinner,
	linux-xfs, linux-kernel, ojaswin, martin.petersen, tytso,
	linux-ext4

On Sun, Mar 23, 2025 at 01:17:08AM +0530, Ritesh Harjani wrote:

[full quote deleted, can you please properly trim your replies?]

> So, I guess we can shift IOMAP_F_SIZE_CHANGED and IOMAP_F_STALE by
> 1 bit. So it will all look like.. 

Let's create some more space to avoid this for the next round, e.g.
count the core set flags from 31 down, and limit IOMAP_F_PRIVATE to a
single flag, which is how it is used.



* Re: [PATCH 3/3] iomap: rework IOMAP atomic flags
  2025-03-23  6:38     ` Christoph Hellwig
@ 2025-03-23 13:07       ` John Garry
  2025-03-23 13:42       ` Ritesh Harjani
  1 sibling, 0 replies; 15+ messages in thread
From: John Garry @ 2025-03-23 13:07 UTC (permalink / raw)
  To: Christoph Hellwig, Ritesh Harjani
  Cc: brauner, djwong, linux-fsdevel, dchinner, linux-xfs, linux-kernel,
	ojaswin, martin.petersen, tytso, linux-ext4

On 23/03/2025 06:38, Christoph Hellwig wrote:
> On Sun, Mar 23, 2025 at 01:17:08AM +0530, Ritesh Harjani wrote:

@ Ritesh, thanks for the notice - are you ok to send a fix for this? At
a glance, it seems that those two conflicting flags won't cross paths in
practice (but obviously we still need to fix this).

> 
> [full quote deleted, can you please properly trim your replies?]
> 
>> So, I guess we can shift IOMAP_F_SIZE_CHANGED and IOMAP_F_STALE by
>> 1 bit. So it will all look like..
> 
> Let's create some more space to avoid this for the next round, e.g.
> count the core set flags from 31 down, and limit IOMAP_F_PRIVATE to a
> single flag, which is how it is used.
> 



* Re: [PATCH 3/3] iomap: rework IOMAP atomic flags
  2025-03-23  6:38     ` Christoph Hellwig
  2025-03-23 13:07       ` John Garry
@ 2025-03-23 13:42       ` Ritesh Harjani
  2025-03-26 15:50         ` John Garry
  2025-03-27 10:46         ` Christoph Hellwig
  1 sibling, 2 replies; 15+ messages in thread
From: Ritesh Harjani @ 2025-03-23 13:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: John Garry, brauner, djwong, hch, linux-fsdevel, dchinner,
	linux-xfs, linux-kernel, ojaswin, martin.petersen, tytso,
	linux-ext4

Christoph Hellwig <hch@lst.de> writes:

> On Sun, Mar 23, 2025 at 01:17:08AM +0530, Ritesh Harjani wrote:
>
> [full quote deleted, can you please properly trim your replies?]
>

Sure.

>> So, I guess we can shift IOMAP_F_SIZE_CHANGED and IOMAP_F_STALE by
>> 1 bit. So it will all look like.. 
>
> Let's create some more space to avoid this for the next round, e.g.

Sure, that makes sense.

> count the core set flags from 31 down, and limit IOMAP_F_PRIVATE to a
> single flag, which is how it is used.

The flags field in struct iomap is of type u16, so I will number the
core iomap flags starting from bit 15, moving downwards.

Here is a diff of what I think you meant. Let me know if it looks good
to you.



diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 02fe001feebb..68416b135151 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -78,6 +78,11 @@ struct vm_fault;
 #define IOMAP_F_ANON_WRITE     (1U << 7)
 #define IOMAP_F_ATOMIC_BIO     (1U << 8)

+/*
+ * Flag reserved for file system specific usage
+ */
+#define IOMAP_F_PRIVATE                (1U << 12)
+
 /*
  * Flags set by the core iomap code during operations:
  *
@@ -88,14 +93,8 @@ struct vm_fault;
  * range it covers needs to be remapped by the high level before the operation
  * can proceed.
  */
-#define IOMAP_F_SIZE_CHANGED   (1U << 8)
-#define IOMAP_F_STALE          (1U << 9)
-
-/*
- * Flags from 0x1000 up are for file system specific usage:
- */
-#define IOMAP_F_PRIVATE                (1U << 12)
-
+#define IOMAP_F_SIZE_CHANGED   (1U << 14)
+#define IOMAP_F_STALE          (1U << 15)

 /*
  * Magic value for addr:
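
The proposed layout can be sanity-checked at compile time with a small standalone sketch. It is not kernel code: it lists only the flags visible in this thread, and the two *_MASK macros and count_bits() are made-up names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Flags set by the filesystem (only those visible in this thread). */
#define IOMAP_F_NEW		(1U << 0)
#define IOMAP_F_DIRTY		(1U << 1)
#define IOMAP_F_XATTR		(1U << 5)
#define IOMAP_F_BOUNDARY	(1U << 6)
#define IOMAP_F_ANON_WRITE	(1U << 7)
#define IOMAP_F_ATOMIC_BIO	(1U << 8)

/* Single flag reserved for filesystem-private use. */
#define IOMAP_F_PRIVATE		(1U << 12)

/* Flags set by the iomap core, counted down from the top of a u16. */
#define IOMAP_F_SIZE_CHANGED	(1U << 14)
#define IOMAP_F_STALE		(1U << 15)

/* Hypothetical grouping masks, used only by the checks below. */
#define FS_SET_MASK	(IOMAP_F_NEW | IOMAP_F_DIRTY | IOMAP_F_XATTR | \
			 IOMAP_F_BOUNDARY | IOMAP_F_ANON_WRITE | \
			 IOMAP_F_ATOMIC_BIO)
#define CORE_SET_MASK	(IOMAP_F_SIZE_CHANGED | IOMAP_F_STALE)

static_assert((FS_SET_MASK & CORE_SET_MASK) == 0,
	      "filesystem-set and core-set flags overlap");
static_assert(((FS_SET_MASK | CORE_SET_MASK) & IOMAP_F_PRIVATE) == 0,
	      "IOMAP_F_PRIVATE overlaps another flag");
static_assert((FS_SET_MASK | CORE_SET_MASK | IOMAP_F_PRIVATE) <= UINT16_MAX,
	      "flags do not fit in a u16");

/* Hypothetical helper: count the bits set in a mask. */
static unsigned int count_bits(unsigned int mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}
```

Nine flag names are listed, so count_bits() on the union of all three groups should report nine distinct bits. A smaller count would mean two names share a bit, which is the kind of collision reported earlier in this thread.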



(PS: I might be in transit / travelling for other work for a week, so my responses may be delayed.)
-ritesh



* Re: [PATCH 3/3] iomap: rework IOMAP atomic flags
  2025-03-23 13:42       ` Ritesh Harjani
@ 2025-03-26 15:50         ` John Garry
  2025-03-27 10:46         ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: John Garry @ 2025-03-26 15:50 UTC (permalink / raw)
  To: Ritesh Harjani (IBM), Christoph Hellwig
  Cc: brauner, djwong, linux-fsdevel, dchinner, linux-xfs, linux-kernel,
	ojaswin, martin.petersen, tytso, linux-ext4

>>> So, I guess we can shift IOMAP_F_SIZE_CHANGED and IOMAP_F_STALE by
>>> 1 bit. So it will all look like..
>>
>> Let's create some more space to avoid this for the next round, e.g.
> 
> Sure, that makes sense.
> 
>> count the core set flags from 31 down, and limit IOMAP_F_PRIVATE to a
>> single flag, which is how it is used.
> 
> flags in struct iomap is of type u16. So will make core iomap flags
> starting from bit 15, moving downwards.
> 
> Here is a diff of what I think you meant - let me know if this diff
> looks good to you?

This is still outstanding, and it would be nice to fix this ASAP.

How about we go to 32 bits and change IOMAP_F_PRIVATE for v6.16, while
just fixing this as originally suggested (by renumbering) for v6.15?


* Re: [PATCH 3/3] iomap: rework IOMAP atomic flags
  2025-03-23 13:42       ` Ritesh Harjani
  2025-03-26 15:50         ` John Garry
@ 2025-03-27 10:46         ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-03-27 10:46 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Christoph Hellwig, John Garry, brauner, djwong, linux-fsdevel,
	dchinner, linux-xfs, linux-kernel, ojaswin, martin.petersen,
	tytso, linux-ext4

On Sun, Mar 23, 2025 at 07:12:02PM +0530, Ritesh Harjani wrote:
> flags in struct iomap is of type u16. So will make core iomap flags
> starting from bit 15, moving downwards. 
> 
> Here is a diff of what I think you meant - let me know if this diff
> looks good to you? 

Yes, this looks good to me.



Thread overview: 15+ messages
2025-03-20 12:02 [PATCH 0/3] further iomap large atomic writes changes John Garry
2025-03-20 12:02 ` [PATCH 1/3] iomap: inline iomap_dio_bio_opflags() John Garry
2025-03-20 12:02 ` [PATCH 2/3] iomap: comment on atomic write checks in iomap_dio_bio_iter() John Garry
2025-03-20 14:09   ` Christoph Hellwig
2025-03-20 19:32   ` Ritesh Harjani
2025-03-20 12:02 ` [PATCH 3/3] iomap: rework IOMAP atomic flags John Garry
2025-03-20 14:10   ` Christoph Hellwig
2025-03-20 19:29   ` Ritesh Harjani
2025-03-22 19:47   ` Ritesh Harjani
2025-03-23  6:38     ` Christoph Hellwig
2025-03-23 13:07       ` John Garry
2025-03-23 13:42       ` Ritesh Harjani
2025-03-26 15:50         ` John Garry
2025-03-27 10:46         ` Christoph Hellwig
2025-03-20 14:16 ` [PATCH 0/3] further iomap large atomic writes changes Christian Brauner
