public inbox for linux-fsdevel@vger.kernel.org
* [PATCH v3 0/7] large atomic writes for xfs
@ 2025-01-02 14:04 John Garry
  2025-01-02 14:04 ` [PATCH v3 1/7] iomap: Increase iomap_dio_zero() size limit John Garry
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: John Garry @ 2025-01-02 14:04 UTC (permalink / raw)
  To: brauner, djwong, cem, dchinner, hch, ritesh.list
  Cc: linux-xfs, linux-fsdevel, linux-kernel, martin.petersen,
	John Garry

Currently the atomic write unit min and max are fixed at the FS block size
for xfs and ext4.

This series expands support to allow multiple FS blocks to be written
atomically.

To allow multiple blocks to be written atomically, the fs must ensure blocks
are allocated with some alignment and granularity. For xfs, today only
rtvol provides this through rt_extsize. So initial support for large
atomic writes will be for rtvol here. Support can easily be expanded to
regular files through the proposed forcealign feature.

An atomic write which spans mixed unwritten and mapped extents will be
required to have the unwritten extents pre-zeroed, which will be supported
in iomap.

Based on bf354410af83 ("Merge tag 'xfs-6.13-fixes_2024-12-12' of
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into next-rc")

Patches available at the following:
https://github.com/johnpgarry/linux/tree/atomic-write-large-atomics-v6.13-v3

Changes since v2:
- Don't zero unwritten for single block atomic write
- Store RT atomic write unit max in FSBs

Changes since v1:
- Add extent zeroing support
- Rebase

John Garry (6):
  iomap: Increase iomap_dio_zero() size limit
  iomap: Add zero unwritten mappings dio support
  xfs: Add extent zeroing support for atomic writes
  xfs: Switch atomic write size check in xfs_file_write_iter()
  xfs: Add RT atomic write unit max to xfs_mount
  xfs: Update xfs_get_atomic_write_attr() for large atomic writes

Ritesh Harjani (IBM) (1):
  iomap: Lift blocksize restriction on atomic writes

 fs/iomap/direct-io.c   | 100 +++++++++++++++++++++++++++++++++++---
 fs/xfs/libxfs/xfs_sb.c |   3 ++
 fs/xfs/xfs_file.c      | 108 ++++++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_iops.c      |  21 +++++++-
 fs/xfs/xfs_iops.h      |   2 +
 fs/xfs/xfs_mount.h     |   1 +
 fs/xfs/xfs_rtalloc.c   |  23 +++++++++
 fs/xfs/xfs_rtalloc.h   |   4 ++
 include/linux/iomap.h  |   3 ++
 9 files changed, 248 insertions(+), 17 deletions(-)

-- 
2.31.1



* [PATCH v3 1/7] iomap: Increase iomap_dio_zero() size limit
  2025-01-02 14:04 [PATCH v3 0/7] large atomic writes for xfs John Garry
@ 2025-01-02 14:04 ` John Garry
  2025-01-02 14:04 ` [PATCH v3 2/7] iomap: Add zero unwritten mappings dio support John Garry
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: John Garry @ 2025-01-02 14:04 UTC (permalink / raw)
  To: brauner, djwong, cem, dchinner, hch, ritesh.list
  Cc: linux-xfs, linux-fsdevel, linux-kernel, martin.petersen,
	John Garry

Currently iomap_dio_zero() is limited to using a single bio to write up to
64K.

To support atomic writes larger than the FS block size, it may be required
to pre-zero some extents larger than 64K.

To increase the limit, fill each bio up in a loop.
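
The chunking described above can be sketched in user space as follows. This
is only an illustration, not the kernel code: ZERO_CHUNK_SIZE and MAX_CHUNKS
are hypothetical stand-ins for IOMAP_ZERO_PAGE_SIZE and BIO_MAX_VECS, and
split_zero_range() is a made-up helper that records the size of each vector
that would be added to the bio.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for IOMAP_ZERO_PAGE_SIZE and BIO_MAX_VECS. */
#define ZERO_CHUNK_SIZE	(64u * 1024u)
#define MAX_CHUNKS	256u

/*
 * Split len bytes into chunks of at most ZERO_CHUNK_SIZE bytes.
 * Stores the chunk sizes in out[] and returns the chunk count, or -1
 * if len needs more than MAX_CHUNKS chunks or out[] is too small.
 */
static int split_zero_range(unsigned int len, unsigned int *out, size_t out_cap)
{
	/* Round up without risking overflow of len + ZERO_CHUNK_SIZE - 1. */
	unsigned int nr = len / ZERO_CHUNK_SIZE + (len % ZERO_CHUNK_SIZE != 0);
	unsigned int remaining = len;
	unsigned int i;

	if (nr > MAX_CHUNKS || nr > out_cap)
		return -1;

	for (i = 0; i < nr; i++) {
		unsigned int chunk = remaining < ZERO_CHUNK_SIZE ?
				     remaining : ZERO_CHUNK_SIZE;

		out[i] = chunk;
		/* Subtract what was actually consumed, so this never wraps. */
		remaining -= chunk;
	}

	assert(remaining == 0);
	return (int)nr;
}
```

Subtracting the consumed chunk, rather than a fixed chunk size, keeps the
unsigned counter from wrapping on a short final chunk.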

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/iomap/direct-io.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b521eb15759e..23fdad16e6a8 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -240,27 +240,35 @@ void iomap_dio_bio_end_io(struct bio *bio)
 EXPORT_SYMBOL_GPL(iomap_dio_bio_end_io);
 
 static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
-		loff_t pos, unsigned len)
+		const loff_t pos, const unsigned len)
 {
 	struct inode *inode = file_inode(dio->iocb->ki_filp);
+	unsigned int remaining = len;
+	unsigned int nr_vecs;
 	struct bio *bio;
+	int i;
 
 	if (!len)
 		return 0;
-	/*
-	 * Max block size supported is 64k
-	 */
-	if (WARN_ON_ONCE(len > IOMAP_ZERO_PAGE_SIZE))
+
+	nr_vecs = DIV_ROUND_UP(len, IOMAP_ZERO_PAGE_SIZE);
+	if (WARN_ON_ONCE(nr_vecs > BIO_MAX_VECS))
 		return -EINVAL;
 
-	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
+	bio = iomap_dio_alloc_bio(iter, dio, nr_vecs,
+			REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
 	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
 				  GFP_KERNEL);
 	bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	__bio_add_page(bio, zero_page, len, 0);
+	for (i = 0; i < nr_vecs; i++) {
+		__bio_add_page(bio, zero_page,
+			min(remaining, IOMAP_ZERO_PAGE_SIZE), 0);
+		remaining -= IOMAP_ZERO_PAGE_SIZE;
+	}
+
 	iomap_dio_submit_bio(iter, dio, bio, pos);
 	return 0;
 }
-- 
2.31.1



* [PATCH v3 2/7] iomap: Add zero unwritten mappings dio support
  2025-01-02 14:04 [PATCH v3 0/7] large atomic writes for xfs John Garry
  2025-01-02 14:04 ` [PATCH v3 1/7] iomap: Increase iomap_dio_zero() size limit John Garry
@ 2025-01-02 14:04 ` John Garry
  2025-01-02 14:04 ` [PATCH v3 3/7] iomap: Lift blocksize restriction on atomic writes John Garry
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: John Garry @ 2025-01-02 14:04 UTC (permalink / raw)
  To: brauner, djwong, cem, dchinner, hch, ritesh.list
  Cc: linux-xfs, linux-fsdevel, linux-kernel, martin.petersen,
	John Garry

Atomic write support requires that only a single bio is ever submitted for
an atomic write.

Furthermore, currently the atomic write unit min and max limit is fixed at
the FS block size.

Once the atomic write unit max limit is lifted, an atomic write may span
mixed unwritten and mapped extents. In that case the iterative nature of
iomap would produce multiple bios, which cannot be tolerated for an atomic
write.

Add a function to zero unwritten extents in a given range, which may be
used to ensure that unwritten extents are zeroed prior to issuing an
atomic write.
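
To illustrate why pre-zeroing helps, here is a small user-space model. It is
a sketch under simplifying assumptions: struct ext, count_mappings() and
zero_unwritten() are hypothetical names, extents are assumed to be physically
contiguous, and "zeroing" is reduced to flipping the extent state. An iterator
sees one mapping per run of same-state extents, so converting the unwritten
ones leaves a single mapping for the whole range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified extent states. */
enum ext_type { EXT_MAPPED, EXT_UNWRITTEN };

struct ext {
	enum ext_type		type;
	unsigned long long	len;	/* length in bytes */
};

/*
 * Count the mappings an iterator would see over the range. Adjacent
 * extents in the same state are treated as one mapping (this model
 * assumes they are also physically contiguous).
 */
static size_t count_mappings(const struct ext *e, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (i == 0 || e[i].type != e[i - 1].type)
			count++;
	return count;
}

/*
 * Model of pre-zeroing: every unwritten extent is "zeroed" and becomes
 * mapped. Returns the number of bytes that had to be zeroed.
 */
static unsigned long long zero_unwritten(struct ext *e, size_t n)
{
	unsigned long long zeroed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (e[i].type != EXT_UNWRITTEN)
			continue;
		zeroed += e[i].len;
		e[i].type = EXT_MAPPED;
	}
	return zeroed;
}
```

With a mapped/unwritten/mapped layout, count_mappings() drops from 3 to 1
after zero_unwritten(), which is the condition a single-bio atomic write
needs.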

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/iomap/direct-io.c  | 76 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/iomap.h |  3 ++
 2 files changed, 79 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 23fdad16e6a8..18c888f0c11f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -805,6 +805,82 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
 
+static loff_t
+iomap_dio_zero_unwritten_iter(struct iomap_iter *iter, struct iomap_dio *dio)
+{
+	const struct iomap *iomap = &iter->iomap;
+	loff_t length = iomap_length(iter);
+	loff_t pos = iter->pos;
+
+	if (iomap->type == IOMAP_UNWRITTEN) {
+		int ret;
+
+		dio->flags |= IOMAP_DIO_UNWRITTEN;
+		ret = iomap_dio_zero(iter, dio, pos, length);
+		if (ret)
+			return ret;
+	}
+
+	dio->size += length;
+
+	return length;
+}
+
+ssize_t
+iomap_dio_zero_unwritten(struct kiocb *iocb, struct iov_iter *iter,
+		const struct iomap_ops *ops, const struct iomap_dio_ops *dops)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	struct iomap_dio *dio;
+	ssize_t ret;
+	struct iomap_iter iomi = {
+		.inode		= inode,
+		.pos		= iocb->ki_pos,
+		.len		= iov_iter_count(iter),
+		.flags		= IOMAP_WRITE,
+	};
+
+	dio = kzalloc(sizeof(*dio), GFP_KERNEL);
+	if (!dio)
+		return -ENOMEM;
+
+	dio->iocb = iocb;
+	atomic_set(&dio->ref, 1);
+	dio->i_size = i_size_read(inode);
+	dio->dops = dops;
+	dio->submit.waiter = current;
+	dio->wait_for_completion = true;
+
+	inode_dio_begin(inode);
+
+	while ((ret = iomap_iter(&iomi, ops)) > 0)
+		iomi.processed = iomap_dio_zero_unwritten_iter(&iomi, dio);
+
+	if (ret < 0)
+		iomap_dio_set_error(dio, ret);
+
+	if (!atomic_dec_and_test(&dio->ref)) {
+		for (;;) {
+			set_current_state(TASK_UNINTERRUPTIBLE);
+			if (!READ_ONCE(dio->submit.waiter))
+				break;
+
+			blk_io_schedule();
+		}
+		__set_current_state(TASK_RUNNING);
+	}
+
+	if (dops && dops->end_io)
+		ret = dops->end_io(iocb, dio->size, ret, dio->flags);
+
+	kfree(dio);
+
+	inode_dio_end(file_inode(iocb->ki_filp));
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iomap_dio_zero_unwritten);
+
 static int __init iomap_dio_init(void)
 {
 	zero_page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5675af6b740c..c2d44b9e446d 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -440,6 +440,9 @@ ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags, void *private, size_t done_before);
+ssize_t iomap_dio_zero_unwritten(struct kiocb *iocb, struct iov_iter *iter,
+		const struct iomap_ops *ops, const struct iomap_dio_ops *dops);
+
 ssize_t iomap_dio_complete(struct iomap_dio *dio);
 void iomap_dio_bio_end_io(struct bio *bio);
 
-- 
2.31.1



* [PATCH v3 3/7] iomap: Lift blocksize restriction on atomic writes
  2025-01-02 14:04 [PATCH v3 0/7] large atomic writes for xfs John Garry
  2025-01-02 14:04 ` [PATCH v3 1/7] iomap: Increase iomap_dio_zero() size limit John Garry
  2025-01-02 14:04 ` [PATCH v3 2/7] iomap: Add zero unwritten mappings dio support John Garry
@ 2025-01-02 14:04 ` John Garry
  2025-01-08  0:41   ` Darrick J. Wong
  2025-01-02 14:04 ` [PATCH v3 4/7] xfs: Add extent zeroing support for " John Garry
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2025-01-02 14:04 UTC (permalink / raw)
  To: brauner, djwong, cem, dchinner, hch, ritesh.list
  Cc: linux-xfs, linux-fsdevel, linux-kernel, martin.petersen,
	John Garry

From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>

Filesystems like ext4 can submit writes in multiples of the block size, but
we still can't allow those writes to be split. Hence check that
iomap_length() is the same as iter->len.

It is the role of the FS to ensure that a single mapping may be created
for an atomic write. The FS will also continue to check size and alignment
legality.
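
The changed condition can be modelled in isolation like this. The sketch is
user-space only and check_atomic_mapping() is a hypothetical name: an atomic
write is accepted only when the single mapping covers the whole request,
while non-atomic writes may still be split across mappings.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * mapping_len models iomap_length() for the current mapping and
 * request_len models iter->len for the whole write.
 */
static int check_atomic_mapping(bool atomic, uint64_t mapping_len,
				uint64_t request_len)
{
	/* An atomic write must be covered by exactly one mapping. */
	if (atomic && mapping_len != request_len)
		return -EINVAL;
	return 0;
}
```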

Signed-off-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
jpg: Tweak commit message
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/iomap/direct-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 18c888f0c11f..6510bb5d5a6f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -314,7 +314,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	size_t copied = 0;
 	size_t orig_count;
 
-	if (atomic && length != fs_block_size)
+	if (atomic && length != iter->len)
 		return -EINVAL;
 
 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
-- 
2.31.1



* [PATCH v3 4/7] xfs: Add extent zeroing support for atomic writes
  2025-01-02 14:04 [PATCH v3 0/7] large atomic writes for xfs John Garry
                   ` (2 preceding siblings ...)
  2025-01-02 14:04 ` [PATCH v3 3/7] iomap: Lift blocksize restriction on atomic writes John Garry
@ 2025-01-02 14:04 ` John Garry
  2025-01-02 14:04 ` [PATCH v3 5/7] xfs: Switch atomic write size check in xfs_file_write_iter() John Garry
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: John Garry @ 2025-01-02 14:04 UTC (permalink / raw)
  To: brauner, djwong, cem, dchinner, hch, ritesh.list
  Cc: linux-xfs, linux-fsdevel, linux-kernel, martin.petersen,
	John Garry

An atomic write which spans mixed unwritten and mapped extents would be
rejected. This is one reason why atomic write unit min and max are
currently fixed at the block size.

To enable large atomic writes, any unwritten extents need to be zeroed
before issuing the atomic write. So call iomap_dio_zero_unwritten() for
this scenario and retry the atomic write.

Unwritten extents can be detected by passing IOMAP_DIO_OVERWRITE_ONLY to
the original iomap_dio_rw() call, which then fails with -EAGAIN.

After iomap_dio_zero_unwritten() is called, iomap_dio_rw() is retried. If
that fails, then there really is something wrong.

However, keep the same behaviour for writing a single block, i.e. we don't
need to pre-zero.
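
The retry flow described above, reduced to a user-space model. Everything
here is hypothetical scaffolding (struct file_model, model_dio_write(),
model_zero_unwritten(), atomic_write()); locking, NOWAIT handling and the
real iomap calls are left out. It only shows the order of operations: try an
overwrite-only write, zero on -EAGAIN, then retry once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a file range and call counters. */
struct file_model {
	bool	has_unwritten;	/* range contains unwritten extents */
	int	zero_calls;
	int	write_calls;
};

/* Models the write: overwrite-only fails if anything is unwritten. */
static int model_dio_write(struct file_model *f, bool overwrite_only)
{
	f->write_calls++;
	if (overwrite_only && f->has_unwritten)
		return -EAGAIN;
	return 0;
}

/* Models pre-zeroing: afterwards the range is fully written. */
static int model_zero_unwritten(struct file_model *f)
{
	f->zero_calls++;
	f->has_unwritten = false;
	return 0;
}

static int atomic_write(struct file_model *f, size_t count, size_t block_size)
{
	/* Only multi-block writes need the single-written-mapping check. */
	bool overwrite_only = count > block_size;
	int ret;

	ret = model_dio_write(f, overwrite_only);
	if (ret != -EAGAIN)
		return ret;

	ret = model_zero_unwritten(f);
	if (ret)
		return ret;

	/* Second attempt: any failure now is returned to the caller. */
	return model_dio_write(f, overwrite_only);
}
```

A single-block write never takes the zeroing path in this model, matching
the stated intent to keep that case unchanged.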

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_file.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 9a435b1ff264..2c810f75dbbd 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -578,10 +578,47 @@ xfs_dio_write_end_io(
 	return error;
 }
 
+static int
+xfs_dio_write_end_zero_unwritten(
+	struct kiocb		*iocb,
+	ssize_t			size,
+	int			error,
+	unsigned		flags)
+{
+	struct inode		*inode = file_inode(iocb->ki_filp);
+	struct xfs_inode	*ip = XFS_I(inode);
+	loff_t			offset = iocb->ki_pos;
+	unsigned int		nofs_flag;
+
+	trace_xfs_end_io_direct_write(ip, offset, size);
+
+	if (xfs_is_shutdown(ip->i_mount))
+		return -EIO;
+
+	if (error)
+		return error;
+	if (WARN_ON_ONCE(!size))
+		return 0;
+	if (!(flags & IOMAP_DIO_UNWRITTEN))
+		return 0;
+
+	/* Same as xfs_dio_write_end_io() ... */
+	nofs_flag = memalloc_nofs_save();
+
+	error = xfs_iomap_write_unwritten(ip, offset, size, true);
+
+	memalloc_nofs_restore(nofs_flag);
+	return error;
+}
+
 static const struct iomap_dio_ops xfs_dio_write_ops = {
 	.end_io		= xfs_dio_write_end_io,
 };
 
+static const struct iomap_dio_ops xfs_dio_zero_ops = {
+	.end_io		= xfs_dio_write_end_zero_unwritten,
+};
+
 /*
  * Handle block aligned direct I/O writes
  */
@@ -619,6 +656,63 @@ xfs_file_dio_write_aligned(
 	return ret;
 }
 
+static noinline ssize_t
+xfs_file_dio_write_atomic(
+	struct xfs_inode	*ip,
+	struct kiocb		*iocb,
+	struct iov_iter		*from)
+{
+	unsigned int		iolock = XFS_IOLOCK_SHARED;
+	bool			do_zero = false;
+	unsigned int		dio_flags;
+	ssize_t			ret;
+
+	/*
+	 * Zero unwritten only for writing multiple blocks. Leverage
+	 * IOMAP_DIO_OVERWRITE_ONLY to detect when zeroing is required, as
+	 * it ensures that a single written mapping is provided.
+	 */
+	if (iov_iter_count(from) > ip->i_mount->m_sb.sb_blocksize)
+		dio_flags = IOMAP_DIO_OVERWRITE_ONLY;
+	else
+		dio_flags = 0;
+
+retry:
+	ret = xfs_ilock_iocb_for_write(iocb, &iolock);
+	if (ret)
+		return ret;
+
+	ret = xfs_file_write_checks(iocb, from, &iolock);
+	if (ret)
+		goto out_unlock;
+
+	if (do_zero) {
+		ret = iomap_dio_zero_unwritten(iocb, from,
+				&xfs_direct_write_iomap_ops,
+				&xfs_dio_zero_ops);
+		if (ret)
+			goto out_unlock;
+	}
+
+	trace_xfs_file_direct_write(iocb, from);
+	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
+			&xfs_dio_write_ops, dio_flags, NULL, 0);
+
+	if (do_zero && ret < 0)
+		goto out_unlock;
+
+	if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT)) {
+		xfs_iunlock(ip, iolock);
+		do_zero = true;
+		goto retry;
+	}
+
+out_unlock:
+	if (iolock)
+		xfs_iunlock(ip, iolock);
+	return ret;
+}
+
 /*
  * Handle block unaligned direct I/O writes
  *
@@ -723,6 +817,8 @@ xfs_file_dio_write(
 		return -EINVAL;
 	if ((iocb->ki_pos | count) & ip->i_mount->m_blockmask)
 		return xfs_file_dio_write_unaligned(ip, iocb, from);
+	if (iocb->ki_flags & IOCB_ATOMIC)
+		return xfs_file_dio_write_atomic(ip, iocb, from);
 	return xfs_file_dio_write_aligned(ip, iocb, from);
 }
 
-- 
2.31.1



* [PATCH v3 5/7] xfs: Switch atomic write size check in xfs_file_write_iter()
  2025-01-02 14:04 [PATCH v3 0/7] large atomic writes for xfs John Garry
                   ` (3 preceding siblings ...)
  2025-01-02 14:04 ` [PATCH v3 4/7] xfs: Add extent zeroing support for " John Garry
@ 2025-01-02 14:04 ` John Garry
  2025-01-08  0:50   ` Darrick J. Wong
  2025-01-02 14:04 ` [PATCH v3 6/7] xfs: Add RT atomic write unit max to xfs_mount John Garry
  2025-01-02 14:04 ` [PATCH v3 7/7] xfs: Update xfs_get_atomic_write_attr() for large atomic writes John Garry
  6 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2025-01-02 14:04 UTC (permalink / raw)
  To: brauner, djwong, cem, dchinner, hch, ritesh.list
  Cc: linux-xfs, linux-fsdevel, linux-kernel, martin.petersen,
	John Garry

Currently the permitted atomic write size is fixed at the block size.

To start to remove this restriction, use xfs_get_atomic_write_attr() to
find the per-inode atomic write limits and check according to that.
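
A user-space sketch of the resulting validation. The range check mirrors the
one in this patch; the power-of-2 and natural-alignment checks are my
assumption about what generic_atomic_write_valid() enforces and are included
only to show where the range check sits. check_atomic_write() and is_pow2()
are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

static int check_atomic_write(uint64_t pos, uint64_t count,
			      unsigned int unit_min, unsigned int unit_max)
{
	/* Per-inode limits, as reported by the attr helper. */
	if (count < unit_min || count > unit_max)
		return -EINVAL;

	/* Assumed generic rules: power-of-2 length, naturally aligned. */
	if (!is_pow2(count))
		return -EINVAL;
	if (pos & (count - 1))
		return -EINVAL;

	return 0;
}
```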

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_file.c | 12 +++++-------
 fs/xfs/xfs_iops.c |  2 +-
 fs/xfs/xfs_iops.h |  2 ++
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 2c810f75dbbd..68c22c0ab235 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -949,14 +949,12 @@ xfs_file_write_iter(
 		return xfs_file_dax_write(iocb, from);
 
 	if (iocb->ki_flags & IOCB_ATOMIC) {
-		/*
-		 * Currently only atomic writing of a single FS block is
-		 * supported. It would be possible to atomic write smaller than
-		 * a FS block, but there is no requirement to support this.
-		 * Note that iomap also does not support this yet.
-		 */
-		if (ocount != ip->i_mount->m_sb.sb_blocksize)
+		unsigned int unit_min, unit_max;
+
+		xfs_get_atomic_write_attr(ip, &unit_min, &unit_max);
+		if (ocount < unit_min || ocount > unit_max)
 			return -EINVAL;
+
 		ret = generic_atomic_write_valid(iocb, from);
 		if (ret)
 			return ret;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 207e0dadffc3..883ec45ae708 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -572,7 +572,7 @@ xfs_stat_blksize(
 	return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
 }
 
-static void
+void
 xfs_get_atomic_write_attr(
 	struct xfs_inode	*ip,
 	unsigned int		*unit_min,
diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h
index 3c1a2605ffd2..82d3ffbf7024 100644
--- a/fs/xfs/xfs_iops.h
+++ b/fs/xfs/xfs_iops.h
@@ -19,5 +19,7 @@ int xfs_inode_init_security(struct inode *inode, struct inode *dir,
 extern void xfs_setup_inode(struct xfs_inode *ip);
 extern void xfs_setup_iops(struct xfs_inode *ip);
 extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);
+extern void xfs_get_atomic_write_attr(struct xfs_inode	*ip,
+		unsigned int *unit_min, unsigned int *unit_max);
 
 #endif /* __XFS_IOPS_H__ */
-- 
2.31.1



* [PATCH v3 6/7] xfs: Add RT atomic write unit max to xfs_mount
  2025-01-02 14:04 [PATCH v3 0/7] large atomic writes for xfs John Garry
                   ` (4 preceding siblings ...)
  2025-01-02 14:04 ` [PATCH v3 5/7] xfs: Switch atomic write size check in xfs_file_write_iter() John Garry
@ 2025-01-02 14:04 ` John Garry
  2025-01-08  0:55   ` Darrick J. Wong
  2025-01-02 14:04 ` [PATCH v3 7/7] xfs: Update xfs_get_atomic_write_attr() for large atomic writes John Garry
  6 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2025-01-02 14:04 UTC (permalink / raw)
  To: brauner, djwong, cem, dchinner, hch, ritesh.list
  Cc: linux-xfs, linux-fsdevel, linux-kernel, martin.petersen,
	John Garry

rtvol guarantees alloc unit alignment through rt_extsize. As such, it is
possible to atomically write multiple FS blocks in a rtvol (up to
rt_extsize).

Add a member to xfs_mount to hold the pre-calculated atomic write unit max.

The value in rt_extsize is not necessarily a power-of-2, so find the
largest power-of-2 that evenly divides rt_extsize.
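
As a worked example, an rt_extsize of 12 blocks gives 4, and 24 gives 8. The
user-space sketch below shows the doubling loop next to an equivalent bit
trick (isolating the lowest set bit). Both helper names are hypothetical, and
the zero guard is my addition since 0 has no largest power-of-2 divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Doubling loop: grow p while 2 * p still divides n. */
static uint32_t largest_pow2_divisor_loop(uint32_t n)
{
	uint64_t p = 1;	/* 64-bit so that p * 2 cannot wrap to 0 */

	if (n == 0)
		return 0;
	while (n % (p * 2) == 0)
		p *= 2;
	return (uint32_t)p;
}

/* Same result by isolating the lowest set bit of n. */
static uint32_t largest_pow2_divisor_bits(uint32_t n)
{
	return n & (~n + 1u);
}
```

The bit trick also covers the power-of-2 case without a separate early
return, since a power of 2 has exactly one bit set.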

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_sb.c |  3 +++
 fs/xfs/xfs_mount.h     |  1 +
 fs/xfs/xfs_rtalloc.c   | 23 +++++++++++++++++++++++
 fs/xfs/xfs_rtalloc.h   |  4 ++++
 4 files changed, 31 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 3b5623611eba..6381060df901 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -25,6 +25,7 @@
 #include "xfs_da_format.h"
 #include "xfs_health.h"
 #include "xfs_ag.h"
+#include "xfs_rtalloc.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_exchrange.h"
 #include "xfs_rtgroup.h"
@@ -1149,6 +1150,8 @@ xfs_sb_mount_rextsize(
 		rgs->blklog = 0;
 		rgs->blkmask = (uint64_t)-1;
 	}
+
+	xfs_rt_awu_update(mp);
 }
 
 /* Update incore sb rt extent size, then recompute the cached rt geometry. */
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index db9dade7d22a..f2f1d2c667cc 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -191,6 +191,7 @@ typedef struct xfs_mount {
 	bool			m_fail_unmount;
 	bool			m_finobt_nores; /* no per-AG finobt resv. */
 	bool			m_update_sb;	/* sb needs update in mount */
+	xfs_extlen_t		m_rt_awu_max;   /* rt atomic write unit max */
 
 	/*
 	 * Bitsets of per-fs metadata that have been checked and/or are sick.
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index fcfa6e0eb3ad..e3093f3c7670 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -735,6 +735,28 @@ xfs_rtginode_ensure(
 	return xfs_rtginode_create(rtg, type, true);
 }
 
+void
+xfs_rt_awu_update(
+	struct xfs_mount	*mp)
+{
+	xfs_agblock_t		rsize = mp->m_sb.sb_rextsize;
+	xfs_extlen_t		awu_max;
+
+	if (is_power_of_2(rsize)) {
+		mp->m_rt_awu_max = rsize;
+		return;
+	}
+
+	/* Find highest power-of-2 that evenly divides sb_rextsize */
+	awu_max = 1;
+	while (1) {
+		if (rsize % (awu_max * 2))
+			break;
+		awu_max *= 2;
+	}
+	mp->m_rt_awu_max = awu_max;
+}
+
 static struct xfs_mount *
 xfs_growfs_rt_alloc_fake_mount(
 	const struct xfs_mount	*mp,
@@ -969,6 +991,7 @@ xfs_growfs_rt_bmblock(
 	 */
 	mp->m_rsumlevels = nmp->m_rsumlevels;
 	mp->m_rsumblocks = nmp->m_rsumblocks;
+	mp->m_rt_awu_max = nmp->m_rt_awu_max;
 
 	/*
 	 * Recompute the growfsrt reservation from the new rsumsize.
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index 8e2a07b8174b..fcb7bb3df470 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -42,6 +42,10 @@ xfs_growfs_rt(
 	struct xfs_mount	*mp,	/* file system mount structure */
 	xfs_growfs_rt_t		*in);	/* user supplied growfs struct */
 
+void
+xfs_rt_awu_update(
+	struct xfs_mount	*mp);
+
 int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp);
 #else
 # define xfs_growfs_rt(mp,in)				(-ENOSYS)
-- 
2.31.1



* [PATCH v3 7/7] xfs: Update xfs_get_atomic_write_attr() for large atomic writes
  2025-01-02 14:04 [PATCH v3 0/7] large atomic writes for xfs John Garry
                   ` (5 preceding siblings ...)
  2025-01-02 14:04 ` [PATCH v3 6/7] xfs: Add RT atomic write unit max to xfs_mount John Garry
@ 2025-01-02 14:04 ` John Garry
  2025-01-08  0:56   ` Darrick J. Wong
  6 siblings, 1 reply; 12+ messages in thread
From: John Garry @ 2025-01-02 14:04 UTC (permalink / raw)
  To: brauner, djwong, cem, dchinner, hch, ritesh.list
  Cc: linux-xfs, linux-fsdevel, linux-kernel, martin.petersen,
	John Garry

Update xfs_get_atomic_write_attr() to take into account that rtvol can
support atomic writes spanning multiple FS blocks.

For non-rtvol, we are still limited in min and max by the blocksize.
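
The limit calculation can be sketched in user space as below. struct
mount_model and both functions are hypothetical simplifications: blocklog
models the FS block size as a shift, rt_awu_max_fsb models the cached RT
limit in FS blocks, and bdev_awu_max models the block device limit in bytes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of the mount state used here. */
struct mount_model {
	unsigned int	blocklog;	/* log2 of the FS block size */
	unsigned int	rt_awu_max_fsb;	/* RT atomic write max, in FS blocks */
};

/* Max atomic write unit for a file, in bytes, before the device limit. */
static unsigned int inode_awu_max_bytes(const struct mount_model *mp,
					bool realtime)
{
	if (realtime)
		return mp->rt_awu_max_fsb << mp->blocklog;
	return 1u << mp->blocklog;
}

static void get_atomic_write_attr(const struct mount_model *mp, bool realtime,
				  bool can_atomicwrite,
				  unsigned int bdev_awu_max,
				  unsigned int *unit_min,
				  unsigned int *unit_max)
{
	unsigned int awu_max = inode_awu_max_bytes(mp, realtime);

	if (!can_atomicwrite) {
		*unit_min = *unit_max = 0;
		return;
	}

	/* Min stays at one FS block; max is capped by the device. */
	*unit_min = 1u << mp->blocklog;
	*unit_max = bdev_awu_max < awu_max ? bdev_awu_max : awu_max;
}
```

For 4K blocks and an RT limit of 4 blocks, a device limit of 64K yields a
16K max, while a device limit of 8K yields 8K.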

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_iops.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 883ec45ae708..02b3f697936b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -572,18 +572,35 @@ xfs_stat_blksize(
 	return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
 }
 
+/* Returns max atomic write unit for a file, in bytes. */
+static unsigned int
+xfs_inode_atomicwrite_max(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (XFS_IS_REALTIME_INODE(ip))
+		return XFS_FSB_TO_B(mp, mp->m_rt_awu_max);
+
+	return mp->m_sb.sb_blocksize;
+}
+
 void
 xfs_get_atomic_write_attr(
 	struct xfs_inode	*ip,
 	unsigned int		*unit_min,
 	unsigned int		*unit_max)
 {
+	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
+	unsigned int		awu_max = xfs_inode_atomicwrite_max(ip);
+
 	if (!xfs_inode_can_atomicwrite(ip)) {
 		*unit_min = *unit_max = 0;
 		return;
 	}
 
-	*unit_min = *unit_max = ip->i_mount->m_sb.sb_blocksize;
+	*unit_min = ip->i_mount->m_sb.sb_blocksize;
+	*unit_max = min(target->bt_bdev_awu_max, awu_max);
 }
 
 STATIC int
-- 
2.31.1



* Re: [PATCH v3 3/7] iomap: Lift blocksize restriction on atomic writes
  2025-01-02 14:04 ` [PATCH v3 3/7] iomap: Lift blocksize restriction on atomic writes John Garry
@ 2025-01-08  0:41   ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2025-01-08  0:41 UTC (permalink / raw)
  To: John Garry
  Cc: brauner, cem, dchinner, hch, ritesh.list, linux-xfs,
	linux-fsdevel, linux-kernel, martin.petersen

On Thu, Jan 02, 2025 at 02:04:07PM +0000, John Garry wrote:
> From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
> 
> Filesystems like ext4 can submit writes in multiples of blocksizes.
> But we still can't allow the writes to be split. Hence let's check if
> the iomap_length() is same as iter->len or not.
> 
> It is the role of the FS to ensure that a single mapping may be created
> for an atomic write. The FS will also continue to check size and alignment
> legality.
> 
> Signed-off-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
> jpg: Tweak commit message
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Fine with me.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/iomap/direct-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 18c888f0c11f..6510bb5d5a6f 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -314,7 +314,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>  	size_t copied = 0;
>  	size_t orig_count;
>  
> -	if (atomic && length != fs_block_size)
> +	if (atomic && length != iter->len)
>  		return -EINVAL;
>  
>  	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
> -- 
> 2.31.1
> 
> 


* Re: [PATCH v3 5/7] xfs: Switch atomic write size check in xfs_file_write_iter()
  2025-01-02 14:04 ` [PATCH v3 5/7] xfs: Switch atomic write size check in xfs_file_write_iter() John Garry
@ 2025-01-08  0:50   ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2025-01-08  0:50 UTC (permalink / raw)
  To: John Garry
  Cc: brauner, cem, dchinner, hch, ritesh.list, linux-xfs,
	linux-fsdevel, linux-kernel, martin.petersen

On Thu, Jan 02, 2025 at 02:04:09PM +0000, John Garry wrote:
> Currently atomic writes size permitted is fixed at the blocksize.
> 
> To start to remove this restriction, use xfs_get_atomic_write_attr() to
> find the per-inode atomic write limits and check according to that.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Seems reasonable to me.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_file.c | 12 +++++-------
>  fs/xfs/xfs_iops.c |  2 +-
>  fs/xfs/xfs_iops.h |  2 ++
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 2c810f75dbbd..68c22c0ab235 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -949,14 +949,12 @@ xfs_file_write_iter(
>  		return xfs_file_dax_write(iocb, from);
>  
>  	if (iocb->ki_flags & IOCB_ATOMIC) {
> -		/*
> -		 * Currently only atomic writing of a single FS block is
> -		 * supported. It would be possible to atomic write smaller than
> -		 * a FS block, but there is no requirement to support this.
> -		 * Note that iomap also does not support this yet.
> -		 */
> -		if (ocount != ip->i_mount->m_sb.sb_blocksize)
> +		unsigned int unit_min, unit_max;
> +
> +		xfs_get_atomic_write_attr(ip, &unit_min, &unit_max);
> +		if (ocount < unit_min || ocount > unit_max)
>  			return -EINVAL;
> +
>  		ret = generic_atomic_write_valid(iocb, from);
>  		if (ret)
>  			return ret;
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 207e0dadffc3..883ec45ae708 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -572,7 +572,7 @@ xfs_stat_blksize(
>  	return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
>  }
>  
> -static void
> +void
>  xfs_get_atomic_write_attr(
>  	struct xfs_inode	*ip,
>  	unsigned int		*unit_min,
> diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h
> index 3c1a2605ffd2..82d3ffbf7024 100644
> --- a/fs/xfs/xfs_iops.h
> +++ b/fs/xfs/xfs_iops.h
> @@ -19,5 +19,7 @@ int xfs_inode_init_security(struct inode *inode, struct inode *dir,
>  extern void xfs_setup_inode(struct xfs_inode *ip);
>  extern void xfs_setup_iops(struct xfs_inode *ip);
>  extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init);
> +extern void xfs_get_atomic_write_attr(struct xfs_inode	*ip,
> +		unsigned int *unit_min, unsigned int *unit_max);
>  
>  #endif /* __XFS_IOPS_H__ */
> -- 
> 2.31.1
> 
> 


* Re: [PATCH v3 6/7] xfs: Add RT atomic write unit max to xfs_mount
  2025-01-02 14:04 ` [PATCH v3 6/7] xfs: Add RT atomic write unit max to xfs_mount John Garry
@ 2025-01-08  0:55   ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2025-01-08  0:55 UTC (permalink / raw)
  To: John Garry
  Cc: brauner, cem, dchinner, hch, ritesh.list, linux-xfs,
	linux-fsdevel, linux-kernel, martin.petersen

On Thu, Jan 02, 2025 at 02:04:10PM +0000, John Garry wrote:
> rtvol guarantees alloc unit alignment through rt_extsize. As such, it is
> possible to atomically write multiple FS blocks in a rtvol (up to
> rt_extsize).
> 
> Add a member to xfs_mount to hold the pre-calculated atomic write unit max.
> 
> The value in rt_extsize is not necessarily a power-of-2, so find the
> largest power-of-2 evenly divisible into rt_extsize.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>

I guess that works.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_sb.c |  3 +++
>  fs/xfs/xfs_mount.h     |  1 +
>  fs/xfs/xfs_rtalloc.c   | 23 +++++++++++++++++++++++
>  fs/xfs/xfs_rtalloc.h   |  4 ++++
>  4 files changed, 31 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 3b5623611eba..6381060df901 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -25,6 +25,7 @@
>  #include "xfs_da_format.h"
>  #include "xfs_health.h"
>  #include "xfs_ag.h"
> +#include "xfs_rtalloc.h"
>  #include "xfs_rtbitmap.h"
>  #include "xfs_exchrange.h"
>  #include "xfs_rtgroup.h"
> @@ -1149,6 +1150,8 @@ xfs_sb_mount_rextsize(
>  		rgs->blklog = 0;
>  		rgs->blkmask = (uint64_t)-1;
>  	}
> +
> +	xfs_rt_awu_update(mp);
>  }
>  
>  /* Update incore sb rt extent size, then recompute the cached rt geometry. */
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index db9dade7d22a..f2f1d2c667cc 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -191,6 +191,7 @@ typedef struct xfs_mount {
>  	bool			m_fail_unmount;
>  	bool			m_finobt_nores; /* no per-AG finobt resv. */
>  	bool			m_update_sb;	/* sb needs update in mount */
> +	xfs_extlen_t		m_rt_awu_max;   /* rt atomic write unit max */
>  
>  	/*
>  	 * Bitsets of per-fs metadata that have been checked and/or are sick.
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index fcfa6e0eb3ad..e3093f3c7670 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -735,6 +735,28 @@ xfs_rtginode_ensure(
>  	return xfs_rtginode_create(rtg, type, true);
>  }
>  
> +void
> +xfs_rt_awu_update(
> +	struct xfs_mount	*mp)
> +{
> +	xfs_agblock_t		rsize = mp->m_sb.sb_rextsize;
> +	xfs_extlen_t		awu_max;
> +
> +	if (is_power_of_2(rsize)) {
> +		mp->m_rt_awu_max = rsize;
> +		return;
> +	}
> +
> +	/* Find highest power-of-2 evenly divisible into sb_rextsize */
> +	awu_max = 1;
> +	while (1) {
> +		if (rsize % (awu_max * 2))
> +			break;
> +		awu_max *= 2;
> +	}
> +	mp->m_rt_awu_max = awu_max;
> +}
> +
>  static struct xfs_mount *
>  xfs_growfs_rt_alloc_fake_mount(
>  	const struct xfs_mount	*mp,
> @@ -969,6 +991,7 @@ xfs_growfs_rt_bmblock(
>  	 */
>  	mp->m_rsumlevels = nmp->m_rsumlevels;
>  	mp->m_rsumblocks = nmp->m_rsumblocks;
> +	mp->m_rt_awu_max = nmp->m_rt_awu_max;
>  
>  	/*
>  	 * Recompute the growfsrt reservation from the new rsumsize.
> diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
> index 8e2a07b8174b..fcb7bb3df470 100644
> --- a/fs/xfs/xfs_rtalloc.h
> +++ b/fs/xfs/xfs_rtalloc.h
> @@ -42,6 +42,10 @@ xfs_growfs_rt(
>  	struct xfs_mount	*mp,	/* file system mount structure */
>  	xfs_growfs_rt_t		*in);	/* user supplied growfs struct */
>  
> +void
> +xfs_rt_awu_update(
> +	struct xfs_mount	*mp);
> +
>  int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp);
>  #else
>  # define xfs_growfs_rt(mp,in)				(-ENOSYS)
> -- 
> 2.31.1
> 
> 
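
For readers following the arithmetic: the loop in xfs_rt_awu_update()
picks the largest power-of-2 that evenly divides sb_rextsize. Below is a
minimal userspace sketch of that calculation, not the kernel code. The
helper names are hypothetical, plain uint32_t stands in for the xfs
types, and a nonzero extent size is assumed. The second helper shows the
lowest-set-bit trick, which should give the same answer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the loop in xfs_rt_awu_update(): keep
 * doubling while the doubled value still divides rsize evenly.
 * Assumes rsize != 0. The multiplication is done in 64 bits so that
 * doubling 2^31 cannot wrap around to zero.
 */
static uint32_t pow2_divisor_loop(uint32_t rsize)
{
	uint32_t awu_max = 1;

	assert(rsize != 0);
	while (rsize % ((uint64_t)awu_max * 2) == 0)
		awu_max *= 2;
	return awu_max;
}

/*
 * Same result without a loop: the lowest set bit of rsize is the
 * largest power-of-2 that divides it (two's complement negate-and-mask).
 */
static uint32_t pow2_divisor_bits(uint32_t rsize)
{
	assert(rsize != 0);
	return rsize & (~rsize + 1u);
}
```

With an rt extent size of 12 blocks both helpers give 4, so the largest
atomic write unit would be 4 FS blocks; a power-of-2 extent size such as
16 is returned unchanged.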

* Re: [PATCH v3 7/7] xfs: Update xfs_get_atomic_write_attr() for large atomic writes
  2025-01-02 14:04 ` [PATCH v3 7/7] xfs: Update xfs_get_atomic_write_attr() for large atomic writes John Garry
@ 2025-01-08  0:56   ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2025-01-08  0:56 UTC (permalink / raw)
  To: John Garry
  Cc: brauner, cem, dchinner, hch, ritesh.list, linux-xfs,
	linux-fsdevel, linux-kernel, martin.petersen

On Thu, Jan 02, 2025 at 02:04:11PM +0000, John Garry wrote:
> Update xfs_get_atomic_write_attr() to take into account that rtvol can
> support atomic writes spanning multiple FS blocks.
> 
> For non-rtvol, we are still limited in min and max by the blocksize.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Pretty straightforward to me.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_iops.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 883ec45ae708..02b3f697936b 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -572,18 +572,35 @@ xfs_stat_blksize(
>  	return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
>  }
>  
> +/* Returns max atomic write unit for a file, in bytes. */
> +static unsigned int
> +xfs_inode_atomicwrite_max(
> +	struct xfs_inode	*ip)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +
> +	if (XFS_IS_REALTIME_INODE(ip))
> +		return XFS_FSB_TO_B(mp, mp->m_rt_awu_max);
> +
> +	return mp->m_sb.sb_blocksize;
> +}
> +
>  void
>  xfs_get_atomic_write_attr(
>  	struct xfs_inode	*ip,
>  	unsigned int		*unit_min,
>  	unsigned int		*unit_max)
>  {
> +	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
> +	unsigned int		awu_max = xfs_inode_atomicwrite_max(ip);
> +
>  	if (!xfs_inode_can_atomicwrite(ip)) {
>  		*unit_min = *unit_max = 0;
>  		return;
>  	}
>  
> -	*unit_min = *unit_max = ip->i_mount->m_sb.sb_blocksize;
> +	*unit_min = ip->i_mount->m_sb.sb_blocksize;
> +	*unit_max =  min(target->bt_bdev_awu_max, awu_max);
>  }
>  
>  STATIC int
> -- 
> 2.31.1
> 
> 
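
To make the min/max selection in the patch above concrete, here is a
small userspace model of the logic in xfs_get_atomic_write_attr(). It is
a sketch under assumptions, not the kernel implementation: struct
awu_model and model_get_atomic_write_attr() are hypothetical names, the
fields flatten what the patch reads from the inode, mount and buftarg,
and the FSB-to-bytes conversion is a plain 32-bit multiply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of the inputs the patch consults. */
struct awu_model {
	bool		can_atomicwrite;	/* xfs_inode_can_atomicwrite() result */
	bool		is_realtime;		/* XFS_IS_REALTIME_INODE() result */
	uint32_t	blocksize;		/* sb_blocksize, in bytes */
	uint32_t	rt_awu_max_fsb;		/* m_rt_awu_max, in FS blocks */
	uint32_t	bdev_awu_max;		/* bt_bdev_awu_max, in bytes */
};

/*
 * Model of the patched function: unit_min is always one FS block;
 * unit_max is the filesystem limit (rt: m_rt_awu_max converted to
 * bytes, otherwise one block) capped by the block device limit.
 */
static void model_get_atomic_write_attr(const struct awu_model *m,
					uint32_t *unit_min,
					uint32_t *unit_max)
{
	uint32_t fs_max;

	if (!m->can_atomicwrite) {
		*unit_min = *unit_max = 0;
		return;
	}

	fs_max = m->is_realtime ? m->rt_awu_max_fsb * m->blocksize
				: m->blocksize;

	*unit_min = m->blocksize;
	*unit_max = fs_max < m->bdev_awu_max ? fs_max : m->bdev_awu_max;
}
```

For example, a realtime file with 4096-byte blocks and m_rt_awu_max of 4
would report a unit_max of 16384 bytes if the device allows at least
that much, and the device limit otherwise. A non-rt file stays at one
block for both min and max.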

Thread overview: 12+ messages
2025-01-02 14:04 [PATCH v3 0/7] large atomic writes for xfs John Garry
2025-01-02 14:04 ` [PATCH v3 1/7] iomap: Increase iomap_dio_zero() size limit John Garry
2025-01-02 14:04 ` [PATCH v3 2/7] iomap: Add zero unwritten mappings dio support John Garry
2025-01-02 14:04 ` [PATCH v3 3/7] iomap: Lift blocksize restriction on atomic writes John Garry
2025-01-08  0:41   ` Darrick J. Wong
2025-01-02 14:04 ` [PATCH v3 4/7] xfs: Add extent zeroing support for " John Garry
2025-01-02 14:04 ` [PATCH v3 5/7] xfs: Switch atomic write size check in xfs_file_write_iter() John Garry
2025-01-08  0:50   ` Darrick J. Wong
2025-01-02 14:04 ` [PATCH v3 6/7] xfs: Add RT atomic write unit max to xfs_mount John Garry
2025-01-08  0:55   ` Darrick J. Wong
2025-01-02 14:04 ` [PATCH v3 7/7] xfs: Update xfs_get_atomic_write_attr() for large atomic writes John Garry
2025-01-08  0:56   ` Darrick J. Wong
