From: John Garry <john.g.garry@oracle.com>
To: brauner@kernel.org, djwong@kernel.org, hch@lst.de,
viro@zeniv.linux.org.uk, jack@suse.cz, cem@kernel.org
Cc: linux-fsdevel@vger.kernel.org, dchinner@redhat.com,
linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
ojaswin@linux.ibm.com, ritesh.list@gmail.com,
martin.petersen@oracle.com, linux-ext4@vger.kernel.org,
linux-block@vger.kernel.org, catherine.hoang@oracle.com,
linux-api@vger.kernel.org, John Garry <john.g.garry@oracle.com>
Subject: [PATCH v11 13/16] xfs: add xfs_file_dio_write_atomic()
Date: Sun, 4 May 2025 08:59:20 +0000 [thread overview]
Message-ID: <20250504085923.1895402-14-john.g.garry@oracle.com> (raw)
In-Reply-To: <20250504085923.1895402-1-john.g.garry@oracle.com>
Add xfs_file_dio_write_atomic() for dedicated handling of atomic writes.
Now HW offload will not be required to support atomic writes and is
an optional feature.
CoW-based atomic writes will be supported with out-of-places write and
atomic extent remapping.
Either mode of operation may be used for an atomic write, depending on the
circumstances.
The preferred method is HW offload as it will be faster. If HW offload is
not available then we always use the CoW-based method. If HW offload is
available but not possible to use, then again we use the CoW-based method.
If available, HW offload would not be possible for the write length
exceeding the HW offload limit, the write spanning multiple extents,
unaligned disk blocks, etc.
Apart from the write exceeding the HW offload limit, other conditions for
HW offload usage can only be detected in the iomap handling for the write.
As such, we use a fallback method to issue the write if we detect in the
->iomap_begin() handler that HW offload is not possible. Special code
-ENOPROTOOPT is returned from ->iomap_begin() to inform that HW offload is
not possible.
In other words, atomic writes are supported on any filesystem that can
perform out of place write remapping atomically (i.e. reflink) up to
some fairly large size. If the conditions are right (a single correctly
aligned overwrite mapping) then the filesystem will use any available
hardware support to avoid the filesystem metadata updates.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/xfs_file.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 32883ec8ca2e..f4a66ff85748 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -728,6 +728,72 @@ xfs_file_dio_write_zoned(
return ret;
}
+/*
+ * Handle block atomic writes
+ *
+ * Two methods of atomic writes are supported:
+ * - REQ_ATOMIC-based, which would typically use some form of HW offload in the
+ * disk
+ * - COW-based, which uses a COW fork as a staging extent for data updates
+ * before atomically updating extent mappings for the range being written
+ *
+ */
+static noinline ssize_t
+xfs_file_dio_write_atomic(
+ struct xfs_inode *ip,
+ struct kiocb *iocb,
+ struct iov_iter *from)
+{
+ unsigned int iolock = XFS_IOLOCK_SHARED;
+ ssize_t ret, ocount = iov_iter_count(from);
+ const struct iomap_ops *dops;
+
+ /*
+ * HW offload should be faster, so try that first if it is already
+ * known that the write length is not too large.
+ */
+ if (ocount > xfs_inode_buftarg(ip)->bt_bdev_awu_max)
+ dops = &xfs_atomic_write_cow_iomap_ops;
+ else
+ dops = &xfs_direct_write_iomap_ops;
+
+retry:
+ ret = xfs_ilock_iocb_for_write(iocb, &iolock);
+ if (ret)
+ return ret;
+
+ ret = xfs_file_write_checks(iocb, from, &iolock, NULL);
+ if (ret)
+ goto out_unlock;
+
+ /* Demote similar to xfs_file_dio_write_aligned() */
+ if (iolock == XFS_IOLOCK_EXCL) {
+ xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
+ iolock = XFS_IOLOCK_SHARED;
+ }
+
+ trace_xfs_file_direct_write(iocb, from);
+ ret = iomap_dio_rw(iocb, from, dops, &xfs_dio_write_ops,
+ 0, NULL, 0);
+
+ /*
+ * The retry mechanism is based on the ->iomap_begin method returning
+ * -ENOPROTOOPT, which would be when the REQ_ATOMIC-based write is not
+ * possible. The REQ_ATOMIC-based method typically not be possible if
+ * the write spans multiple extents or the disk blocks are misaligned.
+ */
+ if (ret == -ENOPROTOOPT && dops == &xfs_direct_write_iomap_ops) {
+ xfs_iunlock(ip, iolock);
+ dops = &xfs_atomic_write_cow_iomap_ops;
+ goto retry;
+ }
+
+out_unlock:
+ if (iolock)
+ xfs_iunlock(ip, iolock);
+ return ret;
+}
+
/*
* Handle block unaligned direct I/O writes
*
@@ -843,6 +909,8 @@ xfs_file_dio_write(
return xfs_file_dio_write_unaligned(ip, iocb, from);
if (xfs_is_zoned_inode(ip))
return xfs_file_dio_write_zoned(ip, iocb, from);
+ if (iocb->ki_flags & IOCB_ATOMIC)
+ return xfs_file_dio_write_atomic(ip, iocb, from);
return xfs_file_dio_write_aligned(ip, iocb, from,
&xfs_direct_write_iomap_ops, &xfs_dio_write_ops, NULL);
}
--
2.31.1
next prev parent reply other threads:[~2025-05-04 9:02 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-04 8:59 [PATCH v11 00/16] large atomic writes for xfs John Garry
2025-05-04 8:59 ` [PATCH v11 01/16] fs: add atomic write unit max opt to statx John Garry
2025-05-04 8:59 ` [PATCH v11 02/16] xfs: only call xfs_setsize_buftarg once per buffer target John Garry
2025-05-05 5:40 ` Christoph Hellwig
2025-05-05 10:04 ` John Garry
2025-05-05 10:49 ` Christoph Hellwig
2025-05-05 10:55 ` John Garry
2025-05-05 14:22 ` Darrick J. Wong
2025-05-05 14:48 ` John Garry
2025-05-05 15:27 ` John Garry
2025-05-06 4:22 ` Christoph Hellwig
2025-05-06 6:57 ` John Garry
2025-05-04 8:59 ` [PATCH v11 03/16] xfs: add helpers to compute log item overhead John Garry
2025-05-04 8:59 ` [PATCH v11 04/16] xfs: add helpers to compute transaction reservation for finishing intent items John Garry
2025-05-04 8:59 ` [PATCH v11 05/16] xfs: rename xfs_inode_can_atomicwrite() -> xfs_inode_can_hw_atomic_write() John Garry
2025-05-04 8:59 ` [PATCH v11 06/16] xfs: ignore HW which cannot atomic write a single block John Garry
2025-05-05 5:43 ` Christoph Hellwig
2025-05-05 5:45 ` John Garry
2025-05-05 8:12 ` John Garry
2025-05-05 8:30 ` Christoph Hellwig
2025-05-05 14:24 ` Darrick J. Wong
2025-05-04 8:59 ` [PATCH v11 07/16] xfs: allow block allocator to take an alignment hint John Garry
2025-05-04 8:59 ` [PATCH v11 08/16] xfs: refactor xfs_reflink_end_cow_extent() John Garry
2025-05-04 8:59 ` [PATCH v11 09/16] xfs: refine atomic write size check in xfs_file_write_iter() John Garry
2025-05-04 8:59 ` [PATCH v11 10/16] xfs: add xfs_atomic_write_cow_iomap_begin() John Garry
2025-05-04 8:59 ` [PATCH v11 11/16] xfs: add large atomic writes checks in xfs_direct_write_iomap_begin() John Garry
2025-05-04 8:59 ` [PATCH v11 12/16] xfs: commit CoW-based atomic writes atomically John Garry
2025-05-04 8:59 ` John Garry [this message]
2025-05-04 8:59 ` [PATCH v11 14/16] xfs: add xfs_calc_atomic_write_unit_max() John Garry
2025-05-05 5:25 ` Darrick J. Wong
2025-05-05 6:08 ` John Garry
2025-05-05 8:02 ` John Garry
2025-05-05 14:26 ` Darrick J. Wong
2025-05-04 8:59 ` [PATCH v11 15/16] xfs: update atomic write limits John Garry
2025-05-04 8:59 ` [PATCH v11 16/16] xfs: allow sysadmins to specify a maximum atomic write limit at mount time John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250504085923.1895402-14-john.g.garry@oracle.com \
--to=john.g.garry@oracle.com \
--cc=brauner@kernel.org \
--cc=catherine.hoang@oracle.com \
--cc=cem@kernel.org \
--cc=dchinner@redhat.com \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=linux-api@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ojaswin@linux.ibm.com \
--cc=ritesh.list@gmail.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox