public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.g.garry@oracle.com>
To: brauner@kernel.org, djwong@kernel.org, hch@lst.de,
	viro@zeniv.linux.org.uk, jack@suse.cz, cem@kernel.org
Cc: linux-fsdevel@vger.kernel.org, dchinner@redhat.com,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	ojaswin@linux.ibm.com, ritesh.list@gmail.com,
	martin.petersen@oracle.com, linux-ext4@vger.kernel.org,
	linux-block@vger.kernel.org, catherine.hoang@oracle.com,
	linux-api@vger.kernel.org, John Garry <john.g.garry@oracle.com>
Subject: [PATCH v11 11/16] xfs: add large atomic writes checks in xfs_direct_write_iomap_begin()
Date: Sun,  4 May 2025 08:59:18 +0000	[thread overview]
Message-ID: <20250504085923.1895402-12-john.g.garry@oracle.com> (raw)
In-Reply-To: <20250504085923.1895402-1-john.g.garry@oracle.com>

For when large atomic writes (> 1x FS block) are supported, there will be
various occasions when HW offload may not be possible.

Such instances include:
- unaligned extent mapping wrt write length
- extent mappings which do not cover the full write, e.g. the write spans
  sparse or mixed-mapping extents
- the write length is greater than HW offload can support
- no hardware support at all

In those cases, we need to fallback to the CoW-based atomic write mode. For
this, report special code -ENOPROTOOPT to inform the caller that HW
offload-based method is not possible.

In addition to the occasions mentioned, if the write covers an unallocated
range, we again judge that we need to rely on the CoW-based method when we
would need to allocate anything more than 1x block. This is because if we
allocate less blocks that is required for the write, then again HW
offload-based method would not be possible. So we are taking a pessimistic
approach to writes covering unallocated space.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: various cleanups]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_iomap.c | 62 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 166fba2ff1ef..ff05e6b1b0bb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -798,6 +798,38 @@ imap_spans_range(
 	return true;
 }
 
+static bool
+xfs_bmap_hw_atomic_write_possible(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*imap,
+	xfs_fileoff_t		offset_fsb,
+	xfs_fileoff_t		end_fsb)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fsize_t		len = XFS_FSB_TO_B(mp, end_fsb - offset_fsb);
+
+	/*
+	 * atomic writes are required to be naturally aligned for disk blocks,
+	 * which ensures that we adhere to block layer rules that we won't
+	 * straddle any boundary or violate write alignment requirement.
+	 */
+	if (!IS_ALIGNED(imap->br_startblock, imap->br_blockcount))
+		return false;
+
+	/*
+	 * Spanning multiple extents would mean that multiple BIOs would be
+	 * issued, and so would lose atomicity required for REQ_ATOMIC-based
+	 * atomics.
+	 */
+	if (!imap_spans_range(imap, offset_fsb, end_fsb))
+		return false;
+
+	/*
+	 * The ->iomap_begin caller should ensure this, but check anyway.
+	 */
+	return len <= xfs_inode_buftarg(ip)->bt_bdev_awu_max;
+}
+
 static int
 xfs_direct_write_iomap_begin(
 	struct inode		*inode,
@@ -812,9 +844,11 @@ xfs_direct_write_iomap_begin(
 	struct xfs_bmbt_irec	imap, cmap;
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
+	xfs_fileoff_t		orig_end_fsb = end_fsb;
 	int			nimaps = 1, error = 0;
 	bool			shared = false;
 	u16			iomap_flags = 0;
+	bool			needs_alloc;
 	unsigned int		lockmode;
 	u64			seq;
 
@@ -875,13 +909,37 @@ xfs_direct_write_iomap_begin(
 				(flags & IOMAP_DIRECT) || IS_DAX(inode));
 		if (error)
 			goto out_unlock;
-		if (shared)
+		if (shared) {
+			if ((flags & IOMAP_ATOMIC) &&
+			    !xfs_bmap_hw_atomic_write_possible(ip, &cmap,
+					offset_fsb, end_fsb)) {
+				error = -ENOPROTOOPT;
+				goto out_unlock;
+			}
 			goto out_found_cow;
+		}
 		end_fsb = imap.br_startoff + imap.br_blockcount;
 		length = XFS_FSB_TO_B(mp, end_fsb) - offset;
 	}
 
-	if (imap_needs_alloc(inode, flags, &imap, nimaps))
+	needs_alloc = imap_needs_alloc(inode, flags, &imap, nimaps);
+
+	if (flags & IOMAP_ATOMIC) {
+		error = -ENOPROTOOPT;
+		/*
+		 * If we allocate less than what is required for the write
+		 * then we may end up with multiple extents, which means that
+		 * REQ_ATOMIC-based cannot be used, so avoid this possibility.
+		 */
+		if (needs_alloc && orig_end_fsb - offset_fsb > 1)
+			goto out_unlock;
+
+		if (!xfs_bmap_hw_atomic_write_possible(ip, &imap, offset_fsb,
+				orig_end_fsb))
+			goto out_unlock;
+	}
+
+	if (needs_alloc)
 		goto allocate_blocks;
 
 	/*
-- 
2.31.1


  parent reply	other threads:[~2025-05-04  9:00 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-04  8:59 [PATCH v11 00/16] large atomic writes for xfs John Garry
2025-05-04  8:59 ` [PATCH v11 01/16] fs: add atomic write unit max opt to statx John Garry
2025-05-04  8:59 ` [PATCH v11 02/16] xfs: only call xfs_setsize_buftarg once per buffer target John Garry
2025-05-05  5:40   ` Christoph Hellwig
2025-05-05 10:04     ` John Garry
2025-05-05 10:49       ` Christoph Hellwig
2025-05-05 10:55         ` John Garry
2025-05-05 14:22           ` Darrick J. Wong
2025-05-05 14:48             ` John Garry
2025-05-05 15:27               ` John Garry
2025-05-06  4:22                 ` Christoph Hellwig
2025-05-06  6:57                   ` John Garry
2025-05-04  8:59 ` [PATCH v11 03/16] xfs: add helpers to compute log item overhead John Garry
2025-05-04  8:59 ` [PATCH v11 04/16] xfs: add helpers to compute transaction reservation for finishing intent items John Garry
2025-05-04  8:59 ` [PATCH v11 05/16] xfs: rename xfs_inode_can_atomicwrite() -> xfs_inode_can_hw_atomic_write() John Garry
2025-05-04  8:59 ` [PATCH v11 06/16] xfs: ignore HW which cannot atomic write a single block John Garry
2025-05-05  5:43   ` Christoph Hellwig
2025-05-05  5:45     ` John Garry
2025-05-05  8:12       ` John Garry
2025-05-05  8:30         ` Christoph Hellwig
2025-05-05 14:24           ` Darrick J. Wong
2025-05-04  8:59 ` [PATCH v11 07/16] xfs: allow block allocator to take an alignment hint John Garry
2025-05-04  8:59 ` [PATCH v11 08/16] xfs: refactor xfs_reflink_end_cow_extent() John Garry
2025-05-04  8:59 ` [PATCH v11 09/16] xfs: refine atomic write size check in xfs_file_write_iter() John Garry
2025-05-04  8:59 ` [PATCH v11 10/16] xfs: add xfs_atomic_write_cow_iomap_begin() John Garry
2025-05-04  8:59 ` John Garry [this message]
2025-05-04  8:59 ` [PATCH v11 12/16] xfs: commit CoW-based atomic writes atomically John Garry
2025-05-04  8:59 ` [PATCH v11 13/16] xfs: add xfs_file_dio_write_atomic() John Garry
2025-05-04  8:59 ` [PATCH v11 14/16] xfs: add xfs_calc_atomic_write_unit_max() John Garry
2025-05-05  5:25   ` Darrick J. Wong
2025-05-05  6:08     ` John Garry
2025-05-05  8:02       ` John Garry
2025-05-05 14:26         ` Darrick J. Wong
2025-05-04  8:59 ` [PATCH v11 15/16] xfs: update atomic write limits John Garry
2025-05-04  8:59 ` [PATCH v11 16/16] xfs: allow sysadmins to specify a maximum atomic write limit at mount time John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250504085923.1895402-12-john.g.garry@oracle.com \
    --to=john.g.garry@oracle.com \
    --cc=brauner@kernel.org \
    --cc=catherine.hoang@oracle.com \
    --cc=cem@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ojaswin@linux.ibm.com \
    --cc=ritesh.list@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox