From: John Garry <john.g.garry@oracle.com>
To: brauner@kernel.org, djwong@kernel.org, hch@lst.de,
viro@zeniv.linux.org.uk, jack@suse.cz, cem@kernel.org
Cc: linux-fsdevel@vger.kernel.org, dchinner@redhat.com,
linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
ojaswin@linux.ibm.com, ritesh.list@gmail.com,
martin.petersen@oracle.com, linux-ext4@vger.kernel.org,
linux-block@vger.kernel.org, catherine.hoang@oracle.com,
linux-api@vger.kernel.org, John Garry <john.g.garry@oracle.com>
Subject: [PATCH v11 12/16] xfs: commit CoW-based atomic writes atomically
Date: Sun, 4 May 2025 08:59:19 +0000 [thread overview]
Message-ID: <20250504085923.1895402-13-john.g.garry@oracle.com> (raw)
In-Reply-To: <20250504085923.1895402-1-john.g.garry@oracle.com>
When completing a CoW-based write, each extent range mapping update is
covered by a separate transaction.
For a CoW-based atomic write, all mappings must be changed at once, so
change to use a single transaction.
Note that there is a limit on the amount of log intent items which can be
fit into a single transaction, but this is being ignored for now since
the count of items for a typical atomic write would be much less than is
typically supported. A typical atomic write would be expected to be 64KB
or less, which means only 16 possible extents unmaps, which is quite
small.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: add tr_atomic_ioend]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
fs/xfs/libxfs/xfs_log_rlimit.c | 4 +++
fs/xfs/libxfs/xfs_trans_resv.c | 15 +++++++++
fs/xfs/libxfs/xfs_trans_resv.h | 1 +
fs/xfs/xfs_file.c | 5 ++-
fs/xfs/xfs_reflink.c | 56 ++++++++++++++++++++++++++++++++++
fs/xfs/xfs_reflink.h | 2 ++
6 files changed, 82 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
index d3bd6a86c8fe..34bba96d30ca 100644
--- a/fs/xfs/libxfs/xfs_log_rlimit.c
+++ b/fs/xfs/libxfs/xfs_log_rlimit.c
@@ -91,6 +91,7 @@ xfs_log_calc_trans_resv_for_minlogblocks(
*/
if (xfs_want_minlogsize_fixes(&mp->m_sb)) {
xfs_trans_resv_calc(mp, resv);
+ resv->tr_atomic_ioend = M_RES(mp)->tr_atomic_ioend;
return;
}
@@ -107,6 +108,9 @@ xfs_log_calc_trans_resv_for_minlogblocks(
xfs_trans_resv_calc(mp, resv);
+ /* Copy the dynamic transaction reservation types from the running fs */
+ resv->tr_atomic_ioend = M_RES(mp)->tr_atomic_ioend;
+
if (xfs_has_reflink(mp)) {
/*
* In the early days of reflink, typical log operation counts
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 580d00ae2857..a841432abf83 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -1284,6 +1284,15 @@ xfs_calc_namespace_reservations(
resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
}
+STATIC void
+xfs_calc_default_atomic_ioend_reservation(
+ struct xfs_mount *mp,
+ struct xfs_trans_resv *resp)
+{
+ /* Pick a default that will scale reasonably for the log size. */
+ resp->tr_atomic_ioend = resp->tr_itruncate;
+}
+
void
xfs_trans_resv_calc(
struct xfs_mount *mp,
@@ -1378,4 +1387,10 @@ xfs_trans_resv_calc(
resp->tr_itruncate.tr_logcount += logcount_adj;
resp->tr_write.tr_logcount += logcount_adj;
resp->tr_qm_dqalloc.tr_logcount += logcount_adj;
+
+ /*
+ * Now that we've finished computing the static reservations, we can
+ * compute the dynamic reservation for atomic writes.
+ */
+ xfs_calc_default_atomic_ioend_reservation(mp, resp);
}
diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h
index d9d0032cbbc5..670045d417a6 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.h
+++ b/fs/xfs/libxfs/xfs_trans_resv.h
@@ -48,6 +48,7 @@ struct xfs_trans_resv {
struct xfs_trans_res tr_qm_dqalloc; /* allocate quota on disk */
struct xfs_trans_res tr_sb; /* modify superblock */
struct xfs_trans_res tr_fsyncts; /* update timestamps on fsync */
+ struct xfs_trans_res tr_atomic_ioend; /* untorn write completion */
};
/* shorthand way of accessing reservation structure */
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e8acd6ca8f27..32883ec8ca2e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -576,7 +576,10 @@ xfs_dio_write_end_io(
nofs_flag = memalloc_nofs_save();
if (flags & IOMAP_DIO_COW) {
- error = xfs_reflink_end_cow(ip, offset, size);
+ if (iocb->ki_flags & IOCB_ATOMIC)
+ error = xfs_reflink_end_atomic_cow(ip, offset, size);
+ else
+ error = xfs_reflink_end_cow(ip, offset, size);
if (error)
goto out;
}
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index f5d338916098..218dee76768b 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -984,6 +984,62 @@ xfs_reflink_end_cow(
return error;
}
+/*
+ * Fully remap all of the file's data fork at once, which is the critical part
+ * in achieving atomic behaviour.
+ * The regular CoW end path does not use function as to keep the block
+ * reservation per transaction as low as possible.
+ */
+int
+xfs_reflink_end_atomic_cow(
+ struct xfs_inode *ip,
+ xfs_off_t offset,
+ xfs_off_t count)
+{
+ xfs_fileoff_t offset_fsb;
+ xfs_fileoff_t end_fsb;
+ int error = 0;
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+ unsigned int resblks;
+
+ trace_xfs_reflink_end_cow(ip, offset, count);
+
+ offset_fsb = XFS_B_TO_FSBT(mp, offset);
+ end_fsb = XFS_B_TO_FSB(mp, offset + count);
+
+ /*
+ * Each remapping operation could cause a btree split, so in the worst
+ * case that's one for each block.
+ */
+ resblks = (end_fsb - offset_fsb) *
+ XFS_NEXTENTADD_SPACE_RES(mp, 1, XFS_DATA_FORK);
+
+ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_atomic_ioend, resblks, 0,
+ XFS_TRANS_RESERVE, &tp);
+ if (error)
+ return error;
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, 0);
+
+ while (end_fsb > offset_fsb && !error) {
+ error = xfs_reflink_end_cow_extent_locked(tp, ip, &offset_fsb,
+ end_fsb);
+ }
+ if (error) {
+ trace_xfs_reflink_end_cow_error(ip, error, _RET_IP_);
+ goto out_cancel;
+ }
+ error = xfs_trans_commit(tp);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return error;
+out_cancel:
+ xfs_trans_cancel(tp);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return error;
+}
+
/*
* Free all CoW staging blocks that are still referenced by the ondisk refcount
* metadata. The ondisk metadata does not track which inode created the
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 379619f24247..412e9b6f2082 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -45,6 +45,8 @@ extern int xfs_reflink_cancel_cow_range(struct xfs_inode *ip, xfs_off_t offset,
xfs_off_t count, bool cancel_real);
extern int xfs_reflink_end_cow(struct xfs_inode *ip, xfs_off_t offset,
xfs_off_t count);
+int xfs_reflink_end_atomic_cow(struct xfs_inode *ip, xfs_off_t offset,
+ xfs_off_t count);
extern int xfs_reflink_recover_cow(struct xfs_mount *mp);
extern loff_t xfs_reflink_remap_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out, loff_t len,
--
2.31.1
next prev parent reply other threads:[~2025-05-04 9:00 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-04 8:59 [PATCH v11 00/16] large atomic writes for xfs John Garry
2025-05-04 8:59 ` [PATCH v11 01/16] fs: add atomic write unit max opt to statx John Garry
2025-05-04 8:59 ` [PATCH v11 02/16] xfs: only call xfs_setsize_buftarg once per buffer target John Garry
2025-05-05 5:40 ` Christoph Hellwig
2025-05-05 10:04 ` John Garry
2025-05-05 10:49 ` Christoph Hellwig
2025-05-05 10:55 ` John Garry
2025-05-05 14:22 ` Darrick J. Wong
2025-05-05 14:48 ` John Garry
2025-05-05 15:27 ` John Garry
2025-05-06 4:22 ` Christoph Hellwig
2025-05-06 6:57 ` John Garry
2025-05-04 8:59 ` [PATCH v11 03/16] xfs: add helpers to compute log item overhead John Garry
2025-05-04 8:59 ` [PATCH v11 04/16] xfs: add helpers to compute transaction reservation for finishing intent items John Garry
2025-05-04 8:59 ` [PATCH v11 05/16] xfs: rename xfs_inode_can_atomicwrite() -> xfs_inode_can_hw_atomic_write() John Garry
2025-05-04 8:59 ` [PATCH v11 06/16] xfs: ignore HW which cannot atomic write a single block John Garry
2025-05-05 5:43 ` Christoph Hellwig
2025-05-05 5:45 ` John Garry
2025-05-05 8:12 ` John Garry
2025-05-05 8:30 ` Christoph Hellwig
2025-05-05 14:24 ` Darrick J. Wong
2025-05-04 8:59 ` [PATCH v11 07/16] xfs: allow block allocator to take an alignment hint John Garry
2025-05-04 8:59 ` [PATCH v11 08/16] xfs: refactor xfs_reflink_end_cow_extent() John Garry
2025-05-04 8:59 ` [PATCH v11 09/16] xfs: refine atomic write size check in xfs_file_write_iter() John Garry
2025-05-04 8:59 ` [PATCH v11 10/16] xfs: add xfs_atomic_write_cow_iomap_begin() John Garry
2025-05-04 8:59 ` [PATCH v11 11/16] xfs: add large atomic writes checks in xfs_direct_write_iomap_begin() John Garry
2025-05-04 8:59 ` John Garry [this message]
2025-05-04 8:59 ` [PATCH v11 13/16] xfs: add xfs_file_dio_write_atomic() John Garry
2025-05-04 8:59 ` [PATCH v11 14/16] xfs: add xfs_calc_atomic_write_unit_max() John Garry
2025-05-05 5:25 ` Darrick J. Wong
2025-05-05 6:08 ` John Garry
2025-05-05 8:02 ` John Garry
2025-05-05 14:26 ` Darrick J. Wong
2025-05-04 8:59 ` [PATCH v11 15/16] xfs: update atomic write limits John Garry
2025-05-04 8:59 ` [PATCH v11 16/16] xfs: allow sysadmins to specify a maximum atomic write limit at mount time John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250504085923.1895402-13-john.g.garry@oracle.com \
--to=john.g.garry@oracle.com \
--cc=brauner@kernel.org \
--cc=catherine.hoang@oracle.com \
--cc=cem@kernel.org \
--cc=dchinner@redhat.com \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=linux-api@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ojaswin@linux.ibm.com \
--cc=ritesh.list@gmail.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox