From: John Garry <john.g.garry@oracle.com>
To: brauner@kernel.org, djwong@kernel.org, hch@lst.de,
viro@zeniv.linux.org.uk, jack@suse.cz, cem@kernel.org
Cc: linux-fsdevel@vger.kernel.org, dchinner@redhat.com,
linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
ojaswin@linux.ibm.com, ritesh.list@gmail.com,
martin.petersen@oracle.com, linux-ext4@vger.kernel.org,
linux-block@vger.kernel.org, catherine.hoang@oracle.com,
linux-api@vger.kernel.org, John Garry <john.g.garry@oracle.com>
Subject: [PATCH v9 01/15] fs: add atomic write unit max opt to statx
Date: Fri, 25 Apr 2025 16:44:50 +0000
Message-ID: <20250425164504.3263637-2-john.g.garry@oracle.com>
In-Reply-To: <20250425164504.3263637-1-john.g.garry@oracle.com>

XFS will be able to support large atomic writes (atomic write > 1x block)
in the future. This will be achieved by using different operating methods,
depending on the size of the write.

Specifically, a new method of operation based on FS atomic extent remapping
will be supported in addition to the current HW offload-based method.

The FS method will generally perform appreciably slower than the
HW offload method. However, the FS method will typically allow a larger
atomic write unit max limit.

XFS will support a hybrid mode, where the HW offload method is used when
possible, i.e. when the length of the write is supported by the HW, and
FS-based atomic writes are used otherwise.

As such, there is an atomic write length beyond which the user may
experience appreciably slower performance.
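
To illustrate the hybrid mode described above, here is a minimal sketch of
how a write length might map onto the two methods. It is not taken from the
XFS sources: the enum, the struct and pick_atomic_write_method() are
hypothetical names, and the sketch ignores the alignment and power-of-2
length rules that real atomic writes must also satisfy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the XFS implementation. */
enum atomic_write_method {
	ATOMIC_WRITE_UNSUPPORTED,	/* length outside [unit_min, unit_max] */
	ATOMIC_WRITE_HW_OFFLOAD,	/* fast path: HW offload */
	ATOMIC_WRITE_FS_REMAP,		/* slow path: FS atomic extent remapping */
};

struct atomic_write_limits {
	uint32_t unit_min;	/* smallest supported atomic write, in bytes */
	uint32_t hw_unit_max;	/* largest write the HW can offload (0 if none) */
	uint32_t fs_unit_max;	/* largest write the FS method can handle */
};

/*
 * Prefer HW offload whenever the length fits within the HW limit, and
 * fall back to the FS method for longer writes.
 */
enum atomic_write_method
pick_atomic_write_method(const struct atomic_write_limits *lim, uint64_t len)
{
	uint32_t unit_max;

	assert(lim != NULL);

	/* The overall limit is the larger of the two per-method limits. */
	unit_max = lim->hw_unit_max > lim->fs_unit_max ?
			lim->hw_unit_max : lim->fs_unit_max;

	if (!lim->unit_min || len < lim->unit_min || len > unit_max)
		return ATOMIC_WRITE_UNSUPPORTED;

	if (len <= lim->hw_unit_max)
		return ATOMIC_WRITE_HW_OFFLOAD;

	return ATOMIC_WRITE_FS_REMAP;
}
```

With assumed limits of unit_min = 4096, hw_unit_max = 16384 and
fs_unit_max = 65536, a 16384-byte write would take the HW offload path and a
32768-byte write would take the FS path, so 16384 is the length beyond which
performance may drop.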

Advertise this limit in a new statx field, stx_atomic_write_unit_max_opt.
When zero, it means that there is no such performance boundary.

Masks STATX{_ATTR}_WRITE_ATOMIC can be used to get this new field. This is
ok for older kernels which don't support this new field, as they would
already report 0 in this field (from zeroing in cp_statx()). Furthermore,
those older kernels don't support large atomic writes, apart from block
fops, but there would be consistent performance there for atomic writes
in range [unit min, unit max].

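
As a rough sketch of how userspace might interpret the new field, the
helpers below treat a zero stx_atomic_write_unit_max_opt as "no performance
boundary", which also covers older kernels that report 0. The struct and
function names are hypothetical; a real program would fill the struct from
the stx_atomic_write_unit_min, stx_atomic_write_unit_max and
stx_atomic_write_unit_max_opt fields returned by statx(), and the
alignment and power-of-2 length rules for atomic writes are not checked
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three statx atomic write unit fields. */
struct awu_limits {
	uint32_t unit_min;	/* stx_atomic_write_unit_min */
	uint32_t unit_max;	/* stx_atomic_write_unit_max */
	uint32_t unit_max_opt;	/* stx_atomic_write_unit_max_opt, 0 if no boundary */
};

/* Largest length expected to perform consistently well. */
uint32_t awu_fast_max(const struct awu_limits *lim)
{
	assert(lim != NULL);

	/* Zero means no boundary: the whole supported range is "fast". */
	return lim->unit_max_opt ? lim->unit_max_opt : lim->unit_max;
}

/* True if len falls within [unit_min, unit_max]. */
bool awu_len_supported(const struct awu_limits *lim, uint64_t len)
{
	assert(lim != NULL);

	return lim->unit_min && len >= lim->unit_min && len <= lim->unit_max;
}

/* True if len is supported but lies beyond the optimised limit. */
bool awu_len_may_be_slow(const struct awu_limits *lim, uint64_t len)
{
	return awu_len_supported(lim, len) && len > awu_fast_max(lim);
}
```

An application that cares about latency could cap its atomic write size at
awu_fast_max(), while one that needs the largest possible atomic unit could
still go up to unit_max and accept the slower path.
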
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
block/bdev.c | 3 ++-
fs/ext4/inode.c | 2 +-
fs/stat.c | 6 +++++-
fs/xfs/xfs_iops.c | 2 +-
include/linux/fs.h | 3 ++-
include/linux/stat.h | 1 +
include/uapi/linux/stat.h | 8 ++++++--
7 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/block/bdev.c b/block/bdev.c
index 4844d1e27b6f..b4afc1763e8e 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1301,7 +1301,8 @@ void bdev_statx(struct path *path, struct kstat *stat,
generic_fill_statx_atomic_writes(stat,
queue_atomic_write_unit_min_bytes(bd_queue),
- queue_atomic_write_unit_max_bytes(bd_queue));
+ queue_atomic_write_unit_max_bytes(bd_queue),
+ 0);
}
stat->blksize = bdev_io_min(bdev);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 94c7d2d828a6..cdf01e60fa6d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5692,7 +5692,7 @@ int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
awu_max = sbi->s_awu_max;
}
- generic_fill_statx_atomic_writes(stat, awu_min, awu_max);
+ generic_fill_statx_atomic_writes(stat, awu_min, awu_max, 0);
}
flags = ei->i_flags & EXT4_FL_USER_VISIBLE;
diff --git a/fs/stat.c b/fs/stat.c
index f13308bfdc98..c41855f62d22 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -136,13 +136,15 @@ EXPORT_SYMBOL(generic_fill_statx_attr);
* @stat: Where to fill in the attribute flags
* @unit_min: Minimum supported atomic write length in bytes
* @unit_max: Maximum supported atomic write length in bytes
+ * @unit_max_opt: Optimised maximum supported atomic write length in bytes
*
* Fill in the STATX{_ATTR}_WRITE_ATOMIC flags in the kstat structure from
* atomic write unit_min and unit_max values.
*/
void generic_fill_statx_atomic_writes(struct kstat *stat,
unsigned int unit_min,
- unsigned int unit_max)
+ unsigned int unit_max,
+ unsigned int unit_max_opt)
{
/* Confirm that the request type is known */
stat->result_mask |= STATX_WRITE_ATOMIC;
@@ -153,6 +155,7 @@ void generic_fill_statx_atomic_writes(struct kstat *stat,
if (unit_min) {
stat->atomic_write_unit_min = unit_min;
stat->atomic_write_unit_max = unit_max;
+ stat->atomic_write_unit_max_opt = unit_max_opt;
/* Initially only allow 1x segment */
stat->atomic_write_segments_max = 1;
@@ -732,6 +735,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
tmp.stx_atomic_write_segments_max = stat->atomic_write_segments_max;
+ tmp.stx_atomic_write_unit_max_opt = stat->atomic_write_unit_max_opt;
return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
}
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 756bd3ca8e00..f0e5d83195df 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -610,7 +610,7 @@ xfs_report_atomic_write(
if (xfs_inode_can_atomicwrite(ip))
unit_min = unit_max = ip->i_mount->m_sb.sb_blocksize;
- generic_fill_statx_atomic_writes(stat, unit_min, unit_max);
+ generic_fill_statx_atomic_writes(stat, unit_min, unit_max, 0);
}
STATIC int
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 016b0fe1536e..7b19d8f99aff 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3475,7 +3475,8 @@ void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
void generic_fill_statx_attr(struct inode *inode, struct kstat *stat);
void generic_fill_statx_atomic_writes(struct kstat *stat,
unsigned int unit_min,
- unsigned int unit_max);
+ unsigned int unit_max,
+ unsigned int unit_max_opt);
extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned int);
void __inode_add_bytes(struct inode *inode, loff_t bytes);
diff --git a/include/linux/stat.h b/include/linux/stat.h
index be7496a6a0dd..e3d00e7bb26d 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -57,6 +57,7 @@ struct kstat {
u32 dio_read_offset_align;
u32 atomic_write_unit_min;
u32 atomic_write_unit_max;
+ u32 atomic_write_unit_max_opt;
u32 atomic_write_segments_max;
};
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index f78ee3670dd5..1686861aae20 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -182,8 +182,12 @@ struct statx {
/* File offset alignment for direct I/O reads */
__u32 stx_dio_read_offset_align;
- /* 0xb8 */
- __u64 __spare3[9]; /* Spare space for future expansion */
+ /* Optimised max atomic write unit in bytes */
+ __u32 stx_atomic_write_unit_max_opt;
+ __u32 __spare2[1];
+
+ /* 0xc0 */
+ __u64 __spare3[8]; /* Spare space for future expansion */
/* 0x100 */
};
--
2.31.1
Thread overview: 32+ messages
2025-04-25 16:44 [PATCH v9 00/15] large atomic writes for xfs John Garry
2025-04-25 16:44 ` John Garry [this message]
2025-04-25 16:44 ` [PATCH v9 02/15] xfs: add helpers to compute log item overhead John Garry
2025-04-25 16:44 ` [PATCH v9 03/15] xfs: add helpers to compute transaction reservation for finishing intent items John Garry
2025-04-25 16:44 ` [PATCH v9 04/15] xfs: rename xfs_inode_can_atomicwrite() -> xfs_inode_can_hw_atomic_write() John Garry
2025-04-25 16:44 ` [PATCH v9 05/15] xfs: ignore HW which cannot atomic write a single block John Garry
2025-04-29 12:21 ` Christoph Hellwig
2025-04-29 14:44 ` Darrick J. Wong
2025-04-30 12:59 ` Christoph Hellwig
2025-05-01 16:22 ` Darrick J. Wong
2025-05-01 19:53 ` Darrick J. Wong
2025-04-30 5:18 ` [PATCH v9.1 " Darrick J. Wong
2025-04-25 16:44 ` [PATCH v9 06/15] xfs: allow block allocator to take an alignment hint John Garry
2025-04-25 16:44 ` [PATCH v9 07/15] xfs: refactor xfs_reflink_end_cow_extent() John Garry
2025-04-25 16:44 ` [PATCH v9 08/15] xfs: refine atomic write size check in xfs_file_write_iter() John Garry
2025-04-25 16:44 ` [PATCH v9 09/15] xfs: add xfs_atomic_write_cow_iomap_begin() John Garry
2025-04-25 16:44 ` [PATCH v9 10/15] xfs: add large atomic writes checks in xfs_direct_write_iomap_begin() John Garry
2025-04-25 16:45 ` [PATCH v9 11/15] xfs: commit CoW-based atomic writes atomically John Garry
2025-04-25 16:45 ` [PATCH v9 12/15] xfs: add xfs_file_dio_write_atomic() John Garry
2025-04-25 16:45 ` [PATCH v9 13/15] xfs: add xfs_compute_atomic_write_unit_max() John Garry
2025-04-30 7:52 ` John Garry
2025-05-01 4:30 ` Darrick J. Wong
2025-05-01 5:00 ` John Garry
2025-05-01 16:23 ` Darrick J. Wong
2025-04-25 16:45 ` [PATCH v9 14/15] xfs: update atomic write limits John Garry
2025-04-25 16:45 ` [PATCH v9 15/15] xfs: allow sysadmins to specify a maximum atomic write limit at mount time John Garry
2025-04-29 12:22 ` Christoph Hellwig
2025-04-29 14:38 ` Darrick J. Wong
2025-04-30 14:14 ` [PATCH v9 00/15] large atomic writes for xfs John Garry
2025-05-01 4:31 ` Darrick J. Wong
2025-05-01 5:04 ` John Garry
2025-05-01 13:44 ` Christoph Hellwig