* [PATCH 1/5] fs: reformat the statx definition
2025-01-08 8:55 add STATX_DIO_READ_ALIGN v2 Christoph Hellwig
@ 2025-01-08 8:55 ` Christoph Hellwig
2025-01-08 10:08 ` John Garry
2025-01-08 8:55 ` [PATCH 2/5] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
` (4 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-08 8:55 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
The comments after the declaration are becoming rather unreadable with
long enough comments. Move them into lines of their own.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
include/uapi/linux/stat.h | 95 +++++++++++++++++++++++++++++----------
1 file changed, 72 insertions(+), 23 deletions(-)
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 887a25286441..8b35d7d511a2 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -98,43 +98,92 @@ struct statx_timestamp {
*/
struct statx {
/* 0x00 */
- __u32 stx_mask; /* What results were written [uncond] */
- __u32 stx_blksize; /* Preferred general I/O size [uncond] */
- __u64 stx_attributes; /* Flags conveying information about the file [uncond] */
+ /* What results were written [uncond] */
+ __u32 stx_mask;
+
+ /* Preferred general I/O size [uncond] */
+ __u32 stx_blksize;
+
+ /* Flags conveying information about the file [uncond] */
+ __u64 stx_attributes;
+
/* 0x10 */
- __u32 stx_nlink; /* Number of hard links */
- __u32 stx_uid; /* User ID of owner */
- __u32 stx_gid; /* Group ID of owner */
- __u16 stx_mode; /* File mode */
+ /* Number of hard links */
+ __u32 stx_nlink;
+
+ /* User ID of owner */
+ __u32 stx_uid;
+
+ /* Group ID of owner */
+ __u32 stx_gid;
+
+ /* File mode */
+ __u16 stx_mode;
__u16 __spare0[1];
+
/* 0x20 */
- __u64 stx_ino; /* Inode number */
- __u64 stx_size; /* File size */
- __u64 stx_blocks; /* Number of 512-byte blocks allocated */
- __u64 stx_attributes_mask; /* Mask to show what's supported in stx_attributes */
+ /* Inode number */
+ __u64 stx_ino;
+
+ /* File size */
+ __u64 stx_size;
+
+ /* Number of 512-byte blocks allocated */
+ __u64 stx_blocks;
+
+ /* Mask to show what's supported in stx_attributes */
+ __u64 stx_attributes_mask;
+
/* 0x40 */
- struct statx_timestamp stx_atime; /* Last access time */
- struct statx_timestamp stx_btime; /* File creation time */
- struct statx_timestamp stx_ctime; /* Last attribute change time */
- struct statx_timestamp stx_mtime; /* Last data modification time */
+ /* Last access time */
+ struct statx_timestamp stx_atime;
+
+ /* File creation time */
+ struct statx_timestamp stx_btime;
+
+ /* Last attribute change time */
+ struct statx_timestamp stx_ctime;
+
+ /* Last data modification time */
+ struct statx_timestamp stx_mtime;
+
/* 0x80 */
- __u32 stx_rdev_major; /* Device ID of special file [if bdev/cdev] */
+ /* Device ID of special file [if bdev/cdev] */
+ __u32 stx_rdev_major;
__u32 stx_rdev_minor;
- __u32 stx_dev_major; /* ID of device containing file [uncond] */
+
+ /* ID of device containing file [uncond] */
+ __u32 stx_dev_major;
__u32 stx_dev_minor;
+
/* 0x90 */
__u64 stx_mnt_id;
- __u32 stx_dio_mem_align; /* Memory buffer alignment for direct I/O */
- __u32 stx_dio_offset_align; /* File offset alignment for direct I/O */
+
+ /* Memory buffer alignment for direct I/O */
+ __u32 stx_dio_mem_align;
+
+ /* File offset alignment for direct I/O */
+ __u32 stx_dio_offset_align;
+
/* 0xa0 */
- __u64 stx_subvol; /* Subvolume identifier */
- __u32 stx_atomic_write_unit_min; /* Min atomic write unit in bytes */
- __u32 stx_atomic_write_unit_max; /* Max atomic write unit in bytes */
+ /* Subvolume identifier */
+ __u64 stx_subvol;
+
+ /* Min atomic write unit in bytes */
+ __u32 stx_atomic_write_unit_min;
+
+ /* Max atomic write unit in bytes */
+ __u32 stx_atomic_write_unit_max;
+
/* 0xb0 */
- __u32 stx_atomic_write_segments_max; /* Max atomic write segment count */
+ /* Max atomic write segment count */
+ __u32 stx_atomic_write_segments_max;
+
__u32 __spare1[1];
+
/* 0xb8 */
__u64 __spare3[9]; /* Spare space for future expansion */
+
/* 0x100 */
};
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH 1/5] fs: reformat the statx definition
2025-01-08 8:55 ` [PATCH 1/5] fs: reformat the statx definition Christoph Hellwig
@ 2025-01-08 10:08 ` John Garry
0 siblings, 0 replies; 19+ messages in thread
From: John Garry @ 2025-01-08 10:08 UTC (permalink / raw)
To: Christoph Hellwig, Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
On 08/01/2025 08:55, Christoph Hellwig wrote:
> The comments after the declaration are becoming rather unreadable with
> long enough comments. Move them into lines of their own.
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
> Reviewed-by: Jan Kara<jack@suse.cz>
> Reviewed-by: Darrick J. Wong<djwong@kernel.org>
> ---
FWIW:
Reviewed-by: John Garry <john.g.garry@oracle.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/5] fs: add STATX_DIO_READ_ALIGN
2025-01-08 8:55 add STATX_DIO_READ_ALIGN v2 Christoph Hellwig
2025-01-08 8:55 ` [PATCH 1/5] fs: reformat the statx definition Christoph Hellwig
@ 2025-01-08 8:55 ` Christoph Hellwig
2025-01-08 10:09 ` John Garry
2025-01-08 8:55 ` [PATCH 3/5] xfs: cleanup xfs_vn_getattr Christoph Hellwig
` (3 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-08 8:55 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
Add a separate dio read align field, as many out of place write
file systems can easily do reads aligned to the device sector size,
but require bigger alignment for writes.
This is usually papered over by falling back to buffered I/O for smaller
writes and doing read-modify-write cycles, but performance for this
sucks, so applications benefit from knowing the actual write alignment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/stat.c | 1 +
include/linux/stat.h | 1 +
include/uapi/linux/stat.h | 4 +++-
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/stat.c b/fs/stat.c
index 0870e969a8a0..2c0e111a098a 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -725,6 +725,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
tmp.stx_mnt_id = stat->mnt_id;
tmp.stx_dio_mem_align = stat->dio_mem_align;
tmp.stx_dio_offset_align = stat->dio_offset_align;
+ tmp.stx_dio_read_offset_align = stat->dio_read_offset_align;
tmp.stx_subvol = stat->subvol;
tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 3d900c86981c..9d8382e23a9c 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -52,6 +52,7 @@ struct kstat {
u64 mnt_id;
u32 dio_mem_align;
u32 dio_offset_align;
+ u32 dio_read_offset_align;
u64 change_cookie;
u64 subvol;
u32 atomic_write_unit_min;
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 8b35d7d511a2..f78ee3670dd5 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -179,7 +179,8 @@ struct statx {
/* Max atomic write segment count */
__u32 stx_atomic_write_segments_max;
- __u32 __spare1[1];
+ /* File offset alignment for direct I/O reads */
+ __u32 stx_dio_read_offset_align;
/* 0xb8 */
__u64 __spare3[9]; /* Spare space for future expansion */
@@ -213,6 +214,7 @@ struct statx {
#define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */
#define STATX_SUBVOL 0x00008000U /* Want/got stx_subvol */
#define STATX_WRITE_ATOMIC 0x00010000U /* Want/got atomic_write_* fields */
+#define STATX_DIO_READ_ALIGN 0x00020000U /* Want/got dio read alignment info */
#define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH 2/5] fs: add STATX_DIO_READ_ALIGN
2025-01-08 8:55 ` [PATCH 2/5] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2025-01-08 10:09 ` John Garry
0 siblings, 0 replies; 19+ messages in thread
From: John Garry @ 2025-01-08 10:09 UTC (permalink / raw)
To: Christoph Hellwig, Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
On 08/01/2025 08:55, Christoph Hellwig wrote:
> Add a separate dio read align field, as many out of place write
> file systems can easily do reads aligned to the device sector size,
> but require bigger alignment for writes.
>
> This is usually papered over by falling back to buffered I/O for smaller
> writes and doing read-modify-write cycles, but performance for this
> sucks, so applications benefit from knowing the actual write alignment.
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
> Reviewed-by: Jan Kara<jack@suse.cz>
> Reviewed-by: Darrick J. Wong<djwong@kernel.org>
FWIW:
Reviewed-by: John Garry <john.g.garry@oracle.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 3/5] xfs: cleanup xfs_vn_getattr
2025-01-08 8:55 add STATX_DIO_READ_ALIGN v2 Christoph Hellwig
2025-01-08 8:55 ` [PATCH 1/5] fs: reformat the statx definition Christoph Hellwig
2025-01-08 8:55 ` [PATCH 2/5] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2025-01-08 8:55 ` Christoph Hellwig
2025-01-08 8:55 ` [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
` (2 subsequent siblings)
5 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-08 8:55 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs,
John Garry
Split the two bits of optional statx reporting into their own helpers
so that they are self-contained instead of deeply indented in the main
getattr handler.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_iops.c | 47 +++++++++++++++++++++++------------------------
1 file changed, 23 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 207e0dadffc3..6b0228a21617 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -573,17 +573,28 @@ xfs_stat_blksize(
}
static void
-xfs_get_atomic_write_attr(
+xfs_report_dioalign(
struct xfs_inode *ip,
- unsigned int *unit_min,
- unsigned int *unit_max)
+ struct kstat *stat)
{
- if (!xfs_inode_can_atomicwrite(ip)) {
- *unit_min = *unit_max = 0;
- return;
- }
+ struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+ struct block_device *bdev = target->bt_bdev;
- *unit_min = *unit_max = ip->i_mount->m_sb.sb_blocksize;
+ stat->result_mask |= STATX_DIOALIGN;
+ stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
+ stat->dio_offset_align = bdev_logical_block_size(bdev);
+}
+
+static void
+xfs_report_atomic_write(
+ struct xfs_inode *ip,
+ struct kstat *stat)
+{
+ unsigned int unit_min = 0, unit_max = 0;
+
+ if (xfs_inode_can_atomicwrite(ip))
+ unit_min = unit_max = ip->i_mount->m_sb.sb_blocksize;
+ generic_fill_statx_atomic_writes(stat, unit_min, unit_max);
}
STATIC int
@@ -647,22 +658,10 @@ xfs_vn_getattr(
stat->rdev = inode->i_rdev;
break;
case S_IFREG:
- if (request_mask & STATX_DIOALIGN) {
- struct xfs_buftarg *target = xfs_inode_buftarg(ip);
- struct block_device *bdev = target->bt_bdev;
-
- stat->result_mask |= STATX_DIOALIGN;
- stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
- stat->dio_offset_align = bdev_logical_block_size(bdev);
- }
- if (request_mask & STATX_WRITE_ATOMIC) {
- unsigned int unit_min, unit_max;
-
- xfs_get_atomic_write_attr(ip, &unit_min,
- &unit_max);
- generic_fill_statx_atomic_writes(stat,
- unit_min, unit_max);
- }
+ if (request_mask & STATX_DIOALIGN)
+ xfs_report_dioalign(ip, stat);
+ if (request_mask & STATX_WRITE_ATOMIC)
+ xfs_report_atomic_write(ip, stat);
fallthrough;
default:
stat->blksize = xfs_stat_blksize(ip);
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes
2025-01-08 8:55 add STATX_DIO_READ_ALIGN v2 Christoph Hellwig
` (2 preceding siblings ...)
2025-01-08 8:55 ` [PATCH 3/5] xfs: cleanup xfs_vn_getattr Christoph Hellwig
@ 2025-01-08 8:55 ` Christoph Hellwig
2025-01-08 10:10 ` John Garry
` (2 more replies)
2025-01-08 8:55 ` [PATCH 5/5] xfs: report larger dio alignment for COW inodes Christoph Hellwig
2025-01-08 8:59 ` [PATCH] statx.2: document STATX_DIO_READ_ALIGN Christoph Hellwig
5 siblings, 3 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-08 8:55 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
For I/O to reflinked blocks we always need to write an entire new
file system block, and the code enforces the file system block alignment
for the entire file if it has any reflinked blocks.
Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
vs write alignments for reflinked files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_iops.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 6b0228a21617..40289fe6f5b2 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -580,9 +580,24 @@ xfs_report_dioalign(
struct xfs_buftarg *target = xfs_inode_buftarg(ip);
struct block_device *bdev = target->bt_bdev;
- stat->result_mask |= STATX_DIOALIGN;
+ stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
- stat->dio_offset_align = bdev_logical_block_size(bdev);
+
+ /*
+ * For COW inodes, we can only perform out of place writes of entire
+ * allocation units (blocks or RT extents).
+ * For writes smaller than the allocation unit, we must fall back to
+ * buffered I/O to perform read-modify-write cycles. At best this is
+ * highly inefficient; at worst it leads to page cache invalidation
+ * races. Tell applications to avoid this by reporting the larger write
+ * alignment in dio_offset_align, and the smaller read alignment in
+ * dio_read_offset_align.
+ */
+ stat->dio_read_offset_align = bdev_logical_block_size(bdev);
+ if (xfs_is_cow_inode(ip))
+ stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
+ else
+ stat->dio_offset_align = stat->dio_read_offset_align;
}
static void
@@ -658,7 +673,7 @@ xfs_vn_getattr(
stat->rdev = inode->i_rdev;
break;
case S_IFREG:
- if (request_mask & STATX_DIOALIGN)
+ if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
xfs_report_dioalign(ip, stat);
if (request_mask & STATX_WRITE_ATOMIC)
xfs_report_atomic_write(ip, stat);
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes
2025-01-08 8:55 ` [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
@ 2025-01-08 10:10 ` John Garry
2025-01-08 10:13 ` John Garry
2025-01-08 17:53 ` Eric Biggers
2 siblings, 0 replies; 19+ messages in thread
From: John Garry @ 2025-01-08 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
On 08/01/2025 08:55, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
>
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
> Reviewed-by: "Darrick J. Wong"<djwong@kernel.org>
FWIW:
Reviewed-by: John Garry <john.g.garry@oracle.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes
2025-01-08 8:55 ` [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
2025-01-08 10:10 ` John Garry
@ 2025-01-08 10:13 ` John Garry
2025-01-08 15:18 ` Christoph Hellwig
2025-01-08 17:53 ` Eric Biggers
2 siblings, 1 reply; 19+ messages in thread
From: John Garry @ 2025-01-08 10:13 UTC (permalink / raw)
To: Christoph Hellwig, Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
On 08/01/2025 08:55, Christoph Hellwig wrote:
> @@ -580,9 +580,24 @@ xfs_report_dioalign(
> struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> struct block_device *bdev = target->bt_bdev;
>
> - stat->result_mask |= STATX_DIOALIGN;
> + stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
BTW, it would be a crappy userspace which can't handle fields which it
did not ask for, e.g. asked for STATX_DIOALIGN, but got STATX_DIOALIGN
and STATX_DIO_READ_ALIGN
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes
2025-01-08 10:13 ` John Garry
@ 2025-01-08 15:18 ` Christoph Hellwig
2025-01-08 17:20 ` Darrick J. Wong
0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-08 15:18 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, Alexander Viro, Christian Brauner, Jan Kara,
Chandan Babu R, Darrick J. Wong, Hongbo Li, Ryusuke Konishi,
linux-nilfs, linux-fsdevel, linux-xfs
On Wed, Jan 08, 2025 at 10:13:02AM +0000, John Garry wrote:
> On 08/01/2025 08:55, Christoph Hellwig wrote:
>> @@ -580,9 +580,24 @@ xfs_report_dioalign(
>> struct xfs_buftarg *target = xfs_inode_buftarg(ip);
>> struct block_device *bdev = target->bt_bdev;
>> - stat->result_mask |= STATX_DIOALIGN;
>> + stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
>
> BTW, it would be a crappy userspace which can't handle fields which it did
> not ask for, e.g. asked for STATX_DIOALIGN, but got STATX_DIOALIGN and
> STATX_DIO_READ_ALIGN
Well, the space is marked for extension. I don't think there ever
was a requirement only fields asked for are reported, but if that
feels safer I could switch to that.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes
2025-01-08 15:18 ` Christoph Hellwig
@ 2025-01-08 17:20 ` Darrick J. Wong
0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2025-01-08 17:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: John Garry, Alexander Viro, Christian Brauner, Jan Kara,
Chandan Babu R, Hongbo Li, Ryusuke Konishi, linux-nilfs,
linux-fsdevel, linux-xfs
On Wed, Jan 08, 2025 at 04:18:27PM +0100, Christoph Hellwig wrote:
> On Wed, Jan 08, 2025 at 10:13:02AM +0000, John Garry wrote:
> > On 08/01/2025 08:55, Christoph Hellwig wrote:
> >> @@ -580,9 +580,24 @@ xfs_report_dioalign(
> >> struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> >> struct block_device *bdev = target->bt_bdev;
> >> - stat->result_mask |= STATX_DIOALIGN;
> >> + stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
> >
> > BTW, it would be a crappy userspace which can't handle fields which it did
> > not ask for, e.g. asked for STATX_DIOALIGN, but got STATX_DIOALIGN and
> > STATX_DIO_READ_ALIGN
>
> Well, the space is marked for extension. I don't think there ever
> was a requirement only fields asked for are reported, but if that
> feels safer I could switch to that.
From the "Returned information" section of the statx manpage:
"It should be noted that the kernel may return fields that weren't
requested and may fail to return fields that were requested, depending
on what the backing filesystem supports. (Fields that are given values
despite being unrequested can just be ignored.) In either case,
stx_mask will not be equal mask.
<snip>
"A filesystem may also fill in fields that the caller didn't ask for if
it has values for them available and the information is available at no
extra cost. If this happens, the corresponding bits will be set in
stx_mask."
In other words, the kernel is allowed to return information/flags that
weren't requested, and userspace can ignore it.
--D
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes
2025-01-08 8:55 ` [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
2025-01-08 10:10 ` John Garry
2025-01-08 10:13 ` John Garry
@ 2025-01-08 17:53 ` Eric Biggers
2025-01-09 6:25 ` Christoph Hellwig
2 siblings, 1 reply; 19+ messages in thread
From: Eric Biggers @ 2025-01-08 17:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
Darrick J. Wong, Hongbo Li, Ryusuke Konishi, linux-nilfs,
linux-fsdevel, linux-xfs
On Wed, Jan 08, 2025 at 09:55:32AM +0100, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
>
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
> fs/xfs/xfs_iops.c | 21 ++++++++++++++++++---
> 1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..40289fe6f5b2 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -580,9 +580,24 @@ xfs_report_dioalign(
> struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> struct block_device *bdev = target->bt_bdev;
>
> - stat->result_mask |= STATX_DIOALIGN;
> + stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
> stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> - stat->dio_offset_align = bdev_logical_block_size(bdev);
> +
> + /*
> + * For COW inodes, we can only perform out of place writes of entire
> + * allocation units (blocks or RT extents).
> + * For writes smaller than the allocation unit, we must fall back to
> + * buffered I/O to perform read-modify-write cycles. At best this is
> + * highly inefficient; at worst it leads to page cache invalidation
> + * races. Tell applications to avoid this by reporting the larger write
> + * alignment in dio_offset_align, and the smaller read alignment in
> + * dio_read_offset_align.
> + */
> + stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> + if (xfs_is_cow_inode(ip))
> + stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
> + else
> + stat->dio_offset_align = stat->dio_read_offset_align;
This contradicts the proposed man page, which says the following about
stx_dio_read_offset_align offset:
If non-zero this value must be smaller than stx_dio_offset_align
which must be provided by the file system.
The proposed code sets stx_dio_read_offset_align and stx_dio_offset_align to the
same value in some cases.
Perhaps the documentation should say "less than or equal to"?
Also, the claim that stx_dio_offset_align "must be provided by the file system"
if stx_dio_read_offset_align is nonzero should probably be conditional on
STATX_DIOALIGN being provided too.
- Eric
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes
2025-01-08 17:53 ` Eric Biggers
@ 2025-01-09 6:25 ` Christoph Hellwig
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-09 6:25 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Alexander Viro, Christian Brauner, Jan Kara,
Chandan Babu R, Darrick J. Wong, Hongbo Li, Ryusuke Konishi,
linux-nilfs, linux-fsdevel, linux-xfs
On Wed, Jan 08, 2025 at 09:53:58AM -0800, Eric Biggers wrote:
> This contradicts the proposed man page, which says the following about
> stx_dio_read_offset_align offset:
>
> If non-zero this value must be smaller than stx_dio_offset_align
> which must be provided by the file system.
>
> The proposed code sets stx_dio_read_offset_align and stx_dio_offset_align to the
> same value in some cases.
>
> Perhaps the documentation should say "less than or equal to"?
Yes, I'll fix it up.
> Also, the claim that stx_dio_offset_align "must be provided by the file system"
> if stx_dio_read_offset_align is nonzero should probably be conditional on
> STATX_DIOALIGN being provided too.
I'll add a "if requested" to the man page.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 5/5] xfs: report larger dio alignment for COW inodes
2025-01-08 8:55 add STATX_DIO_READ_ALIGN v2 Christoph Hellwig
` (3 preceding siblings ...)
2025-01-08 8:55 ` [PATCH 4/5] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
@ 2025-01-08 8:55 ` Christoph Hellwig
2025-01-08 10:11 ` John Garry
2025-01-08 8:59 ` [PATCH] statx.2: document STATX_DIO_READ_ALIGN Christoph Hellwig
5 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-08 8:55 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
For I/O to reflinked blocks we always need to write an entire new file
system block, and the code enforces the file system block alignment for
the entire file if it has any reflinked blocks. Mirror the larger
value reported in the statx in the dio_offset_align in the xfs-specific
XFS_IOC_DIOINFO ioctl for the same reason.
Don't bother adding a new field for the read alignment to this legacy
ioctl as all new users should use statx instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_ioctl.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 0789c18aaa18..20f3cf5391c6 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1204,7 +1204,16 @@ xfs_file_ioctl(
struct xfs_buftarg *target = xfs_inode_buftarg(ip);
struct dioattr da;
- da.d_mem = da.d_miniosz = target->bt_logical_sectorsize;
+ da.d_mem = target->bt_logical_sectorsize;
+
+ /*
+ * See xfs_report_dioalign() why report a potential larger than
+ * sector sizevalue here for COW inodes.
+ */
+ if (xfs_is_cow_inode(ip))
+ da.d_miniosz = xfs_inode_alloc_unitsize(ip);
+ else
+ da.d_miniosz = target->bt_logical_sectorsize;
da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
if (copy_to_user(arg, &da, sizeof(da)))
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH 5/5] xfs: report larger dio alignment for COW inodes
2025-01-08 8:55 ` [PATCH 5/5] xfs: report larger dio alignment for COW inodes Christoph Hellwig
@ 2025-01-08 10:11 ` John Garry
2025-01-08 17:25 ` Darrick J. Wong
0 siblings, 1 reply; 19+ messages in thread
From: John Garry @ 2025-01-08 10:11 UTC (permalink / raw)
To: Christoph Hellwig, Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
On 08/01/2025 08:55, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new file
> system block, and the code enforces the file system block alignment for
> the entire file if it has any reflinked blocks. Mirror the larger
> value reported in the statx in the dio_offset_align in the xfs-specific
> XFS_IOC_DIOINFO ioctl for the same reason.
>
> Don't bother adding a new field for the read alignment to this legacy
> ioctl as all new users should use statx instead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_ioctl.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 0789c18aaa18..20f3cf5391c6 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1204,7 +1204,16 @@ xfs_file_ioctl(
> struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> struct dioattr da;
>
> - da.d_mem = da.d_miniosz = target->bt_logical_sectorsize;
> + da.d_mem = target->bt_logical_sectorsize;
> +
> + /*
> + * See xfs_report_dioalign() why report a potential larger than
> + * sector sizevalue here for COW inodes.
nit: sizevalue
> + */
> + if (xfs_is_cow_inode(ip))
> + da.d_miniosz = xfs_inode_alloc_unitsize(ip);
> + else
> + da.d_miniosz = target->bt_logical_sectorsize;
> da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
>
> if (copy_to_user(arg, &da, sizeof(da)))
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 5/5] xfs: report larger dio alignment for COW inodes
2025-01-08 10:11 ` John Garry
@ 2025-01-08 17:25 ` Darrick J. Wong
0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2025-01-08 17:25 UTC (permalink / raw)
To: John Garry
Cc: Christoph Hellwig, Alexander Viro, Christian Brauner, Jan Kara,
Chandan Babu R, Hongbo Li, Ryusuke Konishi, linux-nilfs,
linux-fsdevel, linux-xfs
On Wed, Jan 08, 2025 at 10:11:27AM +0000, John Garry wrote:
> On 08/01/2025 08:55, Christoph Hellwig wrote:
> > For I/O to reflinked blocks we always need to write an entire new file
> > system block, and the code enforces the file system block alignment for
> > the entire file if it has any reflinked blocks. Mirror the larger
> > value reported in the statx in the dio_offset_align in the xfs-specific
> > XFS_IOC_DIOINFO ioctl for the same reason.
> >
> > Don't bother adding a new field for the read alignment to this legacy
> > ioctl as all new users should use statx instead.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> > fs/xfs/xfs_ioctl.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 0789c18aaa18..20f3cf5391c6 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -1204,7 +1204,16 @@ xfs_file_ioctl(
> > struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> > struct dioattr da;
> > - da.d_mem = da.d_miniosz = target->bt_logical_sectorsize;
> > + da.d_mem = target->bt_logical_sectorsize;
> > +
> > + /*
> > + * See xfs_report_dioalign() why report a potential larger than
> > + * sector sizevalue here for COW inodes.
>
> nit: sizevalue
The sentence reads oddly to me; how about:
"See xfs_report_dioalign() for an explanation about why we report a
value larger than the sector size for COW inodes."?
The code change looks ok though.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> > + */
> > + if (xfs_is_cow_inode(ip))
> > + da.d_miniosz = xfs_inode_alloc_unitsize(ip);
> > + else
> > + da.d_miniosz = target->bt_logical_sectorsize;
> > da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
> > if (copy_to_user(arg, &da, sizeof(da)))
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] statx.2: document STATX_DIO_READ_ALIGN
2025-01-08 8:55 add STATX_DIO_READ_ALIGN v2 Christoph Hellwig
` (4 preceding siblings ...)
2025-01-08 8:55 ` [PATCH 5/5] xfs: report larger dio alignment for COW inodes Christoph Hellwig
@ 2025-01-08 8:59 ` Christoph Hellwig
2025-01-08 17:27 ` Darrick J. Wong
5 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-08 8:59 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs, linux-man
Document the new STATX_DIO_READ_ALIGN flag and the new
stx_dio_read_offset_align field guarded by it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
man/man2/statx.2 | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/man/man2/statx.2 b/man/man2/statx.2
index c5b5a28ec2f1..8ef6a1cfb1c0 100644
--- a/man/man2/statx.2
+++ b/man/man2/statx.2
@@ -76,6 +76,9 @@ struct statx {
__u32 stx_atomic_write_unit_min;
__u32 stx_atomic_write_unit_max;
__u32 stx_atomic_write_segments_max;
+
+ /* File offset alignment for direct I/O reads */
+ __u32 stx_dio_read_offset_align;
};
.EE
.in
@@ -261,7 +264,7 @@ STATX_BTIME Want stx_btime
STATX_ALL The same as STATX_BASIC_STATS | STATX_BTIME.
It is deprecated and should not be used.
STATX_MNT_ID Want stx_mnt_id (since Linux 5.8)
-STATX_DIOALIGN Want stx_dio_mem_align and stx_dio_offset_align
+STATX_DIOALIGN Want stx_dio_mem_align and stx_dio_offset_align.
(since Linux 6.1; support varies by filesystem)
STATX_MNT_ID_UNIQUE Want unique stx_mnt_id (since Linux 6.8)
STATX_SUBVOL Want stx_subvol
@@ -270,6 +273,8 @@ STATX_WRITE_ATOMIC Want stx_atomic_write_unit_min,
stx_atomic_write_unit_max,
and stx_atomic_write_segments_max.
(since Linux 6.11; support varies by filesystem)
+STATX_DIO_READ_ALIGN Want stx_dio_read_offset_align.
+ (since Linux 6.14; support varies by filesystem)
.TE
.in
.P
@@ -467,6 +472,25 @@ This will only be nonzero if
.I stx_dio_mem_align
is nonzero, and vice versa.
.TP
+.I stx_dio_read_offset_align
+The alignment (in bytes) required for file offsets and I/O segment lengths for
+direct I/O reads
+.RB ( O_DIRECT )
+on this file.
+If zero the limit in
+.I stx_dio_offset_align
+applies for reads as well.
+If non-zero this value must be smaller than
+.I stx_dio_offset_align
+which must be provided by the file system.
+The memory alignment in
+.I stx_dio_mem_align
+is not affected by this value.
+.IP
+.B STATX_DIO_READ_ALIGN
+.RI ( stx_dio_offset_align )
+is supported by xfs on regular files since Linux 6.14.
+.TP
.I stx_subvol
Subvolume number of the current file.
.IP
--
2.45.2
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH] statx.2: document STATX_DIO_READ_ALIGN
2025-01-08 8:59 ` [PATCH] statx.2: document STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2025-01-08 17:27 ` Darrick J. Wong
0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2025-01-08 17:27 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
Hongbo Li, Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs,
linux-man
On Wed, Jan 08, 2025 at 09:59:00AM +0100, Christoph Hellwig wrote:
> Document the new STATX_DIO_READ_ALIGN flag and the new
> stx_dio_read_offset_align field guarded by it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> man/man2/statx.2 | 26 +++++++++++++++++++++++++-
> 1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/man/man2/statx.2 b/man/man2/statx.2
> index c5b5a28ec2f1..8ef6a1cfb1c0 100644
> --- a/man/man2/statx.2
> +++ b/man/man2/statx.2
> @@ -76,6 +76,9 @@ struct statx {
> __u32 stx_atomic_write_unit_min;
> __u32 stx_atomic_write_unit_max;
> __u32 stx_atomic_write_segments_max;
> +
> + /* File offset alignment for direct I/O reads */
> + __u32 stx_dio_read_offset_align;
> };
> .EE
> .in
> @@ -261,7 +264,7 @@ STATX_BTIME Want stx_btime
> STATX_ALL The same as STATX_BASIC_STATS | STATX_BTIME.
> It is deprecated and should not be used.
> STATX_MNT_ID Want stx_mnt_id (since Linux 5.8)
> -STATX_DIOALIGN Want stx_dio_mem_align and stx_dio_offset_align
> +STATX_DIOALIGN Want stx_dio_mem_align and stx_dio_offset_align.
> (since Linux 6.1; support varies by filesystem)
> STATX_MNT_ID_UNIQUE Want unique stx_mnt_id (since Linux 6.8)
> STATX_SUBVOL Want stx_subvol
> @@ -270,6 +273,8 @@ STATX_WRITE_ATOMIC Want stx_atomic_write_unit_min,
> stx_atomic_write_unit_max,
> and stx_atomic_write_segments_max.
> (since Linux 6.11; support varies by filesystem)
> +STATX_DIO_READ_ALIGN Want stx_dio_read_offset_align.
> + (since Linux 6.14; support varies by filesystem)
> .TE
> .in
> .P
> @@ -467,6 +472,25 @@ This will only be nonzero if
> .I stx_dio_mem_align
> is nonzero, and vice versa.
> .TP
> +.I stx_dio_read_offset_align
> +The alignment (in bytes) required for file offsets and I/O segment lengths for
> +direct I/O reads
> +.RB ( O_DIRECT )
> +on this file.
> +If zero the limit in
nit: add a comma here (really a dependent clause) to make it clearer
that 'zero' isn't being used as a verb here:
"If zero, the limit in..."
> +.I stx_dio_offset_align
> +applies for reads as well.
> +If non-zero this value must be smaller than
Same here.
"If non-zero, this value..."
With that fixed,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> +.I stx_dio_offset_align
> +which must be provided by the file system.
> +The memory alignment in
> +.I stx_dio_mem_align
> +is not affected by this value.
> +.IP
> +.B STATX_DIO_READ_ALIGN
> +.RI ( stx_dio_offset_align )
> +is supported by xfs on regular files since Linux 6.14.
> +.TP
> .I stx_subvol
> Subvolume number of the current file.
> .IP
> --
> 2.45.2
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread