linux-nilfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* add STATX_DIO_READ_ALIGN
@ 2025-01-06 15:15 Christoph Hellwig
  2025-01-06 15:15 ` [PATCH 1/4] fs: reformat the statx definition Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-06 15:15 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
	Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs

Hi all,

file systems that write out of place usually require different alignment
for direct I/O writes than what they can do for reads.  This series tries
to address this by yet another statx field.

Note that the code is completely untested - I wrote it and got preempted
and only sent this out because Hongbo Li brought the issue up in the
nilfs2 context.  I've just started a vacation so I'm unlikely to get
back to it any time soon, but if someone wants to take the work over
go for it.  I'll probably answer to email at least every other day or
so.

Changes from RFC:
 - add a cleanup patch for xfs_vn_getattr
 - use xfs_inode_alloc_unitsize in xfs_vn_getattr
 - improve a comment in XFS

Diffstat:
 fs/stat.c                 |    1 
 fs/xfs/xfs_iops.c         |   65 +++++++++++++++++++-----------
 include/linux/stat.h      |    1 
 include/uapi/linux/stat.h |   99 ++++++++++++++++++++++++++++++++++------------
 4 files changed, 118 insertions(+), 48 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] fs: reformat the statx definition
  2025-01-06 15:15 add STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2025-01-06 15:15 ` Christoph Hellwig
  2025-01-06 16:28   ` Jan Kara
  2025-01-06 15:15 ` [PATCH 2/4] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-06 15:15 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
	Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs

The comments after the declaration are becoming rather unreadable with
long enough comments.  Move them into lines of their own.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 include/uapi/linux/stat.h | 95 +++++++++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 23 deletions(-)

diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 887a25286441..8b35d7d511a2 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -98,43 +98,92 @@ struct statx_timestamp {
  */
 struct statx {
 	/* 0x00 */
-	__u32	stx_mask;	/* What results were written [uncond] */
-	__u32	stx_blksize;	/* Preferred general I/O size [uncond] */
-	__u64	stx_attributes;	/* Flags conveying information about the file [uncond] */
+	/* What results were written [uncond] */
+	__u32	stx_mask;
+
+	/* Preferred general I/O size [uncond] */
+	__u32	stx_blksize;
+
+	/* Flags conveying information about the file [uncond] */
+	__u64	stx_attributes;
+
 	/* 0x10 */
-	__u32	stx_nlink;	/* Number of hard links */
-	__u32	stx_uid;	/* User ID of owner */
-	__u32	stx_gid;	/* Group ID of owner */
-	__u16	stx_mode;	/* File mode */
+	/* Number of hard links */
+	__u32	stx_nlink;
+
+	/* User ID of owner */
+	__u32	stx_uid;
+
+	/* Group ID of owner */
+	__u32	stx_gid;
+
+	/* File mode */
+	__u16	stx_mode;
 	__u16	__spare0[1];
+
 	/* 0x20 */
-	__u64	stx_ino;	/* Inode number */
-	__u64	stx_size;	/* File size */
-	__u64	stx_blocks;	/* Number of 512-byte blocks allocated */
-	__u64	stx_attributes_mask; /* Mask to show what's supported in stx_attributes */
+	/* Inode number */
+	__u64	stx_ino;
+
+	/* File size */
+	__u64	stx_size;
+
+	/* Number of 512-byte blocks allocated */
+	__u64	stx_blocks;
+
+	/* Mask to show what's supported in stx_attributes */
+	__u64	stx_attributes_mask;
+
 	/* 0x40 */
-	struct statx_timestamp	stx_atime;	/* Last access time */
-	struct statx_timestamp	stx_btime;	/* File creation time */
-	struct statx_timestamp	stx_ctime;	/* Last attribute change time */
-	struct statx_timestamp	stx_mtime;	/* Last data modification time */
+	/* Last access time */
+	struct statx_timestamp	stx_atime;
+
+	/* File creation time */
+	struct statx_timestamp	stx_btime;
+
+	/* Last attribute change time */
+	struct statx_timestamp	stx_ctime;
+
+	/* Last data modification time */
+	struct statx_timestamp	stx_mtime;
+
 	/* 0x80 */
-	__u32	stx_rdev_major;	/* Device ID of special file [if bdev/cdev] */
+	/* Device ID of special file [if bdev/cdev] */
+	__u32	stx_rdev_major;
 	__u32	stx_rdev_minor;
-	__u32	stx_dev_major;	/* ID of device containing file [uncond] */
+
+	/* ID of device containing file [uncond] */
+	__u32	stx_dev_major;
 	__u32	stx_dev_minor;
+
 	/* 0x90 */
 	__u64	stx_mnt_id;
-	__u32	stx_dio_mem_align;	/* Memory buffer alignment for direct I/O */
-	__u32	stx_dio_offset_align;	/* File offset alignment for direct I/O */
+
+	/* Memory buffer alignment for direct I/O */
+	__u32	stx_dio_mem_align;
+
+	/* File offset alignment for direct I/O */
+	__u32	stx_dio_offset_align;
+
 	/* 0xa0 */
-	__u64	stx_subvol;	/* Subvolume identifier */
-	__u32	stx_atomic_write_unit_min;	/* Min atomic write unit in bytes */
-	__u32	stx_atomic_write_unit_max;	/* Max atomic write unit in bytes */
+	/* Subvolume identifier */
+	__u64	stx_subvol;
+
+	/* Min atomic write unit in bytes */
+	__u32	stx_atomic_write_unit_min;
+
+	/* Max atomic write unit in bytes */
+	__u32	stx_atomic_write_unit_max;
+
 	/* 0xb0 */
-	__u32   stx_atomic_write_segments_max;	/* Max atomic write segment count */
+	/* Max atomic write segment count */
+	__u32   stx_atomic_write_segments_max;
+
 	__u32   __spare1[1];
+
 	/* 0xb8 */
 	__u64	__spare3[9];	/* Spare space for future expansion */
+
 	/* 0x100 */
 };
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/4] fs: add STATX_DIO_READ_ALIGN
  2025-01-06 15:15 add STATX_DIO_READ_ALIGN Christoph Hellwig
  2025-01-06 15:15 ` [PATCH 1/4] fs: reformat the statx definition Christoph Hellwig
@ 2025-01-06 15:15 ` Christoph Hellwig
  2025-01-06 16:32   ` Jan Kara
  2025-01-06 15:15 ` [PATCH 3/4] xfs: cleanup xfs_vn_getattr Christoph Hellwig
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-06 15:15 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
	Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs

Add a separate dio read align field, as many out of place write
file systems can easily do reads aligned to the device sector size,
but require bigger alignment for writes.

This is usually papered over by falling back to buffered I/O for smaller
writes and doing read-modify-write cycles, but performance for this
sucks, so applications benefit from knowing the actual write alignment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/stat.c                 | 1 +
 include/linux/stat.h      | 1 +
 include/uapi/linux/stat.h | 4 +++-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/stat.c b/fs/stat.c
index 0870e969a8a0..2c0e111a098a 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -725,6 +725,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 	tmp.stx_mnt_id = stat->mnt_id;
 	tmp.stx_dio_mem_align = stat->dio_mem_align;
 	tmp.stx_dio_offset_align = stat->dio_offset_align;
+	tmp.stx_dio_read_offset_align = stat->dio_read_offset_align;
 	tmp.stx_subvol = stat->subvol;
 	tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
 	tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 3d900c86981c..9d8382e23a9c 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -52,6 +52,7 @@ struct kstat {
 	u64		mnt_id;
 	u32		dio_mem_align;
 	u32		dio_offset_align;
+	u32		dio_read_offset_align;
 	u64		change_cookie;
 	u64		subvol;
 	u32		atomic_write_unit_min;
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 8b35d7d511a2..f78ee3670dd5 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -179,7 +179,8 @@ struct statx {
 	/* Max atomic write segment count */
 	__u32   stx_atomic_write_segments_max;
 
-	__u32   __spare1[1];
+	/* File offset alignment for direct I/O reads */
+	__u32	stx_dio_read_offset_align;
 
 	/* 0xb8 */
 	__u64	__spare3[9];	/* Spare space for future expansion */
@@ -213,6 +214,7 @@ struct statx {
 #define STATX_MNT_ID_UNIQUE	0x00004000U	/* Want/got extended stx_mount_id */
 #define STATX_SUBVOL		0x00008000U	/* Want/got stx_subvol */
 #define STATX_WRITE_ATOMIC	0x00010000U	/* Want/got atomic_write_* fields */
+#define STATX_DIO_READ_ALIGN	0x00020000U	/* Want/got dio read alignment info */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] xfs: cleanup xfs_vn_getattr
  2025-01-06 15:15 add STATX_DIO_READ_ALIGN Christoph Hellwig
  2025-01-06 15:15 ` [PATCH 1/4] fs: reformat the statx definition Christoph Hellwig
  2025-01-06 15:15 ` [PATCH 2/4] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2025-01-06 15:15 ` Christoph Hellwig
  2025-01-06 17:07   ` John Garry
  2025-01-06 15:15 ` [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
  2025-01-06 15:19 ` [PATCH] statx.2: document STATX_DIO_READ_ALIGN Christoph Hellwig
  4 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-06 15:15 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
	Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs

Split the two bits of optional statx reporting into their own helpers
so that they are self-contained instead of deeply indented in the main
getattr handler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_iops.c | 47 +++++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 207e0dadffc3..6b0228a21617 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -573,17 +573,28 @@ xfs_stat_blksize(
 }
 
 static void
-xfs_get_atomic_write_attr(
+xfs_report_dioalign(
 	struct xfs_inode	*ip,
-	unsigned int		*unit_min,
-	unsigned int		*unit_max)
+	struct kstat		*stat)
 {
-	if (!xfs_inode_can_atomicwrite(ip)) {
-		*unit_min = *unit_max = 0;
-		return;
-	}
+	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
+	struct block_device	*bdev = target->bt_bdev;
 
-	*unit_min = *unit_max = ip->i_mount->m_sb.sb_blocksize;
+	stat->result_mask |= STATX_DIOALIGN;
+	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
+	stat->dio_offset_align = bdev_logical_block_size(bdev);
+}
+
+static void
+xfs_report_atomic_write(
+	struct xfs_inode	*ip,
+	struct kstat		*stat)
+{
+	unsigned int		unit_min = 0, unit_max = 0;
+
+	if (xfs_inode_can_atomicwrite(ip))
+		unit_min = unit_max = ip->i_mount->m_sb.sb_blocksize;
+	generic_fill_statx_atomic_writes(stat, unit_min, unit_max);
 }
 
 STATIC int
@@ -647,22 +658,10 @@ xfs_vn_getattr(
 		stat->rdev = inode->i_rdev;
 		break;
 	case S_IFREG:
-		if (request_mask & STATX_DIOALIGN) {
-			struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
-			struct block_device	*bdev = target->bt_bdev;
-
-			stat->result_mask |= STATX_DIOALIGN;
-			stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
-			stat->dio_offset_align = bdev_logical_block_size(bdev);
-		}
-		if (request_mask & STATX_WRITE_ATOMIC) {
-			unsigned int	unit_min, unit_max;
-
-			xfs_get_atomic_write_attr(ip, &unit_min,
-					&unit_max);
-			generic_fill_statx_atomic_writes(stat,
-					unit_min, unit_max);
-		}
+		if (request_mask & STATX_DIOALIGN)
+			xfs_report_dioalign(ip, stat);
+		if (request_mask & STATX_WRITE_ATOMIC)
+			xfs_report_atomic_write(ip, stat);
 		fallthrough;
 	default:
 		stat->blksize = xfs_stat_blksize(ip);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes
  2025-01-06 15:15 add STATX_DIO_READ_ALIGN Christoph Hellwig
                   ` (2 preceding siblings ...)
  2025-01-06 15:15 ` [PATCH 3/4] xfs: cleanup xfs_vn_getattr Christoph Hellwig
@ 2025-01-06 15:15 ` Christoph Hellwig
  2025-01-06 17:33   ` Darrick J. Wong
  2025-01-06 18:37   ` John Garry
  2025-01-06 15:19 ` [PATCH] statx.2: document STATX_DIO_READ_ALIGN Christoph Hellwig
  4 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-06 15:15 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
	Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs

For I/O to reflinked blocks we always need to write an entire new
file system block, and the code enforces the file system block alignment
for the entire file if it has any reflinked blocks.

Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
vs write alignments for reflinked files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iops.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 6b0228a21617..053d05f5567d 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -580,9 +580,27 @@ xfs_report_dioalign(
 	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
 	struct block_device	*bdev = target->bt_bdev;
 
-	stat->result_mask |= STATX_DIOALIGN;
+	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
 	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
-	stat->dio_offset_align = bdev_logical_block_size(bdev);
+	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
+
+	/*
+	 * On COW inodes we are forced to always rewrite an entire file system
+	 * block or RT extent.
+	 *
+	 * Because applications assume they can do sector sized direct writes
+	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
+	 * case.  Because that needs to copy the entire block into the buffer
+	 * cache it is highly inefficient and can easily lead to page cache
+	 * invalidation races.
+	 *
+	 * Tell applications to avoid this case by reporting the natively
+	 * supported direct I/O read alignment.
+	 */
+	if (xfs_is_cow_inode(ip))
+		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
+	else
+		stat->dio_offset_align = stat->dio_read_offset_align;
 }
 
 static void
@@ -658,7 +676,7 @@ xfs_vn_getattr(
 		stat->rdev = inode->i_rdev;
 		break;
 	case S_IFREG:
-		if (request_mask & STATX_DIOALIGN)
+		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
 			xfs_report_dioalign(ip, stat);
 		if (request_mask & STATX_WRITE_ATOMIC)
 			xfs_report_atomic_write(ip, stat);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH] statx.2: document STATX_DIO_READ_ALIGN
  2025-01-06 15:15 add STATX_DIO_READ_ALIGN Christoph Hellwig
                   ` (3 preceding siblings ...)
  2025-01-06 15:15 ` [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
@ 2025-01-06 15:19 ` Christoph Hellwig
  2025-01-06 17:40   ` Darrick J. Wong
  2025-01-06 22:01   ` Alejandro Colomar
  4 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-06 15:19 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
	Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs, linux-man

Document the new STATX_DIO_READ_ALIGN flag and the new
stx_dio_read_offset_align field guarded by it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 man/man2/statx.2 | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/man/man2/statx.2 b/man/man2/statx.2
index c5b5a28ec2f1..378bf363d93f 100644
--- a/man/man2/statx.2
+++ b/man/man2/statx.2
@@ -76,6 +76,9 @@ struct statx {
     __u32 stx_atomic_write_unit_min;
     __u32 stx_atomic_write_unit_max;
     __u32 stx_atomic_write_segments_max;
+
+    /* File offset alignment for direct I/O reads */
+    __u32   stx_dio_read_offset_align;
 };
 .EE
 .in
@@ -261,7 +264,7 @@ STATX_BTIME	Want stx_btime
 STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
 	It is deprecated and should not be used.
 STATX_MNT_ID	Want stx_mnt_id (since Linux 5.8)
-STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align
+STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align.
 	(since Linux 6.1; support varies by filesystem)
 STATX_MNT_ID_UNIQUE	Want unique stx_mnt_id (since Linux 6.8)
 STATX_SUBVOL	Want stx_subvol
@@ -270,6 +273,8 @@ STATX_WRITE_ATOMIC	Want stx_atomic_write_unit_min,
 	stx_atomic_write_unit_max,
 	and stx_atomic_write_segments_max.
 	(since Linux 6.11; support varies by filesystem)
+STATX_DIO_READ_ALIGN	Want stx_dio_read_offset_align.
+	(since Linux 6.14; support varies by filesystem)
 .TE
 .in
 .P
@@ -467,6 +472,26 @@ This will only be nonzero if
 .I stx_dio_mem_align
 is nonzero, and vice versa.
 .TP
+.I stx_dio_read_offset_align
+The alignment (in bytes) required for file offsets and I/O segment lengths for
+direct I/O reads
+.RB ( O_DIRECT )
+on this file.  If zero the limit in
+.I
+stx_dio_offset_align
+applies for reads as well.  If non-zero this value must be
+smaller than
+.I
+stx_dio_offset_align
+which must be provided by the file system.
+This value does not affect the memory alignent in
+.I stx_dio_mem_align .
+.IP
+.B STATX_DIO_READ_ALIGN
+.I ( stx_dio_offset_align )
+support by filesystem;
+it is supported by xfs since Linux 6.14.
+.TP
 .I stx_subvol
 Subvolume number of the current file.
 .IP
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] fs: reformat the statx definition
  2025-01-06 15:15 ` [PATCH 1/4] fs: reformat the statx definition Christoph Hellwig
@ 2025-01-06 16:28   ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2025-01-06 16:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
	Darrick J. Wong, Hongbo Li, Ryusuke Konishi, linux-nilfs,
	linux-fsdevel, linux-xfs

On Mon 06-01-25 16:15:54, Christoph Hellwig wrote:
> The comments after the declaration are becoming rather unreadable with
> long enough comments.  Move them into lines of their own.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  include/uapi/linux/stat.h | 95 +++++++++++++++++++++++++++++----------
>  1 file changed, 72 insertions(+), 23 deletions(-)
> 
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 887a25286441..8b35d7d511a2 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -98,43 +98,92 @@ struct statx_timestamp {
>   */
>  struct statx {
>  	/* 0x00 */
> -	__u32	stx_mask;	/* What results were written [uncond] */
> -	__u32	stx_blksize;	/* Preferred general I/O size [uncond] */
> -	__u64	stx_attributes;	/* Flags conveying information about the file [uncond] */
> +	/* What results were written [uncond] */
> +	__u32	stx_mask;
> +
> +	/* Preferred general I/O size [uncond] */
> +	__u32	stx_blksize;
> +
> +	/* Flags conveying information about the file [uncond] */
> +	__u64	stx_attributes;
> +
>  	/* 0x10 */
> -	__u32	stx_nlink;	/* Number of hard links */
> -	__u32	stx_uid;	/* User ID of owner */
> -	__u32	stx_gid;	/* Group ID of owner */
> -	__u16	stx_mode;	/* File mode */
> +	/* Number of hard links */
> +	__u32	stx_nlink;
> +
> +	/* User ID of owner */
> +	__u32	stx_uid;
> +
> +	/* Group ID of owner */
> +	__u32	stx_gid;
> +
> +	/* File mode */
> +	__u16	stx_mode;
>  	__u16	__spare0[1];
> +
>  	/* 0x20 */
> -	__u64	stx_ino;	/* Inode number */
> -	__u64	stx_size;	/* File size */
> -	__u64	stx_blocks;	/* Number of 512-byte blocks allocated */
> -	__u64	stx_attributes_mask; /* Mask to show what's supported in stx_attributes */
> +	/* Inode number */
> +	__u64	stx_ino;
> +
> +	/* File size */
> +	__u64	stx_size;
> +
> +	/* Number of 512-byte blocks allocated */
> +	__u64	stx_blocks;
> +
> +	/* Mask to show what's supported in stx_attributes */
> +	__u64	stx_attributes_mask;
> +
>  	/* 0x40 */
> -	struct statx_timestamp	stx_atime;	/* Last access time */
> -	struct statx_timestamp	stx_btime;	/* File creation time */
> -	struct statx_timestamp	stx_ctime;	/* Last attribute change time */
> -	struct statx_timestamp	stx_mtime;	/* Last data modification time */
> +	/* Last access time */
> +	struct statx_timestamp	stx_atime;
> +
> +	/* File creation time */
> +	struct statx_timestamp	stx_btime;
> +
> +	/* Last attribute change time */
> +	struct statx_timestamp	stx_ctime;
> +
> +	/* Last data modification time */
> +	struct statx_timestamp	stx_mtime;
> +
>  	/* 0x80 */
> -	__u32	stx_rdev_major;	/* Device ID of special file [if bdev/cdev] */
> +	/* Device ID of special file [if bdev/cdev] */
> +	__u32	stx_rdev_major;
>  	__u32	stx_rdev_minor;
> -	__u32	stx_dev_major;	/* ID of device containing file [uncond] */
> +
> +	/* ID of device containing file [uncond] */
> +	__u32	stx_dev_major;
>  	__u32	stx_dev_minor;
> +
>  	/* 0x90 */
>  	__u64	stx_mnt_id;
> -	__u32	stx_dio_mem_align;	/* Memory buffer alignment for direct I/O */
> -	__u32	stx_dio_offset_align;	/* File offset alignment for direct I/O */
> +
> +	/* Memory buffer alignment for direct I/O */
> +	__u32	stx_dio_mem_align;
> +
> +	/* File offset alignment for direct I/O */
> +	__u32	stx_dio_offset_align;
> +
>  	/* 0xa0 */
> -	__u64	stx_subvol;	/* Subvolume identifier */
> -	__u32	stx_atomic_write_unit_min;	/* Min atomic write unit in bytes */
> -	__u32	stx_atomic_write_unit_max;	/* Max atomic write unit in bytes */
> +	/* Subvolume identifier */
> +	__u64	stx_subvol;
> +
> +	/* Min atomic write unit in bytes */
> +	__u32	stx_atomic_write_unit_min;
> +
> +	/* Max atomic write unit in bytes */
> +	__u32	stx_atomic_write_unit_max;
> +
>  	/* 0xb0 */
> -	__u32   stx_atomic_write_segments_max;	/* Max atomic write segment count */
> +	/* Max atomic write segment count */
> +	__u32   stx_atomic_write_segments_max;
> +
>  	__u32   __spare1[1];
> +
>  	/* 0xb8 */
>  	__u64	__spare3[9];	/* Spare space for future expansion */
> +
>  	/* 0x100 */
>  };
>  
> -- 
> 2.45.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] fs: add STATX_DIO_READ_ALIGN
  2025-01-06 15:15 ` [PATCH 2/4] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2025-01-06 16:32   ` Jan Kara
  2025-01-06 16:40     ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2025-01-06 16:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
	Darrick J. Wong, Hongbo Li, Ryusuke Konishi, linux-nilfs,
	linux-fsdevel, linux-xfs

On Mon 06-01-25 16:15:55, Christoph Hellwig wrote:
> Add a separate dio read align field, as many out of place write
> file systems can easily do reads aligned to the device sector size,
> but require bigger alignment for writes.
> 
> This is usually papered over by falling back to buffered I/O for smaller
> writes and doing read-modify-write cycles, but performance for this
> sucks, so applications benefit from knowing the actual write alignment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

So if I understand right dio_offset_align is guaranteed to work for all DIO
(i.e., maximum of all possible alignments), dio_read_offset_align is
possibly lower and works only for reads. Looks good to me. Feel free to
add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/stat.c                 | 1 +
>  include/linux/stat.h      | 1 +
>  include/uapi/linux/stat.h | 4 +++-
>  3 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index 0870e969a8a0..2c0e111a098a 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -725,6 +725,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
>  	tmp.stx_mnt_id = stat->mnt_id;
>  	tmp.stx_dio_mem_align = stat->dio_mem_align;
>  	tmp.stx_dio_offset_align = stat->dio_offset_align;
> +	tmp.stx_dio_read_offset_align = stat->dio_read_offset_align;
>  	tmp.stx_subvol = stat->subvol;
>  	tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
>  	tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 3d900c86981c..9d8382e23a9c 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -52,6 +52,7 @@ struct kstat {
>  	u64		mnt_id;
>  	u32		dio_mem_align;
>  	u32		dio_offset_align;
> +	u32		dio_read_offset_align;
>  	u64		change_cookie;
>  	u64		subvol;
>  	u32		atomic_write_unit_min;
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 8b35d7d511a2..f78ee3670dd5 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -179,7 +179,8 @@ struct statx {
>  	/* Max atomic write segment count */
>  	__u32   stx_atomic_write_segments_max;
>  
> -	__u32   __spare1[1];
> +	/* File offset alignment for direct I/O reads */
> +	__u32	stx_dio_read_offset_align;
>  
>  	/* 0xb8 */
>  	__u64	__spare3[9];	/* Spare space for future expansion */
> @@ -213,6 +214,7 @@ struct statx {
>  #define STATX_MNT_ID_UNIQUE	0x00004000U	/* Want/got extended stx_mount_id */
>  #define STATX_SUBVOL		0x00008000U	/* Want/got stx_subvol */
>  #define STATX_WRITE_ATOMIC	0x00010000U	/* Want/got atomic_write_* fields */
> +#define STATX_DIO_READ_ALIGN	0x00020000U	/* Want/got dio read alignment info */
>  
>  #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
>  
> -- 
> 2.45.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] fs: add STATX_DIO_READ_ALIGN
  2025-01-06 16:32   ` Jan Kara
@ 2025-01-06 16:40     ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-06 16:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Alexander Viro, Christian Brauner,
	Chandan Babu R, Darrick J. Wong, Hongbo Li, Ryusuke Konishi,
	linux-nilfs, linux-fsdevel, linux-xfs

On Mon, Jan 06, 2025 at 05:32:14PM +0100, Jan Kara wrote:
> > Add a separate dio read align field, as many out of place write
> > file systems can easily do reads aligned to the device sector size,
> > but require bigger alignment for writes.
> > 
> > This is usually papered over by falling back to buffered I/O for smaller
> > writes and doing read-modify-write cycles, but performance for this
> > sucks, so applications benefit from knowing the actual write alignment.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> So if I understand right dio_offset_align is guaranteed to work for all DIO
> (i.e., maximum of all possible alignments), dio_read_offset_align is
> possibly lower and works only for reads.

Yes.  If you think this needs to be made more clear I'm open to
suggestions to improve the wording.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/4] xfs: cleanup xfs_vn_getattr
  2025-01-06 15:15 ` [PATCH 3/4] xfs: cleanup xfs_vn_getattr Christoph Hellwig
@ 2025-01-06 17:07   ` John Garry
  0 siblings, 0 replies; 19+ messages in thread
From: John Garry @ 2025-01-06 17:07 UTC (permalink / raw)
  To: Christoph Hellwig, Alexander Viro, Christian Brauner
  Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
	Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs

On 06/01/2025 15:15, Christoph Hellwig wrote:
> Split the two bits of optional statx reporting into their own helpers
> so that they are self-contained instead of deeply indented in the main
> getattr handler.
> 
> Signed-off-by: Christoph Hellwig<hch@lst.de>
> Reviewed-by: "Darrick J. Wong"<djwong@kernel.org>
> ---

Feel free to add:
Reviewed-by: John Garry <john.g.garry@oracle.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes
  2025-01-06 15:15 ` [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
@ 2025-01-06 17:33   ` Darrick J. Wong
  2025-01-06 18:37   ` John Garry
  1 sibling, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2025-01-06 17:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
	Hongbo Li, Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs

On Mon, Jan 06, 2025 at 04:15:57PM +0100, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
> 
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_iops.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..053d05f5567d 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -580,9 +580,27 @@ xfs_report_dioalign(
>  	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>  	struct block_device	*bdev = target->bt_bdev;
>  
> -	stat->result_mask |= STATX_DIOALIGN;
> +	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
>  	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> -	stat->dio_offset_align = bdev_logical_block_size(bdev);
> +	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> +
> +	/*
> +	 * On COW inodes we are forced to always rewrite an entire file system
> +	 * block or RT extent.
> +	 *
> +	 * Because applications assume they can do sector sized direct writes
> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
> +	 * case.  Because that needs to copy the entire block into the buffer
> +	 * cache it is highly inefficient and can easily lead to page cache
> +	 * invalidation races.
> +	 *
> +	 * Tell applications to avoid this case by reporting the natively
> +	 * supported direct I/O read alignment.
> +	 */
> +	if (xfs_is_cow_inode(ip))
> +		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
> +	else
> +		stat->dio_offset_align = stat->dio_read_offset_align;

This is a good improvement!
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

>  }
>  
>  static void
> @@ -658,7 +676,7 @@ xfs_vn_getattr(
>  		stat->rdev = inode->i_rdev;
>  		break;
>  	case S_IFREG:
> -		if (request_mask & STATX_DIOALIGN)
> +		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
>  			xfs_report_dioalign(ip, stat);
>  		if (request_mask & STATX_WRITE_ATOMIC)
>  			xfs_report_atomic_write(ip, stat);
> -- 
> 2.45.2
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] statx.2: document STATX_DIO_READ_ALIGN
  2025-01-06 15:19 ` [PATCH] statx.2: document STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2025-01-06 17:40   ` Darrick J. Wong
  2025-01-06 18:09     ` Christoph Hellwig
  2025-01-06 22:01   ` Alejandro Colomar
  1 sibling, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2025-01-06 17:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
	Hongbo Li, Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs,
	linux-man

On Mon, Jan 06, 2025 at 04:19:38PM +0100, Christoph Hellwig wrote:
> Document the new STATX_DIO_READ_ALIGN flag and the new
> stx_dio_read_offset_align field guarded by it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  man/man2/statx.2 | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/man/man2/statx.2 b/man/man2/statx.2
> index c5b5a28ec2f1..378bf363d93f 100644
> --- a/man/man2/statx.2
> +++ b/man/man2/statx.2
> @@ -76,6 +76,9 @@ struct statx {
>      __u32 stx_atomic_write_unit_min;
>      __u32 stx_atomic_write_unit_max;
>      __u32 stx_atomic_write_segments_max;
> +
> +    /* File offset alignment for direct I/O reads */
> +    __u32   stx_dio_read_offset_align;
>  };
>  .EE
>  .in
> @@ -261,7 +264,7 @@ STATX_BTIME	Want stx_btime
>  STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
>  	It is deprecated and should not be used.
>  STATX_MNT_ID	Want stx_mnt_id (since Linux 5.8)
> -STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align
> +STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align.
>  	(since Linux 6.1; support varies by filesystem)
>  STATX_MNT_ID_UNIQUE	Want unique stx_mnt_id (since Linux 6.8)
>  STATX_SUBVOL	Want stx_subvol
> @@ -270,6 +273,8 @@ STATX_WRITE_ATOMIC	Want stx_atomic_write_unit_min,
>  	stx_atomic_write_unit_max,
>  	and stx_atomic_write_segments_max.
>  	(since Linux 6.11; support varies by filesystem)
> +STATX_DIO_READ_ALIGN	Want stx_dio_read_offset_align.
> +	(since Linux 6.14; support varies by filesystem)
>  .TE
>  .in
>  .P
> @@ -467,6 +472,26 @@ This will only be nonzero if
>  .I stx_dio_mem_align
>  is nonzero, and vice versa.
>  .TP
> +.I stx_dio_read_offset_align
> +The alignment (in bytes) required for file offsets and I/O segment lengths for
> +direct I/O reads
> +.RB ( O_DIRECT )
> +on this file.  If zero the limit in

manpage nit: new sentences should start on a new line.

> +.I
> +stx_dio_offset_align
> +applies for reads as well.  If non-zero this value must be

Here too.

> +smaller than
> +.I
> +stx_dio_offset_align
> +which must be provided by the file system.

I can't imagine a filesystem where dio_read_offset > dio_offset makes
sense, but why do we need to put that in the manpage?

vs. "If non-zero, the filesystem must also provide stx_dio_offset_align."

> +This value does not affect the memory alignent in

                                         alignment

> +.I stx_dio_mem_align .
> +.IP
> +.B STATX_DIO_READ_ALIGN
> +.I ( stx_dio_offset_align )
> +support by filesystem;
> +it is supported by xfs since Linux 6.14.

Aside from those bits, this looks good to me.

--D

> +.TP
>  .I stx_subvol
>  Subvolume number of the current file.
>  .IP
> -- 
> 2.45.2
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] statx.2: document STATX_DIO_READ_ALIGN
  2025-01-06 17:40   ` Darrick J. Wong
@ 2025-01-06 18:09     ` Christoph Hellwig
  2025-01-06 19:09       ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-06 18:09 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Alexander Viro, Christian Brauner, Jan Kara,
	Chandan Babu R, Hongbo Li, Ryusuke Konishi, linux-nilfs,
	linux-fsdevel, linux-xfs, linux-man

On Mon, Jan 06, 2025 at 09:40:07AM -0800, Darrick J. Wong wrote:
> > +stx_dio_offset_align
> > +which must be provided by the file system.
> 
> I can't imagine a filesystem where dio_read_offset > dio_offset makes
> sense, but why do we need to put that in the manpage?

Well, to be backwards compatible to older userspace the value put into
stx_dio_offset_align also needs to work for reads.  Given that there
were questions about this in the RFC round I thought I'd mention it.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes
  2025-01-06 15:15 ` [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
  2025-01-06 17:33   ` Darrick J. Wong
@ 2025-01-06 18:37   ` John Garry
  2025-01-07  6:10     ` Christoph Hellwig
  1 sibling, 1 reply; 19+ messages in thread
From: John Garry @ 2025-01-06 18:37 UTC (permalink / raw)
  To: Christoph Hellwig, Alexander Viro, Christian Brauner
  Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
	Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs

On 06/01/2025 15:15, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
> 
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/xfs/xfs_iops.c | 24 +++++++++++++++++++++---
>   1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 6b0228a21617..053d05f5567d 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -580,9 +580,27 @@ xfs_report_dioalign(
>   	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>   	struct block_device	*bdev = target->bt_bdev;
>   
> -	stat->result_mask |= STATX_DIOALIGN;
> +	stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
>   	stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> -	stat->dio_offset_align = bdev_logical_block_size(bdev);
> +	stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> +
> +	/*
> +	 * On COW inodes we are forced to always rewrite an entire file system
> +	 * block or RT extent.
> +	 *
> +	 * Because applications assume they can do sector sized direct writes
> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
> +	 * case.  Because that needs to copy the entire block into the buffer
> +	 * cache it is highly inefficient and can easily lead to page cache
> +	 * invalidation races.
> +	 *
> +	 * Tell applications to avoid this case by reporting the natively
> +	 * supported direct I/O read alignment.

Maybe I mis-read the complete comment, but did you really mean "natively 
supported direct I/O write alignment"? You have been talking about 
writes only, but then finally mention read alignment.

> +	 */
> +	if (xfs_is_cow_inode(ip))
> +		stat->dio_offset_align = xfs_inode_alloc_unitsize(ip);
> +	else
> +		stat->dio_offset_align = stat->dio_read_offset_align;
>   }
>   
>   static void
> @@ -658,7 +676,7 @@ xfs_vn_getattr(
>   		stat->rdev = inode->i_rdev;
>   		break;
>   	case S_IFREG:
> -		if (request_mask & STATX_DIOALIGN)
> +		if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
>   			xfs_report_dioalign(ip, stat);
>   		if (request_mask & STATX_WRITE_ATOMIC)
>   			xfs_report_atomic_write(ip, stat);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] statx.2: document STATX_DIO_READ_ALIGN
  2025-01-06 18:09     ` Christoph Hellwig
@ 2025-01-06 19:09       ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2025-01-06 19:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
	Hongbo Li, Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs,
	linux-man

On Mon, Jan 06, 2025 at 07:09:58PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 09:40:07AM -0800, Darrick J. Wong wrote:
> > > +stx_dio_offset_align
> > > +which must be provided by the file system.
> > 
> > I can't imagine a filesystem where dio_read_offset > dio_offset makes
> > sense, but why do we need to put that in the manpage?
> 
> Well, to be backwards compatible to older userspace the value put into
> stx_dio_offset_align also needs to work for reads.  Given that there
> were questions about this in the RFC round I thought I'd mention it.

Ah ok then. :)

With the other formatting nits fixed,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] statx.2: document STATX_DIO_READ_ALIGN
  2025-01-06 15:19 ` [PATCH] statx.2: document STATX_DIO_READ_ALIGN Christoph Hellwig
  2025-01-06 17:40   ` Darrick J. Wong
@ 2025-01-06 22:01   ` Alejandro Colomar
  1 sibling, 0 replies; 19+ messages in thread
From: Alejandro Colomar @ 2025-01-06 22:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
	Darrick J. Wong, Hongbo Li, Ryusuke Konishi, linux-nilfs,
	linux-fsdevel, linux-xfs, linux-man

[-- Attachment #1: Type: text/plain, Size: 3375 bytes --]

Hi Christoph,

On Mon, Jan 06, 2025 at 04:19:38PM +0100, Christoph Hellwig wrote:
> Document the new STATX_DIO_READ_ALIGN flag and the new
> stx_dio_read_offset_align field guarded by it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Thanks for the patch!  Please see some minor comments below.

Have a lovely night!
Alex

> ---
>  man/man2/statx.2 | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/man/man2/statx.2 b/man/man2/statx.2
> index c5b5a28ec2f1..378bf363d93f 100644
> --- a/man/man2/statx.2
> +++ b/man/man2/statx.2
> @@ -76,6 +76,9 @@ struct statx {
>      __u32 stx_atomic_write_unit_min;
>      __u32 stx_atomic_write_unit_max;
>      __u32 stx_atomic_write_segments_max;
> +
> +    /* File offset alignment for direct I/O reads */
> +    __u32   stx_dio_read_offset_align;
>  };
>  .EE
>  .in
> @@ -261,7 +264,7 @@ STATX_BTIME	Want stx_btime
>  STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
>  	It is deprecated and should not be used.
>  STATX_MNT_ID	Want stx_mnt_id (since Linux 5.8)
> -STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align
> +STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align.
>  	(since Linux 6.1; support varies by filesystem)
>  STATX_MNT_ID_UNIQUE	Want unique stx_mnt_id (since Linux 6.8)
>  STATX_SUBVOL	Want stx_subvol
> @@ -270,6 +273,8 @@ STATX_WRITE_ATOMIC	Want stx_atomic_write_unit_min,
>  	stx_atomic_write_unit_max,
>  	and stx_atomic_write_segments_max.
>  	(since Linux 6.11; support varies by filesystem)
> +STATX_DIO_READ_ALIGN	Want stx_dio_read_offset_align.
> +	(since Linux 6.14; support varies by filesystem)
>  .TE
>  .in
>  .P
> @@ -467,6 +472,26 @@ This will only be nonzero if
>  .I stx_dio_mem_align
>  is nonzero, and vice versa.
>  .TP
> +.I stx_dio_read_offset_align
> +The alignment (in bytes) required for file offsets and I/O segment lengths for
> +direct I/O reads
> +.RB ( O_DIRECT )
> +on this file.  If zero the limit in

Please write poems, not prose.  :)

In other words, new sentence, new line.  See man-pages(7).

$ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
   Use semantic newlines
     In the source of a manual page, new sentences should be started on
     new lines, long sentences should be split  into  lines  at  clause
     breaks  (commas,  semicolons, colons, and so on), and long clauses
     should be split at phrase boundaries.  This convention,  sometimes
     known as "semantic newlines", makes it easier to see the effect of
     patches, which often operate at the level of individual sentences,
     clauses, or phrases.

> +.I
> +stx_dio_offset_align

We put the italics word in the same line as the .I.

> +applies for reads as well.  If non-zero this value must be
> +smaller than
> +.I
> +stx_dio_offset_align
> +which must be provided by the file system.
> +This value does not affect the memory alignent in
> +.I stx_dio_mem_align .
> +.IP
> +.B STATX_DIO_READ_ALIGN
> +.I ( stx_dio_offset_align )

You probably meant .RI (roman-italics alternating).

> +support by filesystem;
> +it is supported by xfs since Linux 6.14.
> +.TP
>  .I stx_subvol
>  Subvolume number of the current file.
>  .IP
> -- 
> 2.45.2
> 
> 

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes
  2025-01-06 18:37   ` John Garry
@ 2025-01-07  6:10     ` Christoph Hellwig
  2025-01-07  7:03       ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-01-07  6:10 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, Alexander Viro, Christian Brauner, Jan Kara,
	Chandan Babu R, Darrick J. Wong, Hongbo Li, Ryusuke Konishi,
	linux-nilfs, linux-fsdevel, linux-xfs

On Mon, Jan 06, 2025 at 06:37:06PM +0000, John Garry wrote:
>> +	/*
>> +	 * On COW inodes we are forced to always rewrite an entire file system
>> +	 * block or RT extent.
>> +	 *
>> +	 * Because applications assume they can do sector sized direct writes
>> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
>> +	 * case.  Because that needs to copy the entire block into the buffer
>> +	 * cache it is highly inefficient and can easily lead to page cache
>> +	 * invalidation races.
>> +	 *
>> +	 * Tell applications to avoid this case by reporting the natively
>> +	 * supported direct I/O read alignment.
>
> Maybe I mis-read the complete comment, but did you really mean "natively 
> supported direct I/O write alignment"? You have been talking about writes 
> only, but then finally mention read alignment.

No, this is indeed intended to talk about the different (smaller) read
alignment we are now reporting.  But I guess the wording is confusing
enough that I should improve it?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes
  2025-01-07  6:10     ` Christoph Hellwig
@ 2025-01-07  7:03       ` Darrick J. Wong
  2025-01-07  9:00         ` John Garry
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2025-01-07  7:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: John Garry, Alexander Viro, Christian Brauner, Jan Kara,
	Chandan Babu R, Hongbo Li, Ryusuke Konishi, linux-nilfs,
	linux-fsdevel, linux-xfs

On Tue, Jan 07, 2025 at 07:10:12AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 06:37:06PM +0000, John Garry wrote:
> >> +	/*
> >> +	 * On COW inodes we are forced to always rewrite an entire file system
> >> +	 * block or RT extent.
> >> +	 *
> >> +	 * Because applications assume they can do sector sized direct writes
> >> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
> >> +	 * case.  Because that needs to copy the entire block into the buffer
> >> +	 * cache it is highly inefficient and can easily lead to page cache
> >> +	 * invalidation races.
> >> +	 *
> >> +	 * Tell applications to avoid this case by reporting the natively
> >> +	 * supported direct I/O read alignment.
> >
> > Maybe I mis-read the complete comment, but did you really mean "natively 
> > supported direct I/O write alignment"? You have been talking about writes 
> > only, but then finally mention read alignment.
> 
> No, this is indeed intended to talk about the different (smaller) read
> alignment we are now reporting.  But I guess the wording is confusing
> enough that I should improve it?

How about:

/*
 * For COW inodes, we can only perform out of place writes of entire
 * file allocation units (clusters).  For a sub-cluster directio write,
 * we must fall back to buffered I/O to perform the RMW.  At best this
 * is highly inefficient; at worst it leads to page cache invalidation
 * races.  Tell applications to avoid this by reporting separately the
 * read and (larger) write alignments.
 */

--D

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes
  2025-01-07  7:03       ` Darrick J. Wong
@ 2025-01-07  9:00         ` John Garry
  0 siblings, 0 replies; 19+ messages in thread
From: John Garry @ 2025-01-07  9:00 UTC (permalink / raw)
  To: Darrick J. Wong, Christoph Hellwig
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
	Hongbo Li, Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs

On 07/01/2025 07:03, Darrick J. Wong wrote:
> On Tue, Jan 07, 2025 at 07:10:12AM +0100, Christoph Hellwig wrote:
>> On Mon, Jan 06, 2025 at 06:37:06PM +0000, John Garry wrote:
>>>> +	/*
>>>> +	 * On COW inodes we are forced to always rewrite an entire file system
>>>> +	 * block or RT extent.
>>>> +	 *
>>>> +	 * Because applications assume they can do sector sized direct writes
>>>> +	 * on XFS we fall back to buffered I/O for sub-block direct I/O in that
>>>> +	 * case.  Because that needs to copy the entire block into the buffer
>>>> +	 * cache it is highly inefficient and can easily lead to page cache
>>>> +	 * invalidation races.
>>>> +	 *
>>>> +	 * Tell applications to avoid this case by reporting the natively
>>>> +	 * supported direct I/O read alignment.
>>>
>>> Maybe I mis-read the complete comment, but did you really mean "natively
>>> supported direct I/O write alignment"? You have been talking about writes
>>> only, but then finally mention read alignment.
>>
>> No, this is indeed intended to talk about the different (smaller) read
>> alignment we are now reporting.  But I guess the wording is confusing
>> enough that I should improve it?

I know what you are saying, but I found the last paragraph odd. All 
along we talk about writes, but then conclude by mentioning reads (and 
not writes at all).

> 
> How about:
> 
> /*
>   * For COW inodes, we can only perform out of place writes of entire
>   * file allocation units (clusters).  For a sub-cluster directio write,
>   * we must fall back to buffered I/O to perform the RMW.  At best this
>   * is highly inefficient; at worst it leads to page cache invalidation
>   * races.  Tell applications to avoid this by reporting separately the
>   * read and (larger) write alignments.

yeah, that ending is better, but maybe Christoph wants to keep the more 
wordy original previous paragraphs.

>   */
> 
> --D
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-01-07  9:00 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-06 15:15 add STATX_DIO_READ_ALIGN Christoph Hellwig
2025-01-06 15:15 ` [PATCH 1/4] fs: reformat the statx definition Christoph Hellwig
2025-01-06 16:28   ` Jan Kara
2025-01-06 15:15 ` [PATCH 2/4] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
2025-01-06 16:32   ` Jan Kara
2025-01-06 16:40     ` Christoph Hellwig
2025-01-06 15:15 ` [PATCH 3/4] xfs: cleanup xfs_vn_getattr Christoph Hellwig
2025-01-06 17:07   ` John Garry
2025-01-06 15:15 ` [PATCH 4/4] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
2025-01-06 17:33   ` Darrick J. Wong
2025-01-06 18:37   ` John Garry
2025-01-07  6:10     ` Christoph Hellwig
2025-01-07  7:03       ` Darrick J. Wong
2025-01-07  9:00         ` John Garry
2025-01-06 15:19 ` [PATCH] statx.2: document STATX_DIO_READ_ALIGN Christoph Hellwig
2025-01-06 17:40   ` Darrick J. Wong
2025-01-06 18:09     ` Christoph Hellwig
2025-01-06 19:09       ` Darrick J. Wong
2025-01-06 22:01   ` Alejandro Colomar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).