* [PATCH 1/3] fs: reformat the statx definition
2024-08-28 5:11 RFC: add STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2024-08-28 5:11 ` Christoph Hellwig
2024-08-28 16:20 ` Darrick J. Wong
2024-08-28 5:11 ` [PATCH 2/3] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2024-08-28 5:11 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
The comments after the declaration are becoming rather unreadable with
long enough comments. Move them into lines of their own.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/uapi/linux/stat.h | 95 +++++++++++++++++++++++++++++----------
1 file changed, 72 insertions(+), 23 deletions(-)
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 887a2528644168..8b35d7d511a287 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -98,43 +98,92 @@ struct statx_timestamp {
*/
struct statx {
/* 0x00 */
- __u32 stx_mask; /* What results were written [uncond] */
- __u32 stx_blksize; /* Preferred general I/O size [uncond] */
- __u64 stx_attributes; /* Flags conveying information about the file [uncond] */
+ /* What results were written [uncond] */
+ __u32 stx_mask;
+
+ /* Preferred general I/O size [uncond] */
+ __u32 stx_blksize;
+
+ /* Flags conveying information about the file [uncond] */
+ __u64 stx_attributes;
+
/* 0x10 */
- __u32 stx_nlink; /* Number of hard links */
- __u32 stx_uid; /* User ID of owner */
- __u32 stx_gid; /* Group ID of owner */
- __u16 stx_mode; /* File mode */
+ /* Number of hard links */
+ __u32 stx_nlink;
+
+ /* User ID of owner */
+ __u32 stx_uid;
+
+ /* Group ID of owner */
+ __u32 stx_gid;
+
+ /* File mode */
+ __u16 stx_mode;
__u16 __spare0[1];
+
/* 0x20 */
- __u64 stx_ino; /* Inode number */
- __u64 stx_size; /* File size */
- __u64 stx_blocks; /* Number of 512-byte blocks allocated */
- __u64 stx_attributes_mask; /* Mask to show what's supported in stx_attributes */
+ /* Inode number */
+ __u64 stx_ino;
+
+ /* File size */
+ __u64 stx_size;
+
+ /* Number of 512-byte blocks allocated */
+ __u64 stx_blocks;
+
+ /* Mask to show what's supported in stx_attributes */
+ __u64 stx_attributes_mask;
+
/* 0x40 */
- struct statx_timestamp stx_atime; /* Last access time */
- struct statx_timestamp stx_btime; /* File creation time */
- struct statx_timestamp stx_ctime; /* Last attribute change time */
- struct statx_timestamp stx_mtime; /* Last data modification time */
+ /* Last access time */
+ struct statx_timestamp stx_atime;
+
+ /* File creation time */
+ struct statx_timestamp stx_btime;
+
+ /* Last attribute change time */
+ struct statx_timestamp stx_ctime;
+
+ /* Last data modification time */
+ struct statx_timestamp stx_mtime;
+
/* 0x80 */
- __u32 stx_rdev_major; /* Device ID of special file [if bdev/cdev] */
+ /* Device ID of special file [if bdev/cdev] */
+ __u32 stx_rdev_major;
__u32 stx_rdev_minor;
- __u32 stx_dev_major; /* ID of device containing file [uncond] */
+
+ /* ID of device containing file [uncond] */
+ __u32 stx_dev_major;
__u32 stx_dev_minor;
+
/* 0x90 */
__u64 stx_mnt_id;
- __u32 stx_dio_mem_align; /* Memory buffer alignment for direct I/O */
- __u32 stx_dio_offset_align; /* File offset alignment for direct I/O */
+
+ /* Memory buffer alignment for direct I/O */
+ __u32 stx_dio_mem_align;
+
+ /* File offset alignment for direct I/O */
+ __u32 stx_dio_offset_align;
+
/* 0xa0 */
- __u64 stx_subvol; /* Subvolume identifier */
- __u32 stx_atomic_write_unit_min; /* Min atomic write unit in bytes */
- __u32 stx_atomic_write_unit_max; /* Max atomic write unit in bytes */
+ /* Subvolume identifier */
+ __u64 stx_subvol;
+
+ /* Min atomic write unit in bytes */
+ __u32 stx_atomic_write_unit_min;
+
+ /* Max atomic write unit in bytes */
+ __u32 stx_atomic_write_unit_max;
+
/* 0xb0 */
- __u32 stx_atomic_write_segments_max; /* Max atomic write segment count */
+ /* Max atomic write segment count */
+ __u32 stx_atomic_write_segments_max;
+
__u32 __spare1[1];
+
/* 0xb8 */
__u64 __spare3[9]; /* Spare space for future expansion */
+
/* 0x100 */
};
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH 1/3] fs: reformat the statx definition
2024-08-28 5:11 ` [PATCH 1/3] fs: reformat the statx definition Christoph Hellwig
@ 2024-08-28 16:20 ` Darrick J. Wong
0 siblings, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2024-08-28 16:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
Hongbo Li, Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
On Wed, Aug 28, 2024 at 08:11:01AM +0300, Christoph Hellwig wrote:
> The comments after the declaration are becoming rather unreadable with
> long enough comments. Move them into lines of their own.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Space for full sentences, what luxury! ;)
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> include/uapi/linux/stat.h | 95 +++++++++++++++++++++++++++++----------
> 1 file changed, 72 insertions(+), 23 deletions(-)
>
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 887a2528644168..8b35d7d511a287 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -98,43 +98,92 @@ struct statx_timestamp {
> */
> struct statx {
> /* 0x00 */
> - __u32 stx_mask; /* What results were written [uncond] */
> - __u32 stx_blksize; /* Preferred general I/O size [uncond] */
> - __u64 stx_attributes; /* Flags conveying information about the file [uncond] */
> + /* What results were written [uncond] */
> + __u32 stx_mask;
> +
> + /* Preferred general I/O size [uncond] */
> + __u32 stx_blksize;
> +
> + /* Flags conveying information about the file [uncond] */
> + __u64 stx_attributes;
> +
> /* 0x10 */
> - __u32 stx_nlink; /* Number of hard links */
> - __u32 stx_uid; /* User ID of owner */
> - __u32 stx_gid; /* Group ID of owner */
> - __u16 stx_mode; /* File mode */
> + /* Number of hard links */
> + __u32 stx_nlink;
> +
> + /* User ID of owner */
> + __u32 stx_uid;
> +
> + /* Group ID of owner */
> + __u32 stx_gid;
> +
> + /* File mode */
> + __u16 stx_mode;
> __u16 __spare0[1];
> +
> /* 0x20 */
> - __u64 stx_ino; /* Inode number */
> - __u64 stx_size; /* File size */
> - __u64 stx_blocks; /* Number of 512-byte blocks allocated */
> - __u64 stx_attributes_mask; /* Mask to show what's supported in stx_attributes */
> + /* Inode number */
> + __u64 stx_ino;
> +
> + /* File size */
> + __u64 stx_size;
> +
> + /* Number of 512-byte blocks allocated */
> + __u64 stx_blocks;
> +
> + /* Mask to show what's supported in stx_attributes */
> + __u64 stx_attributes_mask;
> +
> /* 0x40 */
> - struct statx_timestamp stx_atime; /* Last access time */
> - struct statx_timestamp stx_btime; /* File creation time */
> - struct statx_timestamp stx_ctime; /* Last attribute change time */
> - struct statx_timestamp stx_mtime; /* Last data modification time */
> + /* Last access time */
> + struct statx_timestamp stx_atime;
> +
> + /* File creation time */
> + struct statx_timestamp stx_btime;
> +
> + /* Last attribute change time */
> + struct statx_timestamp stx_ctime;
> +
> + /* Last data modification time */
> + struct statx_timestamp stx_mtime;
> +
> /* 0x80 */
> - __u32 stx_rdev_major; /* Device ID of special file [if bdev/cdev] */
> + /* Device ID of special file [if bdev/cdev] */
> + __u32 stx_rdev_major;
> __u32 stx_rdev_minor;
> - __u32 stx_dev_major; /* ID of device containing file [uncond] */
> +
> + /* ID of device containing file [uncond] */
> + __u32 stx_dev_major;
> __u32 stx_dev_minor;
> +
> /* 0x90 */
> __u64 stx_mnt_id;
> - __u32 stx_dio_mem_align; /* Memory buffer alignment for direct I/O */
> - __u32 stx_dio_offset_align; /* File offset alignment for direct I/O */
> +
> + /* Memory buffer alignment for direct I/O */
> + __u32 stx_dio_mem_align;
> +
> + /* File offset alignment for direct I/O */
> + __u32 stx_dio_offset_align;
> +
> /* 0xa0 */
> - __u64 stx_subvol; /* Subvolume identifier */
> - __u32 stx_atomic_write_unit_min; /* Min atomic write unit in bytes */
> - __u32 stx_atomic_write_unit_max; /* Max atomic write unit in bytes */
> + /* Subvolume identifier */
> + __u64 stx_subvol;
> +
> + /* Min atomic write unit in bytes */
> + __u32 stx_atomic_write_unit_min;
> +
> + /* Max atomic write unit in bytes */
> + __u32 stx_atomic_write_unit_max;
> +
> /* 0xb0 */
> - __u32 stx_atomic_write_segments_max; /* Max atomic write segment count */
> + /* Max atomic write segment count */
> + __u32 stx_atomic_write_segments_max;
> +
> __u32 __spare1[1];
> +
> /* 0xb8 */
> __u64 __spare3[9]; /* Spare space for future expansion */
> +
> /* 0x100 */
> };
>
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/3] fs: add STATX_DIO_READ_ALIGN
2024-08-28 5:11 RFC: add STATX_DIO_READ_ALIGN Christoph Hellwig
2024-08-28 5:11 ` [PATCH 1/3] fs: reformat the statx definition Christoph Hellwig
@ 2024-08-28 5:11 ` Christoph Hellwig
2024-08-28 16:24 ` Darrick J. Wong
2024-08-28 23:52 ` Eric Biggers
2024-08-28 5:11 ` [PATCH 3/3] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
2024-08-28 13:43 ` RFC: add STATX_DIO_READ_ALIGN Christian Brauner
3 siblings, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2024-08-28 5:11 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
Add a separate dio read align field, as many out of place write
file systems can easily do reads aligned to the device sector size,
but require bigger alignment for writes.
This is usually papered over by falling back to buffered I/O for smaller
writes and doing read-modify-write cycles, but performance for this
sucks, so applications benefit from knowing the actual write alignment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/stat.c | 1 +
include/linux/stat.h | 1 +
include/uapi/linux/stat.h | 4 +++-
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/stat.c b/fs/stat.c
index 89ce1be563108c..044e7ad9f3abc2 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -700,6 +700,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
tmp.stx_mnt_id = stat->mnt_id;
tmp.stx_dio_mem_align = stat->dio_mem_align;
tmp.stx_dio_offset_align = stat->dio_offset_align;
+ tmp.stx_dio_read_offset_align = stat->dio_read_offset_align;
tmp.stx_subvol = stat->subvol;
tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 3d900c86981c5b..9d8382e23a9c69 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -52,6 +52,7 @@ struct kstat {
u64 mnt_id;
u32 dio_mem_align;
u32 dio_offset_align;
+ u32 dio_read_offset_align;
u64 change_cookie;
u64 subvol;
u32 atomic_write_unit_min;
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 8b35d7d511a287..f78ee3670dd5d7 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -179,7 +179,8 @@ struct statx {
/* Max atomic write segment count */
__u32 stx_atomic_write_segments_max;
- __u32 __spare1[1];
+ /* File offset alignment for direct I/O reads */
+ __u32 stx_dio_read_offset_align;
/* 0xb8 */
__u64 __spare3[9]; /* Spare space for future expansion */
@@ -213,6 +214,7 @@ struct statx {
#define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */
#define STATX_SUBVOL 0x00008000U /* Want/got stx_subvol */
#define STATX_WRITE_ATOMIC 0x00010000U /* Want/got atomic_write_* fields */
+#define STATX_DIO_READ_ALIGN 0x00020000U /* Want/got dio read alignment info */
#define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH 2/3] fs: add STATX_DIO_READ_ALIGN
2024-08-28 5:11 ` [PATCH 2/3] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2024-08-28 16:24 ` Darrick J. Wong
2024-08-28 23:52 ` Eric Biggers
1 sibling, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2024-08-28 16:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
Hongbo Li, Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
On Wed, Aug 28, 2024 at 08:11:02AM +0300, Christoph Hellwig wrote:
> Add a separate dio read align field, as many out of place write
> file systems can easily do reads aligned to the device sector size,
> but require bigger alignment for writes.
>
> This is usually papered over by falling back to buffered I/O for smaller
> writes and doing read-modify-write cycles, but performance for this
> sucks, so applications benefit from knowing the actual write alignment.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks fine to me...
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/stat.c | 1 +
> include/linux/stat.h | 1 +
> include/uapi/linux/stat.h | 4 +++-
> 3 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/stat.c b/fs/stat.c
> index 89ce1be563108c..044e7ad9f3abc2 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -700,6 +700,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
> tmp.stx_mnt_id = stat->mnt_id;
> tmp.stx_dio_mem_align = stat->dio_mem_align;
> tmp.stx_dio_offset_align = stat->dio_offset_align;
> + tmp.stx_dio_read_offset_align = stat->dio_read_offset_align;
> tmp.stx_subvol = stat->subvol;
> tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
> tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 3d900c86981c5b..9d8382e23a9c69 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -52,6 +52,7 @@ struct kstat {
> u64 mnt_id;
> u32 dio_mem_align;
> u32 dio_offset_align;
> + u32 dio_read_offset_align;
> u64 change_cookie;
> u64 subvol;
> u32 atomic_write_unit_min;
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 8b35d7d511a287..f78ee3670dd5d7 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -179,7 +179,8 @@ struct statx {
> /* Max atomic write segment count */
> __u32 stx_atomic_write_segments_max;
>
> - __u32 __spare1[1];
> + /* File offset alignment for direct I/O reads */
> + __u32 stx_dio_read_offset_align;
>
> /* 0xb8 */
> __u64 __spare3[9]; /* Spare space for future expansion */
> @@ -213,6 +214,7 @@ struct statx {
> #define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */
> #define STATX_SUBVOL 0x00008000U /* Want/got stx_subvol */
> #define STATX_WRITE_ATOMIC 0x00010000U /* Want/got atomic_write_* fields */
> +#define STATX_DIO_READ_ALIGN 0x00020000U /* Want/got dio read alignment info */
>
> #define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */
>
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH 2/3] fs: add STATX_DIO_READ_ALIGN
2024-08-28 5:11 ` [PATCH 2/3] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
2024-08-28 16:24 ` Darrick J. Wong
@ 2024-08-28 23:52 ` Eric Biggers
2024-08-29 3:44 ` Christoph Hellwig
1 sibling, 1 reply; 11+ messages in thread
From: Eric Biggers @ 2024-08-28 23:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
Darrick J. Wong, Hongbo Li, Ryusuke Konishi, linux-nilfs,
linux-fsdevel, linux-xfs
On Wed, Aug 28, 2024 at 08:11:02AM +0300, Christoph Hellwig wrote:
> Add a separate dio read align field, as many out of place write
> file systems can easily do reads aligned to the device sector size,
> but require bigger alignment for writes.
>
> This is usually papered over by falling back to buffered I/O for smaller
> writes and doing read-modify-write cycles, but performance for this
> sucks, so applications benefit from knowing the actual write alignment.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/stat.c | 1 +
> include/linux/stat.h | 1 +
> include/uapi/linux/stat.h | 4 +++-
> 3 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/stat.c b/fs/stat.c
> index 89ce1be563108c..044e7ad9f3abc2 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -700,6 +700,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
> tmp.stx_mnt_id = stat->mnt_id;
> tmp.stx_dio_mem_align = stat->dio_mem_align;
> tmp.stx_dio_offset_align = stat->dio_offset_align;
> + tmp.stx_dio_read_offset_align = stat->dio_read_offset_align;
> tmp.stx_subvol = stat->subvol;
> tmp.stx_atomic_write_unit_min = stat->atomic_write_unit_min;
> tmp.stx_atomic_write_unit_max = stat->atomic_write_unit_max;
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 3d900c86981c5b..9d8382e23a9c69 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -52,6 +52,7 @@ struct kstat {
> u64 mnt_id;
> u32 dio_mem_align;
> u32 dio_offset_align;
> + u32 dio_read_offset_align;
> u64 change_cookie;
> u64 subvol;
> u32 atomic_write_unit_min;
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 8b35d7d511a287..f78ee3670dd5d7 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -179,7 +179,8 @@ struct statx {
> /* Max atomic write segment count */
> __u32 stx_atomic_write_segments_max;
>
> - __u32 __spare1[1];
> + /* File offset alignment for direct I/O reads */
> + __u32 stx_dio_read_offset_align;
>
> /* 0xb8 */
> __u64 __spare3[9]; /* Spare space for future expansion */
> @@ -213,6 +214,7 @@ struct statx {
> #define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */
> #define STATX_SUBVOL 0x00008000U /* Want/got stx_subvol */
> #define STATX_WRITE_ATOMIC 0x00010000U /* Want/got atomic_write_* fields */
> +#define STATX_DIO_READ_ALIGN 0x00020000U /* Want/got dio read alignment info */
>
> #define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */
Thanks. We maybe should have included read/write separation in STATX_DIOALIGN,
but at the time the only case that was brought up was "DIO reads are supported
but DIO writes are not" which people had argued was not useful.
Is this patch meant to support that case, or just the case where DIO in both
directions is supported but with different alignments? Is that different file
offset alignments, different memory alignments, or both? This patch doesn't add
a stx_dio_read_mem_align field, so it's still assumed that both directions share
the existing stx_dio_mem_align property, including whether DIO is supported at
all (0 vs. nonzero). So as proposed, the only case it helps with is where DIO
in both directions is supported with the same memory alignment but different
file offset alignments. Maybe that is intended, but it's not clear to me.
Are there specific userspace applications that would like to take advantage of a
smaller value of stx_dio_read_offset_align compared to the existing
stx_dio_offset_align? Even in the case of a discrepancy between reads and
writes, applications are supposed to be able to just use stx_dio_offset_align
since it has to be the greater of the two alignments.
And as always, new UAPIs need to be accompanied by documentation. In this case
that's an update to the statx man page.
- Eric
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH 2/3] fs: add STATX_DIO_READ_ALIGN
2024-08-28 23:52 ` Eric Biggers
@ 2024-08-29 3:44 ` Christoph Hellwig
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2024-08-29 3:44 UTC (permalink / raw)
To: Eric Biggers
Cc: Christoph Hellwig, Alexander Viro, Christian Brauner, Jan Kara,
Chandan Babu R, Darrick J. Wong, Hongbo Li, Ryusuke Konishi,
linux-nilfs, linux-fsdevel, linux-xfs
On Wed, Aug 28, 2024 at 11:52:27PM +0000, Eric Biggers wrote:
> Thanks. We maybe should have included read/write separation in STATX_DIOALIGN,
> but at the time the only case that was brought up was "DIO reads are supported
> but DIO writes are not" which people had argued was not useful.
>
> Is this patch meant to support that case,
Why would anyone support direct I/O reads but not writes? That seems
really weird, but maybe I'm missing something important.
> or just the case where DIO in both
> directions is supported but with different alignments? Is that different file
> offset alignments, different memory alignments, or both? This patch doesn't add
> a stx_dio_read_mem_align field, so it's still assumed that both directions share
> the existing stx_dio_mem_align property, including whether DIO is supported at
> all (0 vs. nonzero).
Yes. The memory alignment really is dependent on the underlying storage
hardware DMA engine, which doesn't distinguish between reads and writes.
> So as proposed, the only case it helps with is where DIO
> in both directions is supported with the same memory alignment but different
> file offset alignments.
Yes.
> Maybe that is intended, but it's not clear to me.
Well, that's good feedback to make it more clear.
> Are there specific userspace applications that would like to take advantage of a
> smaller value of stx_dio_read_offset_align compared to the existing
> stx_dio_offset_align?
There are a lot of read-heavy workloads where smaller reads do make
a difference.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 3/3] xfs: report the correct read/write dio alignment for reflinked inodes
2024-08-28 5:11 RFC: add STATX_DIO_READ_ALIGN Christoph Hellwig
2024-08-28 5:11 ` [PATCH 1/3] fs: reformat the statx definition Christoph Hellwig
2024-08-28 5:11 ` [PATCH 2/3] fs: add STATX_DIO_READ_ALIGN Christoph Hellwig
@ 2024-08-28 5:11 ` Christoph Hellwig
2024-08-28 16:23 ` Darrick J. Wong
2024-08-29 1:13 ` Dave Chinner
2024-08-28 13:43 ` RFC: add STATX_DIO_READ_ALIGN Christian Brauner
3 siblings, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2024-08-28 5:11 UTC (permalink / raw)
To: Alexander Viro, Christian Brauner
Cc: Jan Kara, Chandan Babu R, Darrick J. Wong, Hongbo Li,
Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
For I/O to reflinked blocks we always need to write an entire new
file system block, and the code enforces the file system block alignment
for the entire file if it has any reflinked blocks.
Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
vs write alignments for reflinked files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_iops.c | 37 +++++++++++++++++++++++++++++--------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1cdc8034f54d93..de2fc12688dc23 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -570,6 +570,33 @@ xfs_stat_blksize(
return PAGE_SIZE;
}
+static void
+xfs_report_dioalign(
+ struct xfs_inode *ip,
+ struct kstat *stat)
+{
+ struct xfs_buftarg *target = xfs_inode_buftarg(ip);
+ struct block_device *bdev = target->bt_bdev;
+
+ stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
+ stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
+ stat->dio_read_offset_align = bdev_logical_block_size(bdev);
+
+ /*
+ * On COW inodes we are forced to always rewrite an entire file system
+ * block.
+ *
+ * Because applications assume they can do sector sized direct writes
+ * on XFS we provide an emulation by doing a read-modify-write cycle
+ * through the cache, but that is highly inefficient. Thus report the
+ * natively supported size here.
+ */
+ if (xfs_is_cow_inode(ip))
+ stat->dio_offset_align = ip->i_mount->m_sb.sb_blocksize;
+ else
+ stat->dio_offset_align = stat->dio_read_offset_align;
+}
+
STATIC int
xfs_vn_getattr(
struct mnt_idmap *idmap,
@@ -635,14 +662,8 @@ xfs_vn_getattr(
stat->rdev = inode->i_rdev;
break;
case S_IFREG:
- if (request_mask & STATX_DIOALIGN) {
- struct xfs_buftarg *target = xfs_inode_buftarg(ip);
- struct block_device *bdev = target->bt_bdev;
-
- stat->result_mask |= STATX_DIOALIGN;
- stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
- stat->dio_offset_align = bdev_logical_block_size(bdev);
- }
+ if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
+ xfs_report_dioalign(ip, stat);
fallthrough;
default:
stat->blksize = xfs_stat_blksize(ip);
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH 3/3] xfs: report the correct read/write dio alignment for reflinked inodes
2024-08-28 5:11 ` [PATCH 3/3] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
@ 2024-08-28 16:23 ` Darrick J. Wong
2024-08-29 1:13 ` Dave Chinner
1 sibling, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2024-08-28 16:23 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
Hongbo Li, Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
On Wed, Aug 28, 2024 at 08:11:03AM +0300, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
>
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_iops.c | 37 +++++++++++++++++++++++++++++--------
> 1 file changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 1cdc8034f54d93..de2fc12688dc23 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -570,6 +570,33 @@ xfs_stat_blksize(
> return PAGE_SIZE;
> }
>
> +static void
> +xfs_report_dioalign(
> + struct xfs_inode *ip,
> + struct kstat *stat)
> +{
> + struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> + struct block_device *bdev = target->bt_bdev;
> +
> + stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
> + stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> + stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> +
> + /*
> + * On COW inodes we are forced to always rewrite an entire file system
> + * block.
> + *
> + * Because applications assume they can do sector sized direct writes
> + * on XFS we provide an emulation by doing a read-modify-write cycle
> + * through the cache, but that is highly inefficient. Thus report the
> + * natively supported size here.
> + */
> + if (xfs_is_cow_inode(ip))
> + stat->dio_offset_align = ip->i_mount->m_sb.sb_blocksize;
xfs_inode_alloc_unitsize(), since we can only cow full allocation units.
(Not necessary today, but we might as well make it work for rtreflink
from the start.)
--D
> + else
> + stat->dio_offset_align = stat->dio_read_offset_align;
> +}
> +
> STATIC int
> xfs_vn_getattr(
> struct mnt_idmap *idmap,
> @@ -635,14 +662,8 @@ xfs_vn_getattr(
> stat->rdev = inode->i_rdev;
> break;
> case S_IFREG:
> - if (request_mask & STATX_DIOALIGN) {
> - struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> - struct block_device *bdev = target->bt_bdev;
> -
> - stat->result_mask |= STATX_DIOALIGN;
> - stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> - stat->dio_offset_align = bdev_logical_block_size(bdev);
> - }
> + if (request_mask & (STATX_DIOALIGN | STATX_DIO_READ_ALIGN))
> + xfs_report_dioalign(ip, stat);
> fallthrough;
> default:
> stat->blksize = xfs_stat_blksize(ip);
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH 3/3] xfs: report the correct read/write dio alignment for reflinked inodes
2024-08-28 5:11 ` [PATCH 3/3] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
2024-08-28 16:23 ` Darrick J. Wong
@ 2024-08-29 1:13 ` Dave Chinner
1 sibling, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2024-08-29 1:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Alexander Viro, Christian Brauner, Jan Kara, Chandan Babu R,
Darrick J. Wong, Hongbo Li, Ryusuke Konishi, linux-nilfs,
linux-fsdevel, linux-xfs
On Wed, Aug 28, 2024 at 08:11:03AM +0300, Christoph Hellwig wrote:
> For I/O to reflinked blocks we always need to write an entire new
> file system block, and the code enforces the file system block alignment
> for the entire file if it has any reflinked blocks.
>
> Use the new STATX_DIO_READ_ALIGN flag to report the asymmetric read
> vs write alignments for reflinked files.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/xfs/xfs_iops.c | 37 +++++++++++++++++++++++++++++--------
> 1 file changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 1cdc8034f54d93..de2fc12688dc23 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -570,6 +570,33 @@ xfs_stat_blksize(
> return PAGE_SIZE;
> }
>
> +static void
> +xfs_report_dioalign(
> + struct xfs_inode *ip,
> + struct kstat *stat)
> +{
> + struct xfs_buftarg *target = xfs_inode_buftarg(ip);
> + struct block_device *bdev = target->bt_bdev;
> +
> + stat->result_mask |= STATX_DIOALIGN | STATX_DIO_READ_ALIGN;
> + stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
> + stat->dio_read_offset_align = bdev_logical_block_size(bdev);
> +
> + /*
> + * On COW inodes we are forced to always rewrite an entire file system
> + * block.
> + *
> + * Because applications assume they can do sector sized direct writes
> + * on XFS we provide an emulation by doing a read-modify-write cycle
> + * through the cache, but that is highly inefficient. Thus report the
> + * natively supported size here.
> + */
> + if (xfs_is_cow_inode(ip))
> + stat->dio_offset_align = ip->i_mount->m_sb.sb_blocksize;
> + else
> + stat->dio_offset_align = stat->dio_read_offset_align;
It might be worth making it explicitly clear that logical block size
aligned IO for COW operations will still work. I think that's what
you are trying to say, but it took me a while to work out. Perhaps
something like:
/*
* COW operations are inefficient on sub-fsblock aligned
* ranges. They need to copy the entire block, so the
* minimum IO size we will ever do in this case is a single
* filesystem block.
*
* Even though we support sector sized IO on COW inodes, we
* want to help applications avoid the costly RMW cycle it
* requires for COW inodes. Hence report the native
* filesystem allocation unit size here to indicate the
* smallest alignment that will avoid RMW cycles in the DIO
* write path.
*/
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: RFC: add STATX_DIO_READ_ALIGN
2024-08-28 5:11 RFC: add STATX_DIO_READ_ALIGN Christoph Hellwig
` (2 preceding siblings ...)
2024-08-28 5:11 ` [PATCH 3/3] xfs: report the correct read/write dio alignment for reflinked inodes Christoph Hellwig
@ 2024-08-28 13:43 ` Christian Brauner
3 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2024-08-28 13:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Alexander Viro, Jan Kara, Chandan Babu R, Darrick J. Wong,
Hongbo Li, Ryusuke Konishi, linux-nilfs, linux-fsdevel, linux-xfs
On Wed, Aug 28, 2024 at 08:11:00AM GMT, Christoph Hellwig wrote:
> Hi all,
>
> file systems that write out of place usually require different alignment
> for direct I/O writes than what they can do for reads. This series tries
> to address this by yet another statx field.
I think that's fine. If we run out of statx spare fields we can start
versioning by size using via STATX__RESERVED.
^ permalink raw reply [flat|nested] 11+ messages in thread