From: "Darrick J. Wong" <djwong@kernel.org>
To: Eric Biggers <ebiggers@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net,
linux-xfs@vger.kernel.org, linux-api@vger.kernel.org,
linux-fscrypt@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, Keith Busch <kbusch@kernel.org>
Subject: Re: [RFC PATCH v2 1/7] statx: add I/O alignment information
Date: Thu, 19 May 2022 16:06:05 -0700
Message-ID: <YobNXbYnhBiqniTH@magnolia>
In-Reply-To: <20220518235011.153058-2-ebiggers@kernel.org>
On Wed, May 18, 2022 at 04:50:05PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Traditionally, the conditions for when DIO (direct I/O) is supported
> were fairly simple: filesystems either supported DIO aligned to the
> block device's logical block size, or didn't support DIO at all.
>
> However, due to filesystem features that have been added over time (e.g.,
> data journalling, inline data, encryption, verity, compression,
> checkpoint disabling, log-structured mode), the conditions for when DIO
> is allowed on a file have gotten increasingly complex. Whether a
> particular file supports DIO, and with what alignment, can depend on
> various file attributes and filesystem mount options, as well as which
> block device(s) the file's data is located on.
>
> XFS has an ioctl XFS_IOC_DIOINFO which exposes this information to
> applications. However, as discussed
> (https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u),
> this ioctl is rarely used and not known to be used outside of
> XFS-specific code. It also was never intended to indicate when a file
> doesn't support DIO at all, and it only exposes the minimum I/O
> alignment, not the optimal I/O alignment which has been requested too.
>
> Therefore, let's expose this information via statx(). Add the
> STATX_IOALIGN flag and three fields associated with it:
>
> * stx_mem_align_dio: the alignment (in bytes) required for user memory
> buffers for DIO, or 0 if DIO is not supported on the file.
>
> * stx_offset_align_dio: the alignment (in bytes) required for file
> offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
> on the file. This will only be nonzero if stx_mem_align_dio is
> nonzero, and vice versa.
>
> * stx_offset_align_optimal: the alignment (in bytes) suggested for file
> offsets and I/O segment lengths to get optimal performance. This
> applies to both DIO and buffered I/O. It differs from stx_blocksize
> in that stx_offset_align_optimal will contain the real optimum I/O
> size, which may be a large value. In contrast, for compatibility
> reasons stx_blocksize is the minimum size needed to avoid page cache
> read/write/modify cycles, which may be much smaller than the optimum
> I/O size. For more details about the motivation for this field, see
> https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area
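Taken together, the two DIO fields above describe a simple check that a caller could make before issuing direct I/O. The following is a minimal sketch, assuming a hypothetical helper `dio_request_ok()` that is not part of this patch or any library. It uses `%` rather than bit masks because the patch does not promise that the alignments are powers of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of this patch): would a direct I/O request
 * satisfy the values reported in stx_mem_align_dio and stx_offset_align_dio?
 * Either value being 0 means DIO is not supported on the file.
 * Uses '%' rather than bit masks because power-of-two values are not assumed.
 */
static bool dio_request_ok(uint32_t mem_align_dio, uint32_t offset_align_dio,
                           const void *buf, uint64_t offset, uint64_t len)
{
    if (mem_align_dio == 0 || offset_align_dio == 0)
        return false;                     /* DIO not supported on this file */
    if ((uintptr_t)buf % mem_align_dio != 0)
        return false;                     /* user buffer misaligned */
    if (offset % offset_align_dio != 0 || len % offset_align_dio != 0)
        return false;                     /* file offset or length misaligned */
    return true;
}
```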
Hmm. So I guess this is supposed to be the filesystem's best guess at
the IO size that will minimize RMW cycles in the entire stack? i.e. if
the user does not want RMW of pagecache pages, of file allocation units
(if COW is enabled), of RAID stripes, or in the storage itself, then it
should ensure that all IOs are aligned to this value?
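Under that reading, an application that wants to avoid RMW at every layer would widen each I/O range so that both ends are multiples of stx_offset_align_optimal. A minimal sketch, assuming a hypothetical helper `align_range_outward()` that is not part of the patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of this patch): widen the byte range
 * [*start, *end) so both ends are multiples of 'align', e.g. the value
 * reported in stx_offset_align_optimal. An align of 0 leaves the range
 * untouched. No overflow check is done for ranges ending near UINT64_MAX.
 */
static void align_range_outward(uint32_t align, uint64_t *start, uint64_t *end)
{
    if (align == 0)
        return;
    *start -= *start % align;             /* round the start down */
    if (*end % align != 0)
        *end += align - *end % align;     /* round the end up */
}
```

For example, with an alignment of 4096 the range [5000, 9000) becomes [4096, 12288).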
I guess that means for XFS it's effectively max(pagesize, i_blocksize,
bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
the rt extent size)? I didn't see a manpage update for statx(2) but
that's mostly what I'm interested in. :)
Looking ahead, it looks like the ext4/f2fs implementations only seem to
be returning max(i_blocksize, bdev io_opt)? But not the pagesize? Did
I misunderstand this, then?
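The difference between the two formulas can be made concrete. A minimal sketch, assuming a hypothetical helper `max_align()` that takes the candidate alignments in bytes (0 meaning "no constraint reported", e.g. a bdev without io_opt):

```c
#include <stdint.h>

/*
 * Hypothetical helper: the largest of n candidate alignments in bytes,
 * treating 0 as "no constraint reported".
 */
static uint32_t max_align(const uint32_t *vals, unsigned int n)
{
    uint32_t m = 0;

    for (unsigned int i = 0; i < n; i++)
        if (vals[i] > m)
            m = vals[i];
    return m;
}
```

With 64 KiB pages, 4 KiB blocks and no io_opt, the candidates {65536, 4096, 0} give 65536, while the ext4/f2fs-style candidates {4096, 0} give 4096, which is the pagesize gap asked about above. Note that taking the max only yields an alignment that satisfies every layer when each candidate divides the largest one (always true for powers of two); for arbitrary values a least common multiple would be needed.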
(The plumbing changes in this patch look ok.)
--D
> Note that as with other statx() extensions, if STATX_IOALIGN isn't set
> in the returned statx struct, then these new fields won't be filled in.
> This will happen if the filesystem doesn't support STATX_IOALIGN, or if
> the file isn't a regular file. (It might be supported on block device
> files in the future.) It might also happen if the caller didn't include
> STATX_IOALIGN in the request mask, since statx() isn't required to
> return information that wasn't requested.
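In practice this means a caller has to consult stx_mask before interpreting the new fields at all, and only then distinguish "DIO unsupported" (zeros) from real alignments. A minimal sketch, assuming a hypothetical `classify_dio()` helper and `enum dio_status`; the STATX_IOALIGN value is copied from this patch's uapi hunk:

```c
#include <stdint.h>

/* Bit value taken from this patch's include/uapi/linux/stat.h hunk. */
#ifndef STATX_IOALIGN
#define STATX_IOALIGN 0x00002000U
#endif

/* Hypothetical classification of what a statx() caller learned. */
enum dio_status {
    DIO_UNKNOWN,      /* STATX_IOALIGN not in stx_mask: fields not filled in */
    DIO_UNSUPPORTED,  /* STATX_IOALIGN set, alignments are 0 */
    DIO_SUPPORTED,    /* STATX_IOALIGN set, alignments are nonzero */
};

static enum dio_status classify_dio(uint32_t stx_mask, uint32_t mem_align_dio,
                                    uint32_t offset_align_dio)
{
    if (!(stx_mask & STATX_IOALIGN))
        return DIO_UNKNOWN;       /* don't trust the alignment fields */
    if (mem_align_dio == 0 || offset_align_dio == 0)
        return DIO_UNSUPPORTED;   /* both are 0 together per the patch */
    return DIO_SUPPORTED;
}
```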
>
> This commit adds the VFS-level plumbing for STATX_IOALIGN. Individual
> filesystems will still need to add code to support it.
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
> fs/stat.c | 3 +++
> include/linux/stat.h | 3 +++
> include/uapi/linux/stat.h | 9 +++++++--
> 3 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/fs/stat.c b/fs/stat.c
> index 5c2c94464e8b0..9d477218545b8 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -611,6 +611,9 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
> tmp.stx_dev_major = MAJOR(stat->dev);
> tmp.stx_dev_minor = MINOR(stat->dev);
> tmp.stx_mnt_id = stat->mnt_id;
> + tmp.stx_mem_align_dio = stat->mem_align_dio;
> + tmp.stx_offset_align_dio = stat->offset_align_dio;
> + tmp.stx_offset_align_optimal = stat->offset_align_optimal;
>
> return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
> }
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 7df06931f25d8..48b8b1ad1567c 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -50,6 +50,9 @@ struct kstat {
> struct timespec64 btime; /* File creation time */
> u64 blocks;
> u64 mnt_id;
> + u32 mem_align_dio;
> + u32 offset_align_dio;
> + u32 offset_align_optimal;
> };
>
> #endif
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 1500a0f58041a..f822b23e81091 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -124,9 +124,13 @@ struct statx {
> __u32 stx_dev_minor;
> /* 0x90 */
> __u64 stx_mnt_id;
> - __u64 __spare2;
> + __u32 stx_mem_align_dio; /* Memory buffer alignment for direct I/O */
> + __u32 stx_offset_align_dio; /* File offset alignment for direct I/O */
> /* 0xa0 */
> - __u64 __spare3[12]; /* Spare space for future expansion */
> + __u32 stx_offset_align_optimal; /* Optimal file offset alignment for I/O */
> + __u32 __spare2;
> + /* 0xa8 */
> + __u64 __spare3[11]; /* Spare space for future expansion */
> /* 0x100 */
> };
>
> @@ -152,6 +156,7 @@ struct statx {
> #define STATX_BASIC_STATS 0x000007ffU /* The stuff in the normal stat struct */
> #define STATX_BTIME 0x00000800U /* Want/got stx_btime */
> #define STATX_MNT_ID 0x00001000U /* Got stx_mnt_id */
> +#define STATX_IOALIGN 0x00002000U /* Want/got IO alignment info */
>
> #define STATX__RESERVED 0x80000000U /* Reserved for future struct statx expansion */
>
> --
> 2.36.1
>
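The offset comments in the struct statx hunk (0x90, 0xa0, 0xa8, 0x100) can be checked mechanically. A minimal C11 sketch, assuming a hypothetical local mirror `struct statx_tail_rfc` that collapses everything before stx_mnt_id into a 0x90-byte pad and renames the spare fields (identifiers starting with `__` are reserved in user code):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct statx as laid out by this RFC. */
struct statx_tail_rfc {
    unsigned char head[0x90];          /* stx_mask .. stx_dev_minor */
    uint64_t stx_mnt_id;               /* 0x90 */
    uint32_t stx_mem_align_dio;        /* 0x98 */
    uint32_t stx_offset_align_dio;     /* 0x9c */
    uint32_t stx_offset_align_optimal; /* 0xa0 */
    uint32_t spare2;                   /* 0xa4, __spare2 in the patch */
    uint64_t spare3[11];               /* 0xa8, __spare3 in the patch */
};

/* Compile-time checks of the offset comments in the uapi hunk. */
_Static_assert(offsetof(struct statx_tail_rfc, stx_mnt_id) == 0x90, "stx_mnt_id");
_Static_assert(offsetof(struct statx_tail_rfc, stx_offset_align_optimal) == 0xa0,
               "stx_offset_align_optimal");
_Static_assert(offsetof(struct statx_tail_rfc, spare3) == 0xa8, "spare3");
_Static_assert(sizeof(struct statx_tail_rfc) == 0x100, "struct size unchanged");

/* Mask bits copied from the patch context; the new bit must not overlap. */
#define RFC_STATX_BASIC_STATS 0x000007ffU
#define RFC_STATX_BTIME       0x00000800U
#define RFC_STATX_MNT_ID      0x00001000U
#define RFC_STATX_IOALIGN     0x00002000U
_Static_assert((RFC_STATX_IOALIGN &
                (RFC_STATX_BASIC_STATS | RFC_STATX_BTIME | RFC_STATX_MNT_ID)) == 0,
               "STATX_IOALIGN must be a fresh bit");
```

If this translation unit compiles, the new fields fit in the old spare space without changing the 256-byte size of the structure.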