* alloc misaligned vectors for zoned XFS v2
@ 2025-10-31 13:10 Christoph Hellwig
2025-10-31 13:10 ` [PATCH 1/2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Christoph Hellwig @ 2025-10-31 13:10 UTC (permalink / raw)
To: Christian Brauner, Carlos Maiolino
Cc: Darrick J. Wong, Qu Wenruo, linux-xfs, linux-fsdevel
Hi all,
this series enables the new block layer support for misaligned
individual vectors for zoned XFS.
The first patch is from Qu and supposedly already applied to
the vfs iomap 6.19 branch, but I can't find it there. The next
two are small fixups for it, and the last one makes use of this
new functionality in XFS.
Note: the first patch replaces the patch of the same name in the
vfs iomap-6.19 branch.
Changes since v1:
- squash the first patches
- trace the new flag (based on a patch from Darrick)
Diffstat:
fs/iomap/direct-io.c | 17 +++++++++++++++--
fs/iomap/trace.h | 7 ++++---
fs/xfs/xfs_file.c | 21 +++++++++++----------
include/linux/iomap.h | 8 ++++++++
4 files changed, 38 insertions(+), 15 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
2025-10-31 13:10 alloc misaligned vectors for zoned XFS v2 Christoph Hellwig
@ 2025-10-31 13:10 ` Christoph Hellwig
2025-10-31 15:24 ` Darrick J. Wong
2025-10-31 13:10 ` [PATCH 2/2] xfs: support sub-block aligned vectors in always COW mode Christoph Hellwig
2025-11-05 12:09 ` alloc misaligned vectors for zoned XFS v2 Christian Brauner
2 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2025-10-31 13:10 UTC (permalink / raw)
To: Christian Brauner, Carlos Maiolino
Cc: Darrick J. Wong, Qu Wenruo, linux-xfs, linux-fsdevel,
Pankaj Raghav
From: Qu Wenruo <wqu@suse.com>
Btrfs requires all of its bios to be fs block aligned. Normally that is
fine, but with the incoming support for block sizes larger than the page
size (bs > ps), the requirement is no longer met for direct I/O.
This is because iomap_dio_bio_iter() calls bio_iov_iter_get_pages() and
only requires alignment to bdev_logical_block_size().
In the real world that value is either 512 or 4K, so on systems with 4K
pages bio_iov_iter_get_pages() can break the bio at any page boundary,
which violates btrfs' requirement in the bs > ps case.
To address this problem, introduce a new public iomap dio flag,
IOMAP_DIO_FSBLOCK_ALIGNED.
When calling __iomap_dio_rw() with that new flag, iomap_dio::flags will
inherit that new flag, and iomap_dio_bio_iter() will take fs block size
into the calculation of the alignment, and pass the alignment to
bio_iov_iter_get_pages(), respecting the fs block size requirement.
The initial user of this flag will be btrfs, which needs to calculate the
checksum for direct read and thus requires the biovec to be fs block
aligned for the incoming bs > ps support.
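The alignment selection described above can be sketched in user space. This is only an illustration, not the kernel code: the `sketch_` names are hypothetical, the flag value is copied from the include/linux/iomap.h hunk in this patch, and the block sizes are passed in as plain integers rather than read from an inode or block device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IOMAP_DIO_FSBLOCK_ALIGNED; value from the patch. */
#define SKETCH_DIO_FSBLOCK_ALIGNED	(1u << 3)

/*
 * Pick the alignment the way the patched iomap_dio_bio_iter() is described
 * to: fs block size when the flag is set, device logical block size otherwise.
 */
static unsigned int sketch_dio_alignment(unsigned int dio_flags,
					 unsigned int fs_block_size,
					 unsigned int bdev_lbs)
{
	if (dio_flags & SKETCH_DIO_FSBLOCK_ALIGNED)
		return fs_block_size;
	return bdev_lbs;
}

/* True if both pos and length are multiples of a power-of-two alignment. */
static bool sketch_range_aligned(uint64_t pos, uint64_t length,
				 unsigned int alignment)
{
	/* The mask test below is only valid for powers of two. */
	assert(alignment && (alignment & (alignment - 1)) == 0);
	return ((pos | length) & (alignment - 1)) == 0;
}
```

With a 16K fs block on a 512-byte device, a 4K write at offset 4K passes the check without the flag and fails it with the flag set, which is the bs > ps situation the commit message describes.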
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
[hch: also align pos/len, incorporate the trace flags from Darrick]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/direct-io.c | 17 +++++++++++++++--
fs/iomap/trace.h | 7 ++++---
include/linux/iomap.h | 8 ++++++++
3 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5d5d63efbd57..13def8418659 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -336,8 +336,18 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
int nr_pages, ret = 0;
u64 copied = 0;
size_t orig_count;
+ unsigned int alignment;
- if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1))
+ /*
+ * File systems that write out of place and always allocate new blocks
+ * need each bio to be block aligned as that's the unit of allocation.
+ */
+ if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
+ alignment = fs_block_size;
+ else
+ alignment = bdev_logical_block_size(iomap->bdev);
+
+ if ((pos | length) & (alignment - 1))
return -EINVAL;
if (dio->flags & IOMAP_DIO_WRITE) {
@@ -434,7 +444,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
bio->bi_end_io = iomap_dio_bio_end_io;
ret = bio_iov_iter_get_pages(bio, dio->submit.iter,
- bdev_logical_block_size(iomap->bdev) - 1);
+ alignment - 1);
if (unlikely(ret)) {
/*
* We have to stop part way through an IO. We must fall
@@ -639,6 +649,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (iocb->ki_flags & IOCB_NOWAIT)
iomi.flags |= IOMAP_NOWAIT;
+ if (dio_flags & IOMAP_DIO_FSBLOCK_ALIGNED)
+ dio->flags |= IOMAP_DIO_FSBLOCK_ALIGNED;
+
if (iov_iter_rw(iter) == READ) {
/* reads can always complete inline */
dio->flags |= IOMAP_DIO_INLINE_COMP;
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index a61c1dae4742..532787277b16 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -122,9 +122,10 @@ DEFINE_RANGE_EVENT(iomap_zero_iter);
#define IOMAP_DIO_STRINGS \
- {IOMAP_DIO_FORCE_WAIT, "DIO_FORCE_WAIT" }, \
- {IOMAP_DIO_OVERWRITE_ONLY, "DIO_OVERWRITE_ONLY" }, \
- {IOMAP_DIO_PARTIAL, "DIO_PARTIAL" }
+ {IOMAP_DIO_FORCE_WAIT, "DIO_FORCE_WAIT" }, \
+ {IOMAP_DIO_OVERWRITE_ONLY, "DIO_OVERWRITE_ONLY" }, \
+ {IOMAP_DIO_PARTIAL, "DIO_PARTIAL" }, \
+ {IOMAP_DIO_FSBLOCK_ALIGNED, "DIO_FSBLOCK_ALIGNED" }
DECLARE_EVENT_CLASS(iomap_class,
TP_PROTO(struct inode *inode, struct iomap *iomap),
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 73dceabc21c8..4da13fe24ce8 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -518,6 +518,14 @@ struct iomap_dio_ops {
*/
#define IOMAP_DIO_PARTIAL (1 << 2)
+/*
+ * Ensure each bio is aligned to fs block size.
+ *
+ * For filesystems which need to calculate/verify the checksum of each fs
+ * block. Otherwise they may not be able to handle unaligned bios.
+ */
+#define IOMAP_DIO_FSBLOCK_ALIGNED (1 << 3)
+
ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
unsigned int dio_flags, void *private, size_t done_before);
--
2.47.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
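For readers unfamiliar with the flag/string tables touched in fs/iomap/trace.h, here is a rough user-space sketch of how such a table can turn a flag word into names. It is not the tracing infrastructure itself: the `sketch_` identifiers are hypothetical, the values for the first two flags are assumed rather than shown in this patch, and the '|' separator is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_DIO_FORCE_WAIT		(1u << 0)	/* assumed value */
#define SKETCH_DIO_OVERWRITE_ONLY	(1u << 1)	/* assumed value */
#define SKETCH_DIO_PARTIAL		(1u << 2)	/* from the patch context */
#define SKETCH_DIO_FSBLOCK_ALIGNED	(1u << 3)	/* added by the patch */

struct sketch_flag_name {
	unsigned int mask;
	const char *name;
};

/* Same names as the IOMAP_DIO_STRINGS table after the patch. */
static const struct sketch_flag_name sketch_dio_strings[] = {
	{ SKETCH_DIO_FORCE_WAIT,	"DIO_FORCE_WAIT" },
	{ SKETCH_DIO_OVERWRITE_ONLY,	"DIO_OVERWRITE_ONLY" },
	{ SKETCH_DIO_PARTIAL,		"DIO_PARTIAL" },
	{ SKETCH_DIO_FSBLOCK_ALIGNED,	"DIO_FSBLOCK_ALIGNED" },
};

/* Write the names of all set flags into buf, separated by '|'. */
static void sketch_format_dio_flags(unsigned int flags, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	assert(buf && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(sketch_dio_strings) / sizeof(sketch_dio_strings[0]); i++) {
		const char *name = sketch_dio_strings[i].name;
		size_t need = strlen(name) + (used ? 1 : 0);

		if (!(flags & sketch_dio_strings[i].mask))
			continue;
		if (need >= len - used)
			break;	/* no room for separator, name and NUL */
		if (used)
			buf[used++] = '|';
		memcpy(buf + used, name, strlen(name) + 1);
		used += strlen(name);
	}
}
```

A flag that is missing from the table is silently skipped in this sketch, which illustrates why the patch adds the new entry alongside the new flag.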
* [PATCH 2/2] xfs: support sub-block aligned vectors in always COW mode
2025-10-31 13:10 alloc misaligned vectors for zoned XFS v2 Christoph Hellwig
2025-10-31 13:10 ` [PATCH 1/2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag Christoph Hellwig
@ 2025-10-31 13:10 ` Christoph Hellwig
2025-11-05 12:09 ` alloc misaligned vectors for zoned XFS v2 Christian Brauner
2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2025-10-31 13:10 UTC (permalink / raw)
To: Christian Brauner, Carlos Maiolino
Cc: Darrick J. Wong, Qu Wenruo, linux-xfs, linux-fsdevel
Now that the block layer and iomap have grown support to indicate
the bio sector size explicitly instead of assuming the device sector
size, we can ask for file system block size alignment and thus support
direct I/O writes where the overall size is file system block size
aligned, but the boundaries between vectors might not be.
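To make the "overall size aligned, individual vectors not" case concrete, here is a small user-space sketch that contrasts the check removed from xfs_file_dio_write() with the one that remains. All `sketch_` functions are hypothetical; `sketch_iov_alignment()` is only loosely modelled on what the kernel's iov_iter_alignment() reports, and the block mask is passed in directly instead of coming from a mount structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* OR together every base address and length; low bits flag misalignment. */
static uintptr_t sketch_iov_alignment(const struct iovec *iov, int iovcnt)
{
	uintptr_t bits = 0;

	for (int i = 0; i < iovcnt; i++)
		bits |= (uintptr_t)iov[i].iov_base | (uintptr_t)iov[i].iov_len;
	return bits;
}

/* Total number of bytes described by the vector array. */
static size_t sketch_iov_total(const struct iovec *iov, int iovcnt)
{
	size_t total = 0;

	for (int i = 0; i < iovcnt; i++)
		total += iov[i].iov_len;
	return total;
}

/* Old rule for always-COW inodes: pos, total size and each vector aligned. */
static bool sketch_old_takes_aligned_path(uint64_t pos, const struct iovec *iov,
					  int iovcnt, uint64_t blockmask)
{
	uint64_t count = sketch_iov_total(iov, iovcnt);

	if ((pos | count) & blockmask)
		return false;
	return (sketch_iov_alignment(iov, iovcnt) & blockmask) == 0;
}

/* New rule: only the file position and the overall size are checked. */
static bool sketch_new_takes_aligned_path(uint64_t pos, const struct iovec *iov,
					  int iovcnt, uint64_t blockmask)
{
	uint64_t count = sketch_iov_total(iov, iovcnt);

	return ((pos | count) & blockmask) == 0;
}
```

For a 4K block size (mask 4095), a write at offset 0 made of a 512-byte vector followed by a 3584-byte vector is rejected by the old rule and accepted by the new one. In the patch, the per-bio alignment is then left to iomap via IOMAP_DIO_FSBLOCK_ALIGNED.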
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
fs/xfs/xfs_file.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 2702fef2c90c..f2ac4115c18b 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -674,8 +674,17 @@ xfs_file_dio_write_aligned(
struct xfs_zone_alloc_ctx *ac)
{
unsigned int iolock = XFS_IOLOCK_SHARED;
+ unsigned int dio_flags = 0;
ssize_t ret;
+ /*
+ * For always COW inodes, each bio must be aligned to the file system
+ * block size and not just the device sector size because we need to
+ * allocate a block-aligned amount of space for each write.
+ */
+ if (xfs_is_always_cow_inode(ip))
+ dio_flags |= IOMAP_DIO_FSBLOCK_ALIGNED;
+
ret = xfs_ilock_iocb_for_write(iocb, &iolock);
if (ret)
return ret;
@@ -693,7 +702,7 @@ xfs_file_dio_write_aligned(
iolock = XFS_IOLOCK_SHARED;
}
trace_xfs_file_direct_write(iocb, from);
- ret = iomap_dio_rw(iocb, from, ops, dops, 0, ac, 0);
+ ret = iomap_dio_rw(iocb, from, ops, dops, dio_flags, ac, 0);
out_unlock:
xfs_iunlock(ip, iolock);
return ret;
@@ -890,15 +899,7 @@ xfs_file_dio_write(
if ((iocb->ki_pos | count) & target->bt_logical_sectormask)
return -EINVAL;
- /*
- * For always COW inodes we also must check the alignment of each
- * individual iovec segment, as they could end up with different
- * I/Os due to the way bio_iov_iter_get_pages works, and we'd
- * then overwrite an already written block.
- */
- if (((iocb->ki_pos | count) & ip->i_mount->m_blockmask) ||
- (xfs_is_always_cow_inode(ip) &&
- (iov_iter_alignment(from) & ip->i_mount->m_blockmask)))
+ if ((iocb->ki_pos | count) & ip->i_mount->m_blockmask)
return xfs_file_dio_write_unaligned(ip, iocb, from);
if (xfs_is_zoned_inode(ip))
return xfs_file_dio_write_zoned(ip, iocb, from);
--
2.47.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
2025-10-31 13:10 ` [PATCH 1/2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag Christoph Hellwig
@ 2025-10-31 15:24 ` Darrick J. Wong
0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2025-10-31 15:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christian Brauner, Carlos Maiolino, Qu Wenruo, linux-xfs,
linux-fsdevel, Pankaj Raghav
On Fri, Oct 31, 2025 at 02:10:26PM +0100, Christoph Hellwig wrote:
> From: Qu Wenruo <wqu@suse.com>
>
> Btrfs requires all of its bios to be fs block aligned, normally it's
> totally fine but with the incoming block size larger than page size
> (bs > ps) support, the requirement is no longer met for direct IOs.
>
> Because iomap_dio_bio_iter() calls bio_iov_iter_get_pages(), only
> requiring alignment to be bdev_logical_block_size().
>
> In the real world that value is either 512 or 4K, on 4K page sized
> systems it means bio_iov_iter_get_pages() can break the bio at any page
> boundary, breaking btrfs' requirement for bs > ps cases.
>
> To address this problem, introduce a new public iomap dio flag,
> IOMAP_DIO_FSBLOCK_ALIGNED.
>
> When calling __iomap_dio_rw() with that new flag, iomap_dio::flags will
> inherit that new flag, and iomap_dio_bio_iter() will take fs block size
> into the calculation of the alignment, and pass the alignment to
> bio_iov_iter_get_pages(), respecting the fs block size requirement.
>
> The initial user of this flag will be btrfs, which needs to calculate the
> checksum for direct read and thus requires the biovec to be fs block
> aligned for the incoming bs > ps support.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
> [hch: also align pos/len, incorporate the trace flags from Darrick]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
LGTM
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> ---
> fs/iomap/direct-io.c | 17 +++++++++++++++--
> fs/iomap/trace.h | 7 ++++---
> include/linux/iomap.h | 8 ++++++++
> 3 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 5d5d63efbd57..13def8418659 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -336,8 +336,18 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> int nr_pages, ret = 0;
> u64 copied = 0;
> size_t orig_count;
> + unsigned int alignment;
>
> - if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1))
> + /*
> + * File systems that write out of place and always allocate new blocks
> + * need each bio to be block aligned as that's the unit of allocation.
> + */
> + if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
> + alignment = fs_block_size;
> + else
> + alignment = bdev_logical_block_size(iomap->bdev);
> +
> + if ((pos | length) & (alignment - 1))
> return -EINVAL;
>
> if (dio->flags & IOMAP_DIO_WRITE) {
> @@ -434,7 +444,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> bio->bi_end_io = iomap_dio_bio_end_io;
>
> ret = bio_iov_iter_get_pages(bio, dio->submit.iter,
> - bdev_logical_block_size(iomap->bdev) - 1);
> + alignment - 1);
> if (unlikely(ret)) {
> /*
> * We have to stop part way through an IO. We must fall
> @@ -639,6 +649,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> if (iocb->ki_flags & IOCB_NOWAIT)
> iomi.flags |= IOMAP_NOWAIT;
>
> + if (dio_flags & IOMAP_DIO_FSBLOCK_ALIGNED)
> + dio->flags |= IOMAP_DIO_FSBLOCK_ALIGNED;
> +
> if (iov_iter_rw(iter) == READ) {
> /* reads can always complete inline */
> dio->flags |= IOMAP_DIO_INLINE_COMP;
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index a61c1dae4742..532787277b16 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -122,9 +122,10 @@ DEFINE_RANGE_EVENT(iomap_zero_iter);
>
>
> #define IOMAP_DIO_STRINGS \
> - {IOMAP_DIO_FORCE_WAIT, "DIO_FORCE_WAIT" }, \
> - {IOMAP_DIO_OVERWRITE_ONLY, "DIO_OVERWRITE_ONLY" }, \
> - {IOMAP_DIO_PARTIAL, "DIO_PARTIAL" }
> + {IOMAP_DIO_FORCE_WAIT, "DIO_FORCE_WAIT" }, \
> + {IOMAP_DIO_OVERWRITE_ONLY, "DIO_OVERWRITE_ONLY" }, \
> + {IOMAP_DIO_PARTIAL, "DIO_PARTIAL" }, \
> + {IOMAP_DIO_FSBLOCK_ALIGNED, "DIO_FSBLOCK_ALIGNED" }
>
> DECLARE_EVENT_CLASS(iomap_class,
> TP_PROTO(struct inode *inode, struct iomap *iomap),
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 73dceabc21c8..4da13fe24ce8 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -518,6 +518,14 @@ struct iomap_dio_ops {
> */
> #define IOMAP_DIO_PARTIAL (1 << 2)
>
> +/*
> + * Ensure each bio is aligned to fs block size.
> + *
> + * For filesystems which need to calculate/verify the checksum of each fs
> + * block. Otherwise they may not be able to handle unaligned bios.
> + */
> +#define IOMAP_DIO_FSBLOCK_ALIGNED (1 << 3)
> +
> ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
> unsigned int dio_flags, void *private, size_t done_before);
> --
> 2.47.3
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: alloc misaligned vectors for zoned XFS v2
2025-10-31 13:10 alloc misaligned vectors for zoned XFS v2 Christoph Hellwig
2025-10-31 13:10 ` [PATCH 1/2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag Christoph Hellwig
2025-10-31 13:10 ` [PATCH 2/2] xfs: support sub-block aligned vectors in always COW mode Christoph Hellwig
@ 2025-11-05 12:09 ` Christian Brauner
2 siblings, 0 replies; 5+ messages in thread
From: Christian Brauner @ 2025-11-05 12:09 UTC (permalink / raw)
To: Carlos Maiolino, Christoph Hellwig
Cc: Christian Brauner, Darrick J. Wong, Qu Wenruo, linux-xfs,
linux-fsdevel
On Fri, 31 Oct 2025 14:10:25 +0100, Christoph Hellwig wrote:
> this series enables the new block layer support for misaligned
> individual vectors for zoned XFS.
>
> The first patch is from Qu and supposedly already applied to
> the vfs iomap 6.19 branch, but I can't find it there. The next
> two are small fixups for it, and the last one makes use of this
> new functionality in XFS.
>
> [...]
Applied to the vfs-6.19.iomap branch of the vfs/vfs.git tree.
Patches in the vfs-6.19.iomap branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.19.iomap
[1/2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
https://git.kernel.org/vfs/vfs/c/001397f5ef49
[2/2] xfs: support sub-block aligned vectors in always COW mode
https://git.kernel.org/vfs/vfs/c/8caec6c9fef7
^ permalink raw reply [flat|nested] 5+ messages in thread