linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
@ 2025-10-13  9:05 Qu Wenruo
  2025-10-13 11:14 ` Johannes Thumshirn
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Qu Wenruo @ 2025-10-13  9:05 UTC (permalink / raw)
  To: linux-btrfs; +Cc: brauner, djwong, linux-xfs, linux-fsdevel

Btrfs requires all of its bios to be fs block aligned, normally it's
totally fine but with the incoming block size larger than page size
(bs > ps) support, the requirement is no longer met for direct IOs.

Because iomap_dio_bio_iter() calls bio_iov_iter_get_pages(), only
requiring alignment to be bdev_logical_block_size().

In the real world that value is either 512 or 4K, on 4K page sized
systems it means bio_iov_iter_get_pages() can break the bio at any page
boundary, breaking btrfs' requirement for bs > ps cases.

To address this problem, introduce a new public iomap dio flag,
IOMAP_DIO_FSBLOCK_ALIGNED.

When calling __iomap_dio_rw() with that new flag, iomap_dio::flags will
inherit that new flag, and iomap_dio_bio_iter() will take fs block size
into the calculation of the alignment, and pass the alignment to
bio_iov_iter_get_pages(), respecting the fs block size requirement.

The initial user of this flag will be btrfs, which needs to calculate the
checksum for direct read and thus requires the biovec to be fs block
aligned for the incoming bs > ps support.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Fix too long lines
  To follow the old 80 chars limits.

- Reword the commit messages a little

- Rename the public IOMAP flag to IOMAP_DIO_FSBLOCK_ALIGNED

- Make the calculation of alignment eaiser to read
  With a short comment explaing why we need to use the larger value of
  bdev and fs block size.

- Remove the btrfs part that utilize the new flag
  Now it's in the enablement patch of btrfs' bs > ps direct IO support.

 fs/iomap/direct-io.c  | 13 ++++++++++++-
 include/linux/iomap.h |  8 ++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5d5d63efbd57..ce9cbd2bace0 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -336,10 +336,18 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	int nr_pages, ret = 0;
 	u64 copied = 0;
 	size_t orig_count;
+	unsigned int alignment = bdev_logical_block_size(iomap->bdev);
 
 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1))
 		return -EINVAL;
 
+	/*
+	 * Align to the larger one of bdev and fs block size, to meet the
+	 * alignment requirement of both layers.
+	 */
+	if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
+		alignment = max(alignment, fs_block_size);
+
 	if (dio->flags & IOMAP_DIO_WRITE) {
 		bio_opf |= REQ_OP_WRITE;
 
@@ -434,7 +442,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		bio->bi_end_io = iomap_dio_bio_end_io;
 
 		ret = bio_iov_iter_get_pages(bio, dio->submit.iter,
-				bdev_logical_block_size(iomap->bdev) - 1);
+					     alignment - 1);
 		if (unlikely(ret)) {
 			/*
 			 * We have to stop part way through an IO. We must fall
@@ -639,6 +647,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		iomi.flags |= IOMAP_NOWAIT;
 
+	if (dio_flags & IOMAP_DIO_FSBLOCK_ALIGNED)
+		dio->flags |= IOMAP_DIO_FSBLOCK_ALIGNED;
+
 	if (iov_iter_rw(iter) == READ) {
 		/* reads can always complete inline */
 		dio->flags |= IOMAP_DIO_INLINE_COMP;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 73dceabc21c8..4da13fe24ce8 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -518,6 +518,14 @@ struct iomap_dio_ops {
  */
 #define IOMAP_DIO_PARTIAL		(1 << 2)
 
+/*
+ * Ensure each bio is aligned to fs block size.
+ *
+ * For filesystems which need to calculate/verify the checksum of each fs
+ * block. Otherwise they may not be able to handle unaligned bios.
+ */
+#define IOMAP_DIO_FSBLOCK_ALIGNED	(1 << 3)
+
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags, void *private, size_t done_before);
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-13  9:05 [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag Qu Wenruo
@ 2025-10-13 11:14 ` Johannes Thumshirn
  2025-10-13 20:39   ` Qu Wenruo
  2025-10-14  4:36 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Johannes Thumshirn @ 2025-10-13 11:14 UTC (permalink / raw)
  To: WenRuo Qu, linux-btrfs@vger.kernel.org
  Cc: brauner@kernel.org, djwong@kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

On 10/13/25 11:05 AM, Qu Wenruo wrote:
> - Remove the btrfs part that utilize the new flag
>    Now it's in the enablement patch of btrfs' bs > ps direct IO support.

But then we potentially have code merged without a user. I think it is 
better to have the btrfs patch in the same series and merged via the 
same tree.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-13 11:14 ` Johannes Thumshirn
@ 2025-10-13 20:39   ` Qu Wenruo
  0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2025-10-13 20:39 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-btrfs@vger.kernel.org
  Cc: brauner@kernel.org, djwong@kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org



在 2025/10/13 21:44, Johannes Thumshirn 写道:
> On 10/13/25 11:05 AM, Qu Wenruo wrote:
>> - Remove the btrfs part that utilize the new flag
>>     Now it's in the enablement patch of btrfs' bs > ps direct IO support.
> 
> But then we potentially have code merged without a user. I think it is
> better to have the btrfs patch in the same series and merged via the
> same tree.
> 

It's fine, and in fact that already happened, for the remove_bdev() 
callback.

So I'm more or less fine with that, as long as we have a proper plan to 
add the initial user.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-13  9:05 [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag Qu Wenruo
  2025-10-13 11:14 ` Johannes Thumshirn
@ 2025-10-14  4:36 ` Christoph Hellwig
  2025-10-14  4:55   ` Qu Wenruo
  2025-10-16 12:10 ` Pankaj Raghav (Samsung)
  2025-10-21 13:03 ` Christian Brauner
  3 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2025-10-14  4:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, brauner, djwong, linux-xfs, linux-fsdevel

On Mon, Oct 13, 2025 at 07:35:16PM +1030, Qu Wenruo wrote:
> - Remove the btrfs part that utilize the new flag
>   Now it's in the enablement patch of btrfs' bs > ps direct IO support.

I didn't really suggest removing it, but splitting it into a separate
patch in a two-patch series.  We could probably even move everything
through the btrfs tree for 6.18 to get the fix in.  It's just important
to keep infrastructure and user separate if you have to e.g. revert the
btrfs part for some reason, but we have another user in the meantime.
And I plan to use this for zoned XFS soon.

> +	/*
> +	 * Align to the larger one of bdev and fs block size, to meet the
> +	 * alignment requirement of both layers.
> +	 */
> +	if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
> +		alignment = max(alignment, fs_block_size);
> +

This looks much nicer, and thanks for the explanation!

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-14  4:36 ` Christoph Hellwig
@ 2025-10-14  4:55   ` Qu Wenruo
  2025-10-14  4:58     ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2025-10-14  4:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, brauner, djwong, linux-xfs, linux-fsdevel



在 2025/10/14 15:06, Christoph Hellwig 写道:
> On Mon, Oct 13, 2025 at 07:35:16PM +1030, Qu Wenruo wrote:
>> - Remove the btrfs part that utilize the new flag
>>    Now it's in the enablement patch of btrfs' bs > ps direct IO support.
> 
> I didn't really suggest removing it, but splitting it into a separate
> patch in a two-patch series.

Oh, sorry, I just kept the btrfs part into the enablement patch, and 
that enablement patch requires quite some other code that is only 
submitted but not yet even merged into btrfs development branch.

Thus I didn't really want to send it right now.

>  We could probably even move everything
> through the btrfs tree for 6.18 to get the fix in.

Unfortunately that may not be that easy. Either we merge it early, 
meaning just this change + using the new flag in btrfs.
But that means it makes no real change at all, as bs > ps direct IO is 
still disabled.

Or we wait for the btrfs sub-block checksum handling patchset merged, 
then with the full bs > ps direct IO enablement.
But that also means we're waiting for some other btrfs patches.
There are already too many btrfs bs > ps patches pending now.


I'd prefer to get this merged through iomap tree, then utilize it in 
btrfs tree later.
Just like what is going on with the remove_bdev() super block callback.
(VFS change is merged in v6.17, but btrfs will only merge it through v6.19)

Thanks,
Qu

>  It's just important
> to keep infrastructure and user separate if you have to e.g. revert the
> btrfs part for some reason, but we have another user in the meantime.
> And I plan to use this for zoned XFS soon.
> 
>> +	/*
>> +	 * Align to the larger one of bdev and fs block size, to meet the
>> +	 * alignment requirement of both layers.
>> +	 */
>> +	if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
>> +		alignment = max(alignment, fs_block_size);
>> +
> 
> This looks much nicer, and thanks for the explanation!
> 
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-14  4:55   ` Qu Wenruo
@ 2025-10-14  4:58     ` Christoph Hellwig
  2025-10-14  5:10       ` Qu Wenruo
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2025-10-14  4:58 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, linux-btrfs, brauner, djwong, linux-xfs,
	linux-fsdevel

On Tue, Oct 14, 2025 at 03:25:31PM +1030, Qu Wenruo wrote:
> Unfortunately that may not be that easy. Either we merge it early, meaning
> just this change + using the new flag in btrfs.
> But that means it makes no real change at all, as bs > ps direct IO is still
> disabled.
> 
> Or we wait for the btrfs sub-block checksum handling patchset merged, then
> with the full bs > ps direct IO enablement.
> But that also means we're waiting for some other btrfs patches.
> There are already too many btrfs bs > ps patches pending now.

What's your plan for merging this?  I was going to look into doing a
patch like this to improve the zoned XFS direct I/O handling soon,
so if you aim for 6.19 we need to figure out a way to get it into
the iomap tree and merge that into the xfs and btrfs trees.  If you're
not aiming for 6.19 we could merge the iomap and xfs work through
either the iomap or xfs trees.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-14  4:58     ` Christoph Hellwig
@ 2025-10-14  5:10       ` Qu Wenruo
  0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2025-10-14  5:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, brauner, djwong, linux-xfs, linux-fsdevel



在 2025/10/14 15:28, Christoph Hellwig 写道:
> On Tue, Oct 14, 2025 at 03:25:31PM +1030, Qu Wenruo wrote:
>> Unfortunately that may not be that easy. Either we merge it early, meaning
>> just this change + using the new flag in btrfs.
>> But that means it makes no real change at all, as bs > ps direct IO is still
>> disabled.
>>
>> Or we wait for the btrfs sub-block checksum handling patchset merged, then
>> with the full bs > ps direct IO enablement.
>> But that also means we're waiting for some other btrfs patches.
>> There are already too many btrfs bs > ps patches pending now.
> 
> What's your plan for merging this?  I was going to look into doing a
> patch like this to improve the zoned XFS direct I/O handling soon,
> so if you aim for 6.19 we need to figure out a way to get it into
> the iomap tree and merge that into the xfs and btrfs trees.  If you're
> not aiming for 6.19 we could merge the iomap and xfs work through
> either the iomap or xfs trees.
> 

I'm not sure if I can get this patch through btrfs tree in v6.19.
There are too many pending btrfs patches.

So please go ahead through iomap/xfs tree instead.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-13  9:05 [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag Qu Wenruo
  2025-10-13 11:14 ` Johannes Thumshirn
  2025-10-14  4:36 ` Christoph Hellwig
@ 2025-10-16 12:10 ` Pankaj Raghav (Samsung)
  2025-10-21 13:03 ` Christian Brauner
  3 siblings, 0 replies; 11+ messages in thread
From: Pankaj Raghav (Samsung) @ 2025-10-16 12:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, brauner, djwong, linux-xfs, linux-fsdevel

On Mon, Oct 13, 2025 at 07:35:16PM +1030, Qu Wenruo wrote:
> Btrfs requires all of its bios to be fs block aligned, normally it's
> totally fine but with the incoming block size larger than page size
> (bs > ps) support, the requirement is no longer met for direct IOs.
> 
> Because iomap_dio_bio_iter() calls bio_iov_iter_get_pages(), only
> requiring alignment to be bdev_logical_block_size().
> 
> In the real world that value is either 512 or 4K, on 4K page sized
> systems it means bio_iov_iter_get_pages() can break the bio at any page
> boundary, breaking btrfs' requirement for bs > ps cases.
> 
> To address this problem, introduce a new public iomap dio flag,
> IOMAP_DIO_FSBLOCK_ALIGNED.
> 
> When calling __iomap_dio_rw() with that new flag, iomap_dio::flags will
> inherit that new flag, and iomap_dio_bio_iter() will take fs block size
> into the calculation of the alignment, and pass the alignment to
> bio_iov_iter_get_pages(), respecting the fs block size requirement.
> 
> The initial user of this flag will be btrfs, which needs to calculate the
> checksum for direct read and thus requires the biovec to be fs block
> aligned for the incoming bs > ps support.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---

Looks good.
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
-- 
Pankaj

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-13  9:05 [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag Qu Wenruo
                   ` (2 preceding siblings ...)
  2025-10-16 12:10 ` Pankaj Raghav (Samsung)
@ 2025-10-21 13:03 ` Christian Brauner
  2025-10-22  7:23   ` Christoph Hellwig
  3 siblings, 1 reply; 11+ messages in thread
From: Christian Brauner @ 2025-10-21 13:03 UTC (permalink / raw)
  To: linux-btrfs, Qu Wenruo, Christoph Hellwig
  Cc: Christian Brauner, djwong, linux-xfs, linux-fsdevel

On Mon, 13 Oct 2025 19:35:16 +1030, Qu Wenruo wrote:
> Btrfs requires all of its bios to be fs block aligned, normally it's
> totally fine but with the incoming block size larger than page size
> (bs > ps) support, the requirement is no longer met for direct IOs.
> 
> Because iomap_dio_bio_iter() calls bio_iov_iter_get_pages(), only
> requiring alignment to be bdev_logical_block_size().
> 
> [...]

Applied to the vfs-6.19.iomap branch of the vfs/vfs.git tree.
Patches in the vfs-6.19.iomap branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.19.iomap

[1/1] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
      https://git.kernel.org/vfs/vfs/c/96a9ee1c896f

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-21 13:03 ` Christian Brauner
@ 2025-10-22  7:23   ` Christoph Hellwig
  2025-10-29 12:45     ` Christian Brauner
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2025-10-22  7:23 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-btrfs, Qu Wenruo, Christoph Hellwig, djwong, linux-xfs,
	linux-fsdevel

On Tue, Oct 21, 2025 at 03:03:08PM +0200, Christian Brauner wrote:
> On Mon, 13 Oct 2025 19:35:16 +1030, Qu Wenruo wrote:
> > Btrfs requires all of its bios to be fs block aligned, normally it's
> > totally fine but with the incoming block size larger than page size
> > (bs > ps) support, the requirement is no longer met for direct IOs.
> > 
> > Because iomap_dio_bio_iter() calls bio_iov_iter_get_pages(), only
> > requiring alignment to be bdev_logical_block_size().
> > 
> > [...]
> 
> Applied to the vfs-6.19.iomap branch of the vfs/vfs.git tree.
> Patches in the vfs-6.19.iomap branch should appear in linux-next soon.

I'm not seeing it yet.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag
  2025-10-22  7:23   ` Christoph Hellwig
@ 2025-10-29 12:45     ` Christian Brauner
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Brauner @ 2025-10-29 12:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-btrfs, Qu Wenruo, djwong, linux-xfs, linux-fsdevel

On Wed, Oct 22, 2025 at 12:23:00AM -0700, Christoph Hellwig wrote:
> On Tue, Oct 21, 2025 at 03:03:08PM +0200, Christian Brauner wrote:
> > On Mon, 13 Oct 2025 19:35:16 +1030, Qu Wenruo wrote:
> > > Btrfs requires all of its bios to be fs block aligned, normally it's
> > > totally fine but with the incoming block size larger than page size
> > > (bs > ps) support, the requirement is no longer met for direct IOs.
> > > 
> > > Because iomap_dio_bio_iter() calls bio_iov_iter_get_pages(), only
> > > requiring alignment to be bdev_logical_block_size().
> > > 
> > > [...]
> > 
> > Applied to the vfs-6.19.iomap branch of the vfs/vfs.git tree.
> > Patches in the vfs-6.19.iomap branch should appear in linux-next soon.
> 
> I'm not seeing it yet.

Sorry, I got lost in some work. Pushed.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-10-29 12:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-13  9:05 [PATCH v2] iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag Qu Wenruo
2025-10-13 11:14 ` Johannes Thumshirn
2025-10-13 20:39   ` Qu Wenruo
2025-10-14  4:36 ` Christoph Hellwig
2025-10-14  4:55   ` Qu Wenruo
2025-10-14  4:58     ` Christoph Hellwig
2025-10-14  5:10       ` Qu Wenruo
2025-10-16 12:10 ` Pankaj Raghav (Samsung)
2025-10-21 13:03 ` Christian Brauner
2025-10-22  7:23   ` Christoph Hellwig
2025-10-29 12:45     ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).