From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: brauner@kernel.org, djwong@kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH RFC] iomap: ensure iomap_dio_bio_iter() only submit bios that are fs block aligned
Date: Tue,  7 Oct 2025 13:40:22 +1030
Message-ID: <aeed3476f7cff20c59172f790167b5879f5fec87.1759806405.git.wqu@suse.com>

[ASSERT TRIGGERED]
While developing btrfs bs > ps support, I hit read bios submitted from
iomap to btrfs that are not aligned to the fs block size.

In my case the fs block size is 8K and the page size is 4K. The ASSERT()
looks like this:

 assertion failed: IS_ALIGNED(logical, blocksize) && IS_ALIGNED(length, blocksize) && length != 0 :: 0, in fs/btrfs/bio.c:833 (root=256 inode=260 logical=299360256 length=69632)
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/bio.c:833!
 Oops: invalid opcode: 0000 [#1] SMP
 CPU: 6 UID: 0 PID: 1153 Comm: fsstress Tainted: G            E       6.17.0-rc7-custom+ #291 PREEMPT(full)  be3ff76d2e76a554af2cfea604366d16e719ba97
 Tainted: [E]=UNSIGNED_MODULE
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
 RIP: 0010:btrfs_submit_bbio.cold+0x10c/0x127 [btrfs]
 Call Trace:
  <TASK>
  iomap_dio_bio_iter+0x1d3/0x570
  __iomap_dio_rw+0x547/0x8e0
  iomap_dio_rw+0x12/0x30
  btrfs_direct_read+0x135/0x220 [btrfs 24de5898492ba42c5e58573a6f20bf3c9894c726]
  btrfs_file_read_iter+0x21/0x70 [btrfs 24de5898492ba42c5e58573a6f20bf3c9894c726]
  vfs_read+0x25e/0x380
  ksys_read+0x73/0xe0
  do_syscall_64+0x82/0xae0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 Dumping ftrace buffer:
 ---------------------------------
 fsstress-1153      6..... 68530us : iomap_dio_bio_iter: length=81920 nr_pages=20 enter
 fsstress-1153      6..... 68539us : iomap_dio_bio_iter: length=81920 realsize=69632
 fsstress-1153      6..... 68540us : iomap_dio_bio_iter: nr_pages=3 for next
 ---------------------------------

[CAUSE]
The function iomap_dio_bio_iter() is doing the bio assembly and
submission, and it's calling bio_iov_iter_get_pages().

However, that function can split the range, and in my case it split the
original 20-page range into two ranges of 17 and 3 pages respectively.
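
The numbers from the assertion and the trace can be checked by hand. The
following is a minimal userspace sketch, assuming the 8K block size and 4K
page size from this report; is_block_aligned() is a hypothetical local
helper that mirrors the power-of-two check, not the kernel's IS_ALIGNED()
macro itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only: returns true when value is a
 * multiple of blocksize. blocksize must be a non-zero power of two.
 */
bool is_block_aligned(uint64_t value, uint32_t blocksize)
{
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	return (value & ((uint64_t)blocksize - 1)) == 0;
}
```

With blocksize = 8192, logical = 299360256 is aligned, but the 17-page
length (17 * 4096 = 69632 = 8 * 8192 + 4096) is not, and neither is the
remaining 3-page range (12288). Only the original 20-page length (81920)
is a multiple of the block size.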

Then the 17-page range is passed to btrfs_dio_submit_io(), which later
calls into assert_bbio_alignment() and triggers the ASSERT() on the fs
block size check.

This check is critical as btrfs needs to verify the data checksum at
read time and retry other mirrors when necessary.

If a sub-block range is passed in, the read-verification and read-repair
functionality is lost.

This was never a problem for btrfs in the past, simply because we did
not support bs > ps cases.

It is also not a problem for the other fses using iomap, because they
have no data checksum support.

[ENHANCEMENT]
Follow what bcachefs does: check the bio size and revert the bio to the
fs block boundary.

If the bio is empty after the revert, we have to error out because we
cannot fault in enough pages to fill a single fs block.
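
The trim itself is plain mask arithmetic. A minimal userspace sketch,
assuming a power-of-two fs block size; struct trim_result and
trim_to_fs_block() are hypothetical names used only for illustration and
do not exist in iomap:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for illustration only. */
struct trim_result {
	size_t kept;     /* bytes left in the bio after the trim */
	size_t reverted; /* bytes that would be handed back to the iov iter */
};

/*
 * Round bio_size down to a multiple of fs_block_size and report the
 * remainder. fs_block_size must be a non-zero power of two.
 */
struct trim_result trim_to_fs_block(size_t bio_size, size_t fs_block_size)
{
	struct trim_result r;

	assert(fs_block_size != 0 && (fs_block_size & (fs_block_size - 1)) == 0);
	r.reverted = bio_size & (fs_block_size - 1);
	r.kept = bio_size - r.reverted;
	return r;
}
```

For the 69632-byte bio from the trace and an 8192-byte block this keeps
65536 bytes and reverts 4096. A bio holding a single 4096-byte page keeps
0 bytes, which is the case where the patch returns -EFAULT.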

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Reason for RFC:

This change forces all fses using iomap to revert the iov iter each
time an unaligned range is hit, even if the fs doesn't need to (e.g. no
data checksum requirement).

I'm not sure if the cost is acceptable or even necessary.

If the extra cost is not acceptable, I can add a new
iomap_dio_ops::need_strong_alignment() callback so that only btrfs will
do the revert.
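
A rough userspace sketch of how such an opt-in could gate the revert.
Everything here is an assumption: the callback does not exist yet, its
signature is guessed, and struct dio_ops_sketch is a hypothetical
stand-in for struct iomap_dio_ops:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iomap_dio_ops; only the proposed hook. */
struct dio_ops_sketch {
	bool (*need_strong_alignment)(void); /* assumed signature */
};

/* Example hook for a filesystem that needs block-aligned bios. */
bool always_strong(void)
{
	return true;
}

/*
 * Return how many trailing bytes would be reverted. Filesystems that do
 * not provide the hook, or whose hook returns false, revert nothing.
 * fs_block_size must be a non-zero power of two.
 */
size_t bytes_to_revert(const struct dio_ops_sketch *ops, size_t bio_size,
		       size_t fs_block_size)
{
	assert(fs_block_size != 0 && (fs_block_size & (fs_block_size - 1)) == 0);
	if (!ops || !ops->need_strong_alignment || !ops->need_strong_alignment())
		return 0;
	return bio_size & (fs_block_size - 1);
}
```

With this shape, a filesystem that leaves the hook unset pays only for a
pointer check per bio, while one that sets it gets the same trimming as
in the patch below.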

 fs/iomap/direct-io.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b84f6af2eb4c..f08babe7c83f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -419,6 +419,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
 	do {
 		size_t n;
+		size_t unaligned;
 		if (dio->error) {
 			iov_iter_revert(dio->submit.iter, copied);
 			copied = ret = 0;
@@ -444,9 +445,26 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 			 */
 			bio_put(bio);
 			goto zero_tail;
+
 		}
 
+		/*
+		 * bio_iov_iter_get_pages() can split the range at a page
+		 * boundary. If the fs has block size > page size and needs
+		 * data checksums, such an unaligned bio will cause problems.
+		 * Revert back to the fs block boundary.
+		 */
+		unaligned = bio->bi_iter.bi_size & (fs_block_size - 1);
+		bio->bi_iter.bi_size -= unaligned;
+		iov_iter_revert(dio->submit.iter, unaligned);
 		n = bio->bi_iter.bi_size;
+
+		/* Failed to get any aligned range. */
+		if (unlikely(n == 0)) {
+			bio_put(bio);
+			ret = -EFAULT;
+			goto zero_tail;
+		}
 		if (WARN_ON_ONCE((bio_opf & REQ_ATOMIC) && n != length)) {
 			/*
 			 * An atomic write bio must cover the complete length,
-- 
2.50.1


Thread overview: 8+ messages
2025-10-07  3:10 Qu Wenruo [this message]
2025-10-07  3:52 ` [PATCH RFC] iomap: ensure iomap_dio_bio_iter() only submit bios that are fs block aligned Matthew Wilcox
2025-10-07  4:05   ` Christoph Hellwig
2025-10-07  4:19     ` Matthew Wilcox
2025-10-07  4:24       ` Christoph Hellwig
2025-10-07  4:26       ` Qu Wenruo
2025-10-07  4:10 ` Christoph Hellwig
2025-10-07  4:29   ` Qu Wenruo