public inbox for linux-fsdevel@vger.kernel.org
* Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()")
@ 2026-03-09 16:37 Johannes Thumshirn
  2026-03-09 20:56 ` Joanne Koong
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2026-03-09 16:37 UTC (permalink / raw)
  To: Joanne Koong; +Cc: linux-fsdevel@vger.kernel.org

Hi Joanne,

After commit aa35dd5cbc06 ("iomap: fix invalid folio access after 
folio_end_read()") my zoned btrfs test setup hangs. I've bisected it to 
this commit and reverting fixes my problem. The last thing I see in 
dmesg is:

[    9.387175] ------------[ cut here ]------------
[    9.387320] WARNING: fs/iomap/buffered-io.c:487 at iomap_read_end+0x11c/0x140, CPU#5: (udev-worker)/463
[    9.387431] Modules linked in:
[    9.387502] CPU: 5 UID: 0 PID: 463 Comm: (udev-worker) Not tainted 6.19.0-rc1+ #385 PREEMPT(full)
[    9.387626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-9.fc43 06/10/2025
[    9.387810] RIP: 0010:iomap_read_end+0x11c/0x140
[    9.387886] Code: 00 48 89 ef 48 3b 04 24 0f 94 04 24 44 0f b6 34 24 e8 b8 88 69 00 48 89 df 48 83 c4 08 41 0f b6 f6 5b 5d 41 5e e9 54 e7 e8 ff <0f> 0b e9 48 ff ff ff ba 00 10 00 00 eb 9d 0f 0b e9 53 ff ff ff 0f
[    9.388096] RSP: 0018:ffffc90000e279c0 EFLAGS: 00010206
[    9.388178] RAX: ffff888111c6b800 RBX: ffffea0004363900 RCX: 0000000000000000
[    9.388281] RDX: 0000000000000000 RSI: 0000000000000400 RDI: ffffea0004363900
[    9.388386] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[    9.388491] R10: ffffc90000e279a0 R11: ffff888111c6c168 R12: ffffffff81e5cdb0
[    9.388585] R13: ffffc90000e27c58 R14: ffffea0004363900 R15: ffffc90000e27c58
[    9.388690] FS:  00007f6e03fdfc00(0000) GS:ffff8882b4c13000(0000) knlGS:0000000000000000
[    9.388793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.388873] CR2: 00007f6e03280000 CR3: 00000001106d7006 CR4: 0000000000770eb0
[    9.388967] PKRU: 55555554
[    9.389006] Call Trace:
[    9.389044]  <TASK>
[    9.389087]  iomap_readahead+0x23c/0x2e0
[    9.389167]  blkdev_readahead+0x3d/0x50
[    9.389222]  read_pages+0x56/0x200
[    9.389277]  ? __folio_batch_add_and_move+0x1cf/0x2d0
[    9.389354]  page_cache_ra_unbounded+0x1db/0x2c0
[    9.389423]  force_page_cache_ra+0x96/0xb0
[    9.389470]  filemap_get_pages+0x12f/0x490
[    9.389532]  filemap_read+0xed/0x400
[    9.389590]  ? lock_acquire+0xd5/0x2b0
[    9.389633]  ? blkdev_read_iter+0x6b/0x180
[    9.389678]  ? lock_acquire+0xe5/0x2b0
[    9.389720]  ? lock_is_held_type+0xcd/0x130
[    9.389761]  ? find_held_lock+0x2b/0x80
[    9.389813]  ? lock_acquired+0x1e9/0x3c0
[    9.389864]  blkdev_read_iter+0x79/0x180
[    9.389911]  ? local_clock_noinstr+0x17/0x110
[    9.389975]  vfs_read+0x240/0x340
[    9.390033]  ksys_read+0x61/0xd0
[    9.390083]  do_syscall_64+0x74/0x3a0
[    9.390143]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    9.390204] RIP: 0033:0x7f6e048c5c5e
[    9.390255] Code: 4d 89 d8 e8 34 bd 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
[    9.390447] RSP: 002b:00007ffdf50cded0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
[    9.390529] RAX: ffffffffffffffda RBX: 000000013aa55200 RCX: 00007f6e048c5c5e
[    9.390619] RDX: 0000000000000200 RSI: 00007f6e0327f000 RDI: 0000000000000014
[    9.390704] RBP: 00007ffdf50cdee0 R08: 0000000000000000 R09: 0000000000000000
[    9.390789] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[    9.390866] R13: 000055d5f99fa270 R14: 000055d5f96e5cb0 R15: 000055d5f96e5cc8
[    9.390967]  </TASK>
[    9.390997] irq event stamp: 54269
[    9.391044] hardirqs last  enabled at (54279): [<ffffffff8138ac02>] __up_console_sem+0x52/0x60
[    9.391141] hardirqs last disabled at (54288): [<ffffffff8138abe7>] __up_console_sem+0x37/0x60
[    9.391240] softirqs last  enabled at (53796): [<ffffffff81304768>] irq_exit_rcu+0x78/0x110
[    9.391327] softirqs last disabled at (53787): [<ffffffff81304768>] irq_exit_rcu+0x78/0x110
[    9.391420] ---[ end trace 0000000000000000 ]---

I haven't debugged this further yet, maybe you have an idea what 
could've caused it. On my side it's trivial to reproduce, so if you 
can't reproduce it just yell.

Byte,

     Johannes



* Re: Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()")
  2026-03-09 16:37 Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()") Johannes Thumshirn
@ 2026-03-09 20:56 ` Joanne Koong
  2026-03-10 11:44   ` Johannes Thumshirn
  0 siblings, 1 reply; 9+ messages in thread
From: Joanne Koong @ 2026-03-09 20:56 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-fsdevel@vger.kernel.org

On Mon, Mar 9, 2026 at 9:37 AM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> Hi Joanne,
>
> After commit aa35dd5cbc06 ("iomap: fix invalid folio access after
> folio_end_read()") my zoned btrfs test setup hangs. I've bisected it to
> this commit and reverting fixes my problem. The last thing I see in
> dmesg is:
>
> [    9.387175] ------------[ cut here ]------------
> [    9.387320] WARNING: fs/iomap/buffered-io.c:487 at
> iomap_read_end+0x11c/0x140, CPU#5: (udev-worker)/463
> [    9.387431] Modules linked in:
> [    9.387502] CPU: 5 UID: 0 PID: 463 Comm: (udev-worker) Not tainted
> 6.19.0-rc1+ #385 PREEMPT(full)
> [    9.387626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.17.0-9.fc43 06/10/2025
> [    9.387810] RIP: 0010:iomap_read_end+0x11c/0x140
> [    9.387886] Code: 00 48 89 ef 48 3b 04 24 0f 94 04 24 44 0f b6 34 24
> e8 b8 88 69 00 48 89 df 48 83 c4 08 41 0f b6 f6 5b 5d 41 5e e9 54 e7 e8
> ff <0f> 0b e9 48 ff ff ff ba 00 10 00 00 eb 9d 0f 0b e9 53 ff ff ff 0f
> [    9.388096] RSP: 0018:ffffc90000e279c0 EFLAGS: 00010206
> [    9.388178] RAX: ffff888111c6b800 RBX: ffffea0004363900 RCX:
> 0000000000000000
> [    9.388281] RDX: 0000000000000000 RSI: 0000000000000400 RDI:
> ffffea0004363900
> [    9.388386] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000001
> [    9.388491] R10: ffffc90000e279a0 R11: ffff888111c6c168 R12:
> ffffffff81e5cdb0
> [    9.388585] R13: ffffc90000e27c58 R14: ffffea0004363900 R15:
> ffffc90000e27c58
> [    9.388690] FS:  00007f6e03fdfc00(0000) GS:ffff8882b4c13000(0000)
> knlGS:0000000000000000
> [    9.388793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    9.388873] CR2: 00007f6e03280000 CR3: 00000001106d7006 CR4:
> 0000000000770eb0
> [    9.388967] PKRU: 55555554
> [    9.389006] Call Trace:
> [    9.389044]  <TASK>
> [    9.389087]  iomap_readahead+0x23c/0x2e0
> [    9.389167]  blkdev_readahead+0x3d/0x50
> [    9.389222]  read_pages+0x56/0x200
> [    9.389277]  ? __folio_batch_add_and_move+0x1cf/0x2d0
> [    9.389354]  page_cache_ra_unbounded+0x1db/0x2c0
> [    9.389423]  force_page_cache_ra+0x96/0xb0
> [    9.389470]  filemap_get_pages+0x12f/0x490
> [    9.389532]  filemap_read+0xed/0x400
> [    9.389590]  ? lock_acquire+0xd5/0x2b0
> [    9.389633]  ? blkdev_read_iter+0x6b/0x180
> [    9.389678]  ? lock_acquire+0xe5/0x2b0
> [    9.389720]  ? lock_is_held_type+0xcd/0x130
> [    9.389761]  ? find_held_lock+0x2b/0x80
> [    9.389813]  ? lock_acquired+0x1e9/0x3c0
> [    9.389864]  blkdev_read_iter+0x79/0x180
> [    9.389911]  ? local_clock_noinstr+0x17/0x110
> [    9.389975]  vfs_read+0x240/0x340
> [    9.390033]  ksys_read+0x61/0xd0
> [    9.390083]  do_syscall_64+0x74/0x3a0
> [    9.390143]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [    9.390204] RIP: 0033:0x7f6e048c5c5e
> [    9.390255] Code: 4d 89 d8 e8 34 bd 00 00 4c 8b 5d f8 41 8b 93 08 03
> 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f
> 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa
> [    9.390447] RSP: 002b:00007ffdf50cded0 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000000
> [    9.390529] RAX: ffffffffffffffda RBX: 000000013aa55200 RCX:
> 00007f6e048c5c5e
> [    9.390619] RDX: 0000000000000200 RSI: 00007f6e0327f000 RDI:
> 0000000000000014
> [    9.390704] RBP: 00007ffdf50cdee0 R08: 0000000000000000 R09:
> 0000000000000000
> [    9.390789] R10: 0000000000000000 R11: 0000000000000202 R12:
> 0000000000000000
> [    9.390866] R13: 000055d5f99fa270 R14: 000055d5f96e5cb0 R15:
> 000055d5f96e5cc8
> [    9.390967]  </TASK>
> [    9.390997] irq event stamp: 54269
> [    9.391044] hardirqs last  enabled at (54279): [<ffffffff8138ac02>]
> __up_console_sem+0x52/0x60
> [    9.391141] hardirqs last disabled at (54288): [<ffffffff8138abe7>]
> __up_console_sem+0x37/0x60
> [    9.391240] softirqs last  enabled at (53796): [<ffffffff81304768>]
> irq_exit_rcu+0x78/0x110
> [    9.391327] softirqs last disabled at (53787): [<ffffffff81304768>]
> irq_exit_rcu+0x78/0x110
> [    9.391420] ---[ end trace 0000000000000000 ]---
>
> I haven't debugged this further yet, maybe you have an idea what
> could've caused it. On my side it's trivial to reproduce, so if you
> can't reproduce it just yell.

Hi Johannes,

Thanks for your report and for bisecting this.

A few questions:
1.  From the stack trace it looks like this is happening during a
block device read triggered by udev-worker's device probing. Does this
trigger for you consistently when the zoned device is probed or only
when running the btrfs xfstests generic/648?

2.  I tried to repro your setup by running:
sudo modprobe null_blk nr_devices=2 zoned=1 zone_size=256 zone_nr_conv=8 memory_backed=1
sudo mkdir -p /mnt/test && sudo mkdir -p /mnt/scratch
sudo mkfs.btrfs -f /dev/nullb0
sudo mount /dev/nullb0 /mnt/test
sudo ./check generic/648

with local.config set to:
TEST_DEV=/dev/nullb0
TEST_DIR=/mnt/test
SCRATCH_DEV=/dev/nullb1
SCRATCH_MNT=/mnt/scratch
export FSTYP=btrfs

but I’m not seeing the hang or the WARNING in dmesg show up.  I am
running it with PREEMPT(full) enabled. Does this match your setup or
are you doing something differently? Are you using null_blk or an
actual zoned device? If you are using null_blk, what module parameters
are you using?

3. If you're able to repro this consistently, would you be able to add
these lines right above the WARN_ON on line 487 and share what it
prints out?

+++ b/fs/iomap/buffered-io.c
@@ -484,6 +484,17 @@ static void iomap_read_end(struct folio *folio, size_t bytes_submitted)
                 * to the IO helper, in which case we are responsible for
                 * unlocking the folio here.
                 */
+               if (bytes_submitted) {
+                       struct inode *inode = folio->mapping->host;
+                       struct block_device *bdev = inode->i_sb->s_bdev;
+
+                       pr_warn("bytes_submitted=%zu folio_size=%zu blkbits=%u isize=%lld "
+                               "logical_bs=%u physical_bs=%u\n",
+                               bytes_submitted, folio_size(folio), inode->i_blkbits,
+                               i_size_read(inode),
+                               bdev ? bdev_logical_block_size(bdev) : 0,
+                               bdev ? bdev_physical_block_size(bdev) : 0);
+               }
                WARN_ON_ONCE(bytes_submitted);
                folio_unlock(folio);
        }

Thanks,
Joanne
>
> Byte,
>
>      Johannes
>


* Re: Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()")
  2026-03-09 20:56 ` Joanne Koong
@ 2026-03-10 11:44   ` Johannes Thumshirn
  2026-03-10 21:08     ` Joanne Koong
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2026-03-10 11:44 UTC (permalink / raw)
  To: Joanne Koong; +Cc: linux-fsdevel@vger.kernel.org

On 3/9/26 9:56 PM, Joanne Koong wrote:
> A few questions:
> 1.  From the stack trace it looks like this is happening during a
> block device read triggered by udev-worker's device probing. Does this
> trigger for you consistently when the zoned device is probed or only
> when running the btrfs xfstests generic/648?

It's only on g/648. I've also tried zoned XFS but g/648 didn't like my 
XFS and skipped the test case.


> 2.  I tried to repro your setup by running:
> sudo modprobe null_blk nr_devices=2 zoned=1 zone_size=256
> zone_nr_conv=8 memory_backed=1
> sudo mkdir -p /mnt/test && sudo mkdir -p /mnt/scratch
> sudo mkfs.btrfs -f /dev/nullb0
> sudo mount /dev/nullb0 /mnt/test
> sudo ./check generic/648
>
> with local.config set to:
> TEST_DEV=/dev/nullb0
> TEST_DIR=/mnt/test
> SCRATCH_DEV=/dev/nullb1
> SCRATCH_MNT=/mnt/scratch
> export FSTYP=btrfs
>
> but I’m not seeing the hang or the WARNING in dmesg show up.  I am
> running it with PREEMPT(full) enabled. Does this match your setup or
> are you doing something differently? Are you using null_blk or an
> actual zoned device? If you are using null_blk, what module parameters
> are you using?

I'm using a zloop device passed to a VM via virtio-blk. The zloop 
configuration is just:

echo "add id=0,zone_size_mb=256,conv_zones=4" > /dev/zloop-control

echo "add id=1,zone_size_mb=256,conv_zones=4" > /dev/zloop-control

The virtio-blk config is:

-drive driver=host_device,file=/dev/zloop0,if=virtio,cache.direct=on \

-drive driver=host_device,file=/dev/zloop1,if=virtio,cache.direct=on


The rootfs is on virtio-fs (using virtme-ng).

>
> 3. If you're able to repro this consistently, would you be able to add
> these lines right above the WARN_ON on line 487 and sharing what it
> prints out?
>
> +++ b/fs/iomap/buffered-io.c
> @@ -484,6 +484,17 @@ static void iomap_read_end(struct folio *folio,
> size_t bytes_submitted)
>                   * to the IO helper, in which case we are responsible for
>                   * unlocking the folio here.
>                   */
> +               if (bytes_submitted) {
> +                       struct inode *inode = folio->mapping->host;
> +                       struct block_device *bdev = inode->i_sb->s_bdev;
> +
> +                       pr_warn("bytes_submitted=%zu folio_size=%zu
> blkbits=%u isize=%lld "
> +                               "logical_bs=%u physical_bs=%u\n",
> +                               bytes_submitted, folio_size(folio),
> inode->i_blkbits,
> +                               i_size_read(inode),
> +                               bdev ? bdev_logical_block_size(bdev) : 0,
> +                               bdev ? bdev_physical_block_size(bdev) : 0);
> +               }
>                  WARN_ON_ONCE(bytes_submitted);
>                  folio_unlock(folio);
>          }

Here we go:

[   17.872952] bytes_submitted=1024 folio_size=4096 blkbits=12 isize=5278880768 logical_bs=0 physical_bs=0



* Re: Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()")
  2026-03-10 11:44   ` Johannes Thumshirn
@ 2026-03-10 21:08     ` Joanne Koong
  2026-03-10 21:55       ` Joanne Koong
  2026-03-10 21:59       ` Darrick J. Wong
  0 siblings, 2 replies; 9+ messages in thread
From: Joanne Koong @ 2026-03-10 21:08 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-fsdevel@vger.kernel.org

On Tue, Mar 10, 2026 at 4:44 AM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
>
> >
> > 3. If you're able to repro this consistently, would you be able to add
> > these lines right above the WARN_ON on line 487 and sharing what it
> > prints out?
> >
> > +++ b/fs/iomap/buffered-io.c
> > @@ -484,6 +484,17 @@ static void iomap_read_end(struct folio *folio,
> > size_t bytes_submitted)
> >                   * to the IO helper, in which case we are responsible for
> >                   * unlocking the folio here.
> >                   */
> > +               if (bytes_submitted) {
> > +                       struct inode *inode = folio->mapping->host;
> > +                       struct block_device *bdev = inode->i_sb->s_bdev;
> > +
> > +                       pr_warn("bytes_submitted=%zu folio_size=%zu
> > blkbits=%u isize=%lld "
> > +                               "logical_bs=%u physical_bs=%u\n",
> > +                               bytes_submitted, folio_size(folio),
> > inode->i_blkbits,
> > +                               i_size_read(inode),
> > +                               bdev ? bdev_logical_block_size(bdev) : 0,
> > +                               bdev ? bdev_physical_block_size(bdev) : 0);
> > +               }
> >                  WARN_ON_ONCE(bytes_submitted);
> >                  folio_unlock(folio);
> >          }
>
> Here we go:
>
> [   17.872952] bytes_submitted=1024 folio_size=4096 blkbits=12
> isize=5278880768 logical_bs=0 physical_bs=0

Thank you, this is very helpful.

Is it correct that the block device's inode->i_blkbits doesn't reflect
its actual logical block size (512) or is that itself a bug somewhere
in the block layer? On the iomap side, it uses inode->i_blkbits to
determine whether or not an ifs should be attached and what logic to
correspondingly call. For this case, shouldn't the inode's i_blkbits
be 9 since "isize= 5278880768" indicates it's 512-byte aligned, not
4096-byte aligned? I'm not familiar with the block layer, so your
thoughts on this would be appreciated.
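
As a quick sanity check of the alignment arithmetic above, here is a minimal userspace sketch. The helper names `tail_bytes` and `print_isize_alignment` are made up for illustration; only the isize value comes from the pr_warn output.

```c
#include <stdint.h>
#include <stdio.h>

/* Bytes left over after the last full block of the given size. */
uint64_t tail_bytes(uint64_t isize, uint64_t block_size)
{
	return isize % block_size;
}

/* Print the leftover bytes for 512-byte and 4096-byte blocks. */
void print_isize_alignment(void)
{
	const uint64_t isize = 5278880768ULL; /* isize from the pr_warn output */

	printf("isize %% 512  = %llu\n",
	       (unsigned long long)tail_bytes(isize, 512));
	printf("isize %% 4096 = %llu\n",
	       (unsigned long long)tail_bytes(isize, 4096));
}
```

The device size is a multiple of 512 but leaves a 1024-byte tail past the last 4096-byte boundary. That tail is the same size as the reported bytes_submitted=1024, which would be consistent with the warning firing on the final, partial folio of the device, though the output alone does not prove that.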

If it is fine for the block device's inode->i_blkbits to be a
different value from the logical block size, then I think the iomap
side needs this fix:

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 867e8ac761c8..03a97361570f 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -543,7 +543,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
                         * helper, then the helper owns the folio and will end
                         * the read on it.
                         */
-                       if (*bytes_submitted == folio_len)
+                       if (*bytes_submitted == folio_len || !ifs)
                                ctx->cur_folio = NULL;
                }

Could you verify that this fixes the hang?

Thanks,
Joanne
>


* Re: Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()")
  2026-03-10 21:08     ` Joanne Koong
@ 2026-03-10 21:55       ` Joanne Koong
  2026-03-17  9:34         ` Johannes Thumshirn
  2026-03-10 21:59       ` Darrick J. Wong
  1 sibling, 1 reply; 9+ messages in thread
From: Joanne Koong @ 2026-03-10 21:55 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-fsdevel@vger.kernel.org

On Tue, Mar 10, 2026 at 2:08 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Tue, Mar 10, 2026 at 4:44 AM Johannes Thumshirn
> <Johannes.Thumshirn@wdc.com> wrote:
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 867e8ac761c8..03a97361570f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -543,7 +543,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
>                          * helper, then the helper owns the folio and will end
>                          * the read on it.
>                          */
> -                       if (*bytes_submitted == folio_len)
> +                       if (*bytes_submitted == folio_len || !ifs)
>                                 ctx->cur_folio = NULL;
>                 }

It should be this instead, since the diff above uses `ifs` without declaring or assigning it:

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 867e8ac761c8..b803f518adaf 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -497,6 +497,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
        loff_t length = iomap_length(iter);
        struct folio *folio = ctx->cur_folio;
        size_t folio_len = folio_size(folio);
+       struct iomap_folio_state *ifs;
        size_t poff, plen;
        loff_t pos_diff;
        int ret;
@@ -508,7 +509,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
                return iomap_iter_advance(iter, length);
        }

-       ifs_alloc(iter->inode, folio, iter->flags);
+       ifs = ifs_alloc(iter->inode, folio, iter->flags);

        length = min_t(loff_t, length, folio_len - offset_in_folio(folio, pos));
        while (length) {
@@ -543,7 +544,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
                         * helper, then the helper owns the folio and will end
                         * the read on it.
                         */
-                       if (*bytes_submitted == folio_len)
+                       if (*bytes_submitted == folio_len || !ifs)
                                ctx->cur_folio = NULL;
                }
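
To make the changed condition easier to reason about, here is a toy userspace model of it. This is a sketch, not kernel code: the function names are hypothetical, `has_ifs` stands in for "ifs_alloc() returned a non-NULL ifs", and it assumes the check is only reached after some bytes were handed to the IO helper.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Old check: the caller drops ctx->cur_folio only when the whole folio
 * was submitted to the IO helper.
 */
bool helper_owns_folio_old(size_t bytes_submitted, size_t folio_len)
{
	return bytes_submitted == folio_len;
}

/*
 * New check: without an ifs there is no per-block state to track a
 * partial read, so any submission hands the folio over to the helper.
 */
bool helper_owns_folio_new(size_t bytes_submitted, size_t folio_len,
			   bool has_ifs)
{
	return bytes_submitted == folio_len || !has_ifs;
}
```

With the reported values (bytes_submitted=1024, folio_size=4096, no ifs), the old check keeps the folio with the caller, which is the state that reaches WARN_ON_ONCE(bytes_submitted) in iomap_read_end(). The new check hands it to the helper instead.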

Thanks,
Joanne


* Re: Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()")
  2026-03-10 21:08     ` Joanne Koong
  2026-03-10 21:55       ` Joanne Koong
@ 2026-03-10 21:59       ` Darrick J. Wong
  2026-03-17 19:47         ` Joanne Koong
  1 sibling, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2026-03-10 21:59 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Johannes Thumshirn, linux-fsdevel@vger.kernel.org

On Tue, Mar 10, 2026 at 02:08:28PM -0700, Joanne Koong wrote:
> On Tue, Mar 10, 2026 at 4:44 AM Johannes Thumshirn
> <Johannes.Thumshirn@wdc.com> wrote:
> >
> >
> > >
> > > 3. If you're able to repro this consistently, would you be able to add
> > > these lines right above the WARN_ON on line 487 and sharing what it
> > > prints out?
> > >
> > > +++ b/fs/iomap/buffered-io.c
> > > @@ -484,6 +484,17 @@ static void iomap_read_end(struct folio *folio,
> > > size_t bytes_submitted)
> > >                   * to the IO helper, in which case we are responsible for
> > >                   * unlocking the folio here.
> > >                   */
> > > +               if (bytes_submitted) {
> > > +                       struct inode *inode = folio->mapping->host;
> > > +                       struct block_device *bdev = inode->i_sb->s_bdev;
> > > +
> > > +                       pr_warn("bytes_submitted=%zu folio_size=%zu
> > > blkbits=%u isize=%lld "
> > > +                               "logical_bs=%u physical_bs=%u\n",
> > > +                               bytes_submitted, folio_size(folio),
> > > inode->i_blkbits,
> > > +                               i_size_read(inode),
> > > +                               bdev ? bdev_logical_block_size(bdev) : 0,
> > > +                               bdev ? bdev_physical_block_size(bdev) : 0);
> > > +               }
> > >                  WARN_ON_ONCE(bytes_submitted);
> > >                  folio_unlock(folio);
> > >          }
> >
> > Here we go:
> >
> > [   17.872952] bytes_submitted=1024 folio_size=4096 blkbits=12
> > isize=5278880768 logical_bs=0 physical_bs=0
> 
> Thank you, this is very helpful.
> 
> Is it correct that the block device's inode->i_blkbits doesn't reflect
> its actual logical block size (512) or is that itself a bug somewhere
> in the block layer?

For a bdev, i_blkbits is only loosely tied to the storage device's block
size.  i_blkbits can't be smaller than log2(lba_size) and it can't be
larger than PAGE_SHIFT (or the max folio order if LBS is present).

Typically it starts out matching PAGE_SIZE, but a program that opens the
block device can call BLKBSZSET to set the block size.  This is usually
done by userspace filesystem tools to improve performance and/or ensure
that the kernel actually supports that block size.

Note that kernel filesystem drivers also call sb_set_blocksize to set
the block size.

The bdev block size seemingly resets to 4k after closing.
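
To see the difference on a live system, a small userspace sketch can print the soft block size next to the device's sector sizes. It uses the BLKBSZGET, BLKSSZGET and BLKPBSZGET ioctls from <linux/fs.h>. The function names are made up for illustration, and it assumes the soft block size reported by BLKBSZGET is the value derived from the bdev inode's i_blkbits.

```c
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* log2 of a power-of-two block size, e.g. 4096 -> 12 and 512 -> 9. */
unsigned int block_size_to_bits(unsigned int size)
{
	unsigned int bits = 0;

	while (size > 1) {
		size >>= 1;
		bits++;
	}
	return bits;
}

/*
 * Print the soft block size (the one BLKBSZSET changes) together with
 * the logical and physical sector sizes of a block device.
 * Returns 0 on success, -1 if the device cannot be opened or queried.
 */
int print_bdev_block_sizes(const char *path)
{
	int soft = 0, logical = 0;
	unsigned int physical = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (ioctl(fd, BLKBSZGET, &soft) < 0 ||
	    ioctl(fd, BLKSSZGET, &logical) < 0 ||
	    ioctl(fd, BLKPBSZGET, &physical) < 0) {
		close(fd);
		return -1;
	}
	close(fd);

	printf("%s: soft=%d (bits=%u) logical=%d physical=%u\n", path, soft,
	       block_size_to_bits((unsigned int)soft), logical, physical);
	return 0;
}
```

Calling print_bdev_block_sizes() on a device path (for example one of the zloop-backed virtio disks) needs read permission on the device node. Because the block size may change while another opener holds the device, the printed soft size is only a snapshot.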

> On the iomap side, it uses inode->i_blkbits to
> determine whether or not an ifs should be attached and what logic to
> correspondingly call. For this case, shouldn't the inode's i_blkbits
> be 9 since "isize= 5278880768" indicates it's 512-byte aligned, not
> 4096-byte aligned? I'm not familiar with the block layer, so your
> thoughts on this would be appreciated.

<shrug> Regular files can have a larger i_blkbits than the underlying
storage device, so I don't see why bdevs would be different?

--D

> If it is fine for the block device's inode->i_blkbits to be a
> different value from the logical block size, then I think the iomap
> side needs this fix:
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 867e8ac761c8..03a97361570f 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -543,7 +543,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
>                          * helper, then the helper owns the folio and will end
>                          * the read on it.
>                          */
> -                       if (*bytes_submitted == folio_len)
> +                       if (*bytes_submitted == folio_len || !ifs)
>                                 ctx->cur_folio = NULL;
>                 }
> 
> Could you verify that this fixes the hang?
> 
> Thanks,
> Joanne
> >
> 


* Re: Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()")
  2026-03-10 21:55       ` Joanne Koong
@ 2026-03-17  9:34         ` Johannes Thumshirn
  2026-03-17 19:48           ` Joanne Koong
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2026-03-17  9:34 UTC (permalink / raw)
  To: joannelkoong; +Cc: linux-fsdevel@vger.kernel.org

On 3/10/26 10:55 PM, Joanne Koong wrote:
> On Tue, Mar 10, 2026 at 2:08 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>> On Tue, Mar 10, 2026 at 4:44 AM Johannes Thumshirn
>> <Johannes.Thumshirn@wdc.com> wrote:
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 867e8ac761c8..03a97361570f 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -543,7 +543,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
>>                           * helper, then the helper owns the folio and will end
>>                           * the read on it.
>>                           */
>> -                       if (*bytes_submitted == folio_len)
>> +                       if (*bytes_submitted == folio_len || !ifs)
>>                                  ctx->cur_folio = NULL;
>>                  }
> it should be this instead:
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 867e8ac761c8..b803f518adaf 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -497,6 +497,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
>          loff_t length = iomap_length(iter);
>          struct folio *folio = ctx->cur_folio;
>          size_t folio_len = folio_size(folio);
> +       struct iomap_folio_state *ifs;
>          size_t poff, plen;
>          loff_t pos_diff;
>          int ret;
> @@ -508,7 +509,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
>                  return iomap_iter_advance(iter, length);
>          }
>
> -       ifs_alloc(iter->inode, folio, iter->flags);
> +       ifs = ifs_alloc(iter->inode, folio, iter->flags);
>
>          length = min_t(loff_t, length, folio_len - offset_in_folio(folio, pos));
>          while (length) {
> @@ -543,7 +544,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
>                           * helper, then the helper owns the folio and will end
>                           * the read on it.
>                           */
> -                       if (*bytes_submitted == folio_len)
> +                       if (*bytes_submitted == folio_len || !ifs)
>                                  ctx->cur_folio = NULL;
>                  }
>
> Thanks,
> Joanne
>
This version makes the test pass again.

Reported-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Thanks a lot for this!



* Re: Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()")
  2026-03-10 21:59       ` Darrick J. Wong
@ 2026-03-17 19:47         ` Joanne Koong
  0 siblings, 0 replies; 9+ messages in thread
From: Joanne Koong @ 2026-03-17 19:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Johannes Thumshirn, linux-fsdevel@vger.kernel.org

On Tue, Mar 10, 2026 at 2:59 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Tue, Mar 10, 2026 at 02:08:28PM -0700, Joanne Koong wrote:
> > On Tue, Mar 10, 2026 at 4:44 AM Johannes Thumshirn
> > <Johannes.Thumshirn@wdc.com> wrote:
> > >
> > >
> > > >
> > > > 3. If you're able to repro this consistently, would you be able to add
> > > > these lines right above the WARN_ON on line 487 and sharing what it
> > > > prints out?
> > > >
> > > > +++ b/fs/iomap/buffered-io.c
> > > > @@ -484,6 +484,17 @@ static void iomap_read_end(struct folio *folio,
> > > > size_t bytes_submitted)
> > > >                   * to the IO helper, in which case we are responsible for
> > > >                   * unlocking the folio here.
> > > >                   */
> > > > +               if (bytes_submitted) {
> > > > +                       struct inode *inode = folio->mapping->host;
> > > > +                       struct block_device *bdev = inode->i_sb->s_bdev;
> > > > +
> > > > +                       pr_warn("bytes_submitted=%zu folio_size=%zu
> > > > blkbits=%u isize=%lld "
> > > > +                               "logical_bs=%u physical_bs=%u\n",
> > > > +                               bytes_submitted, folio_size(folio),
> > > > inode->i_blkbits,
> > > > +                               i_size_read(inode),
> > > > +                               bdev ? bdev_logical_block_size(bdev) : 0,
> > > > +                               bdev ? bdev_physical_block_size(bdev) : 0);
> > > > +               }
> > > >                  WARN_ON_ONCE(bytes_submitted);
> > > >                  folio_unlock(folio);
> > > >          }
> > >
> > > Here we go:
> > >
> > > [   17.872952] bytes_submitted=1024 folio_size=4096 blkbits=12
> > > isize=5278880768 logical_bs=0 physical_bs=0
> >
> > Thank you, this is very helpful.
> >
> > Is it correct that the block device's inode->i_blkbits doesn't reflect
> > its actual logical block size (512) or is that itself a bug somewhere
> > in the block layer?
>
> For a bdev, i_blkbits is only loosely tied to the storage device's block
> size.  i_blkbits can't be smaller than log2(lba_size) and it can't be
> larger than PAGE_SHIFT (or the max folio order if LBS is present).
>
> Typically it starts out matching PAGE_SIZE, but a program that opens the
> block device can call BLKBSZSET to set the block size.  This is usually
> done by userspace filesystem tools to improve performance and/or ensure
> that the kernel actually supports that block size.
>
> Note that kernel filesystem drivers also call sb_set_blocksize to set
> the block size.
>
> The bdev block size seemingly resets to 4k after closing.

Thanks for the explanation. I hadn't realized inode->i_blkbits is only
loosely tied to the actual I/O granularity.

Thanks,
Joanne
>
> > On the iomap side, it uses inode->i_blkbits to
> > determine whether or not an ifs should be attached and what logic to
> > correspondingly call. For this case, shouldn't the inode's i_blkbits
> > be 9 since "isize= 5278880768" indicates it's 512-byte aligned, not
> > 4096-byte aligned? I'm not familiar with the block layer, so your
> > thoughts on this would be appreciated.
>
> <shrug> Regular files can have a larger i_blkbits than the underlying
> storage device, so I don't see why bdevs would be different?
>
> --D
>


* Re: Hang in generic/648 on zoned btrfs after aa35dd5cbc06 ("iomap: fix invalid folio access after folio_end_read()")
  2026-03-17  9:34         ` Johannes Thumshirn
@ 2026-03-17 19:48           ` Joanne Koong
  0 siblings, 0 replies; 9+ messages in thread
From: Joanne Koong @ 2026-03-17 19:48 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-fsdevel@vger.kernel.org

On Tue, Mar 17, 2026 at 2:35 AM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
>
> On 3/10/26 10:55 PM, Joanne Koong wrote:
> > On Tue, Mar 10, 2026 at 2:08 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >> On Tue, Mar 10, 2026 at 4:44 AM Johannes Thumshirn
> >> <Johannes.Thumshirn@wdc.com> wrote:
> >>
> >> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> >> index 867e8ac761c8..03a97361570f 100644
> >> --- a/fs/iomap/buffered-io.c
> >> +++ b/fs/iomap/buffered-io.c
> >> @@ -543,7 +543,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> >>                           * helper, then the helper owns the folio and will end
> >>                           * the read on it.
> >>                           */
> >> -                       if (*bytes_submitted == folio_len)
> >> +                       if (*bytes_submitted == folio_len || !ifs)
> >>                                  ctx->cur_folio = NULL;
> >>                  }
> > it should be this instead:
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 867e8ac761c8..b803f518adaf 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -497,6 +497,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> >          loff_t length = iomap_length(iter);
> >          struct folio *folio = ctx->cur_folio;
> >          size_t folio_len = folio_size(folio);
> > +       struct iomap_folio_state *ifs;
> >          size_t poff, plen;
> >          loff_t pos_diff;
> >          int ret;
> > @@ -508,7 +509,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> >                  return iomap_iter_advance(iter, length);
> >          }
> >
> > -       ifs_alloc(iter->inode, folio, iter->flags);
> > +       ifs = ifs_alloc(iter->inode, folio, iter->flags);
> >
> >          length = min_t(loff_t, length, folio_len - offset_in_folio(folio, pos));
> >          while (length) {
> > @@ -543,7 +544,7 @@ static int iomap_read_folio_iter(struct iomap_iter *iter,
> >                           * helper, then the helper owns the folio and will end
> >                           * the read on it.
> >                           */
> > -                       if (*bytes_submitted == folio_len)
> > +                       if (*bytes_submitted == folio_len || !ifs)
> >                                  ctx->cur_folio = NULL;
> >                  }
> >
> > Thanks,
> > Joanne
> >
> This version is making the test pass again,
>
> Reported-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> Thanks a lot for this!
>

Thanks for reporting this and testing / helping to debug the issue.
I'll submit the fix for this upstream today.

Thanks,
Joanne

