Repeatable, raid1+O_DIRECT, hang/warn

Linux block layer

* Repeatable, raid1+O_DIRECT, hang/warn
@ 2026-06-14 17:57 Dr. David Alan Gilbert
  2026-06-15 10:34 ` Thorsten Leemhuis
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-14 17:57 UTC (permalink / raw)
  To: linux-block, dm-devel

Hi,
  I've got a repeatable raid hang/warn and would appreciate some pointers
as to where to debug.
  (I've been logging stuff on https://bugzilla.kernel.org/show_bug.cgi?id=221535 )

  This started off as debugging a case where my RAID1 (on the host) would
reliably hit a 'rescheduling sector'/disk failure while running the qemu block
test suite during a qemu build. I then tried to build a smaller, discrete
test, and now I've got an easily triggerable warn and test hang.
There are no errors from the underlying SATA layer on the storage, and
everything resyncs just fine.

I've got an existing LVM vg ('main') with two mirrors, on sda2 and sdb2,
which are SATA disks.

# lvcreate --type mirror --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
# mkfs.ext4 /dev/mapper/main-lvol0
# mount /dev/mapper/main-lvol0 /mnt/tmp/
# chmod a+rwx /mnt/tmp

$ dd if=/dev/zero of=/mnt/tmp/testfile bs=1024k count=1

(I then wait for the IO to stop)

then we've got this little test program:

<--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><-->
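#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> (glibc) */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static const char *path = "/mnt/tmp/testfile";
/* Plain static buffer, no explicit alignment requested. */
static char buf[8192];

int main(void)
{
  /* Open the file on the mirrored LV, bypassing the page cache. */
  int fd = open(path, O_RDWR|O_DIRECT|O_CLOEXEC);
  if (fd < 0) {
    perror("open");
    return 1;
  }

  /* Single 4096-byte direct read from offset 0. */
  errno = 0;
  ssize_t res3 = pread(fd, buf, 4096, 0);
  printf("pread of 4096 said: %zd (%m)\n", res3);

  close(fd);
  return 0;
}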
<--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><-->

Running that either hangs or gets a 'pread of 4096 said: -1 (Input/output error)'.
When it hangs, it's unkillable.

At the moment (on 7.1.0-rc7) this is giving:
Jun 14 18:08:32 dalek kernel: device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
Jun 14 18:08:32 dalek dmeventd[1010]: Primary mirror device 252:24 read failed.
Jun 14 18:08:32 dalek kernel: WARNING: block/bio.c:1044 at bio_add_page+0x18b/0x250, CPU#15: kworker/15:1/369

(full backtrace below)
(Note there is a moan in there about an sdb I/O error, repeated a lot, but
again, there are no SATA-level errors, the drive is fine according to SMART, and
I can read the whole of the underlying lvm mirrors, so I don't think it's
a physical fault).

I did a blktrace, although that gives me a 23G blkparse output, hmm
(I see each event repeated a lot - maybe per thread?)

252,26  15        1     0.000000000  3435  Q  RS 264192 + 8 [dbf]
  252,26 is /dev/mapper/main-lvol0
252,24  15        1     0.000005501  3435  A  RS 264192 + 8 <- (252,26) 264192
  252,24 is main-lvol0_mimage_0
252,24  15        2     0.000005761  3435  Q  RS 264192 + 8 [dbf]
  8,0   15        1     0.000008646  3435  A  RS 71634944 + 8 <- (252,24) 264192
    so that's sda 
  8,0   15        2     0.000008787  3435  A  RS 73734144 + 8 <- (8,2) 71634944
    I guess mapping down from sda2 to sda
  8,0   15        3     0.000009037  3435  Q  RS 73734144 + 8 [dbf]
  8,0   15        4     0.000009809  3435  C  RS 73734144 + 8 [65514]
      ??? Hmm what's the 65514 there?
252,24  15        3     0.000010320  3435  C  RS 264192 + 8 [65514]
252,25  15        1     0.000290384   369  Q   R 264192 + 8 [kworker/15:1]
   252,25 is main-lvol0_mimage_1

and at this point I'm a bit lost as to what I'm looking for.

Hints appreciated!

(I don't believe this is a regression - or at least not a recent one)

Dave




Jun 14 18:08:32 dalek kernel: device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
Jun 14 18:08:32 dalek dmeventd[1010]: Primary mirror device 252:24 read failed.
Jun 14 18:08:32 dalek kernel: WARNING: block/bio.c:1044 at bio_add_page+0x18b/0x250, CPU#15: kworker/15:1/369
Jun 14 18:08:32 dalek dmeventd[1010]: main-lvol0 is now in-sync.
Jun 14 18:08:32 dalek kernel: Modules linked in: nft_masq nft_reject_ipv4 act_csum cls_u32 sch_htb nf_nat_tftp nf_conntrack_tftp bridge stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reje>
Jun 14 18:08:32 dalek kernel:  drm_panel_backlight_quirks gpu_sched drm_suballoc_helper video nvme drm_display_helper nvme_core cec nvme_keyring sp5100_tco nvme_auth wmi serio_raw fuse scsi_dh_alua i2c_dev scsi_dh_rdac scsi_dh_emc
Jun 14 18:08:32 dalek kernel: CPU: 15 UID: 0 PID: 369 Comm: kworker/15:1 Not tainted 7.1.0-rc7+ #786 PREEMPT(lazy) 
Jun 14 18:08:32 dalek kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.10 07/13/2020
Jun 14 18:08:32 dalek kernel: Workqueue: kmirrord do_mirror
Jun 14 18:08:32 dalek kernel: RIP: 0010:bio_add_page+0x18b/0x250
Jun 14 18:08:32 dalek kernel: Code: 24 10 4c 8b 04 24 84 c0 0f 85 c9 00 00 00 41 0f b7 40 78 48 8b 74 24 08 8b 4c 24 14 e9 b4 fe ff ff 0f 0b 31 c0 e9 55 d1 af 00 <0f> 0b eb f5 48 8b 7f 08 83 7f 60 05 0f 85 00 ff ff ff 49 8b 3b 4c
Jun 14 18:08:32 dalek kernel: RSP: 0018:ffffd1fb8176fc10 EFLAGS: 00010246
Jun 14 18:08:32 dalek kernel: RAX: 0000000000000000 RBX: ffffd1fb8176fd18 RCX: 0000000000000000
Jun 14 18:08:32 dalek kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8d1a8eb28b00
Jun 14 18:08:32 dalek kernel: RBP: 0000000000000000 R08: ffffd1fb8176fc38 R09: ffffd1fb8176fc40
Jun 14 18:08:32 dalek kernel: R10: ffffd1fb8176fc34 R11: 0000000000000000 R12: 0000000000000000
Jun 14 18:08:32 dalek kernel: R13: ffffd1fb8176fd90 R14: 0000000000000001 R15: ffff8d1a8eb28b00
Jun 14 18:08:32 dalek kernel: FS:  0000000000000000(0000) GS:ffff8d29d161f000(0000) knlGS:0000000000000000
Jun 14 18:08:32 dalek kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 18:08:32 dalek kernel: CR2: 00007f0ddcd7b9d0 CR3: 000000023dcbf000 CR4: 0000000000350ef0
Jun 14 18:08:32 dalek kernel: Call Trace:
Jun 14 18:08:32 dalek kernel:  <TASK>
Jun 14 18:08:32 dalek kernel:  do_region+0x227/0x2a0
Jun 14 18:08:32 dalek kernel:  dispatch_io+0xf1/0x150
Jun 14 18:08:32 dalek kernel:  ? __pfx_bio_get_page+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ? __pfx_bio_next_page+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ? __pfx_read_callback+0x10/0x10
Jun 14 18:08:32 dalek kernel:  dm_io+0x169/0x2d0
Jun 14 18:08:32 dalek kernel:  ? __pfx_bio_get_page+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ? __pfx_bio_next_page+0x10/0x10
Jun 14 18:08:32 dalek kernel:  do_reads+0x149/0x230
Jun 14 18:08:32 dalek kernel:  ? __pfx_read_callback+0x10/0x10
Jun 14 18:08:32 dalek kernel:  do_mirror+0x11a/0x2b0
Jun 14 18:08:32 dalek kernel:  process_one_work+0x19e/0x390
Jun 14 18:08:32 dalek kernel:  worker_thread+0x1a6/0x310
Jun 14 18:08:32 dalek kernel:  ? __pfx_worker_thread+0x10/0x10
Jun 14 18:08:32 dalek kernel:  kthread+0xe4/0x120
Jun 14 18:08:32 dalek kernel:  ? __pfx_kthread+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ret_from_fork+0x1a1/0x270
Jun 14 18:08:32 dalek kernel:  ? __pfx_kthread+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ret_from_fork_asm+0x1a/0x30
Jun 14 18:08:32 dalek kernel:  </TASK>
Jun 14 18:08:32 dalek kernel: ---[ end trace 0000000000000000 ]---
Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
Jun 14 18:08:32 dalek kernel: WARNING: drivers/scsi/scsi_lib.c:1164 at scsi_alloc_sgtables+0x38a/0x400, CPU#15: kworker/15:1/369
Jun 14 18:08:32 dalek kernel: Modules linked in: nft_masq nft_reject_ipv4 act_csum cls_u32 sch_htb nf_nat_tftp nf_conntrack_tftp bridge stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reje>
Jun 14 18:08:32 dalek kernel:  drm_panel_backlight_quirks gpu_sched drm_suballoc_helper video nvme drm_display_helper nvme_core cec nvme_keyring sp5100_tco nvme_auth wmi serio_raw fuse scsi_dh_alua i2c_dev scsi_dh_rdac scsi_dh_emc
Jun 14 18:08:32 dalek kernel: CPU: 15 UID: 0 PID: 369 Comm: kworker/15:1 Tainted: G        W           7.1.0-rc7+ #786 PREEMPT(lazy) 
Jun 14 18:08:32 dalek kernel: Tainted: [W]=WARN
Jun 14 18:08:32 dalek kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.10 07/13/2020
Jun 14 18:08:32 dalek kernel: Workqueue: kmirrord do_mirror
Jun 14 18:08:32 dalek kernel: RIP: 0010:scsi_alloc_sgtables+0x38a/0x400
Jun 14 18:08:32 dalek kernel: Code: 8b 3d ba 2d a9 01 e9 d1 fd ff ff 48 8b 75 00 48 8d bb f0 fe ff ff e8 15 b7 b0 ff 48 89 ab e0 00 00 00 89 45 08 e9 30 ff ff ff <0f> 0b 4c 8b 6c 24 30 b8 0a 00 00 00 e9 21 ff ff ff b8 09 00 00 00
Jun 14 18:08:32 dalek kernel: RSP: 0018:ffffd1fb8176f7f0 EFLAGS: 00010246
Jun 14 18:08:32 dalek kernel: RAX: 0000000000000000 RBX: ffff8d1aedad0110 RCX: 0000000000000009
Jun 14 18:08:32 dalek kernel: RDX: 0000000000000000 RSI: ffffffff99c15960 RDI: ffff8d1aedad0110
Jun 14 18:08:32 dalek kernel: RBP: ffff8d1a93d17000 R08: ffff8d1aedad0110 R09: ffff8d1a818fa800
Jun 14 18:08:32 dalek kernel: R10: 7020676e69736961 R11: 0000000000000000 R12: 0000000000000000
Jun 14 18:08:32 dalek kernel: R13: 0000000000000000 R14: ffff8d1a93394000 R15: ffff8d1a93d17000
Jun 14 18:08:32 dalek kernel: FS:  0000000000000000(0000) GS:ffff8d29d161f000(0000) knlGS:0000000000000000
Jun 14 18:08:32 dalek kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 18:08:32 dalek kernel: CR2: 00007f0ddcd7b9d0 CR3: 000000023dcbf000 CR4: 0000000000350ef0
Jun 14 18:08:32 dalek kernel: Call Trace:
Jun 14 18:08:32 dalek kernel:  <TASK>
Jun 14 18:08:32 dalek kernel:  ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel:  sd_setup_read_write_cmnd+0x9d/0x740
Jun 14 18:08:32 dalek kernel:  ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel:  scsi_queue_rq+0x4d2/0x890
Jun 14 18:08:32 dalek kernel:  blk_mq_dispatch_rq_list+0x241/0x530
Jun 14 18:08:32 dalek kernel:  ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel:  ? sbitmap_get+0x61/0x100
Jun 14 18:08:32 dalek kernel:  __blk_mq_do_dispatch_sched+0x330/0x340
Jun 14 18:08:32 dalek kernel:  __blk_mq_sched_dispatch_requests+0x143/0x180
Jun 14 18:08:32 dalek kernel:  blk_mq_sched_dispatch_requests+0x2d/0x70
Jun 14 18:08:32 dalek kernel:  blk_mq_run_hw_queue+0x2bf/0x350
Jun 14 18:08:32 dalek kernel:  ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel:  blk_mq_dispatch_list+0x172/0x350
Jun 14 18:08:32 dalek kernel:  blk_mq_flush_plug_list+0x51/0x1a0
Jun 14 18:08:32 dalek kernel:  ? blk_mq_submit_bio+0x71c/0x9f0
Jun 14 18:08:32 dalek kernel:  __blk_flush_plug+0x112/0x180
Jun 14 18:08:32 dalek kernel:  ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel:  __submit_bio+0x19c/0x260
Jun 14 18:08:32 dalek kernel:  __submit_bio_noacct+0x8e/0x210
Jun 14 18:08:32 dalek kernel:  do_region+0x14c/0x2a0
Jun 14 18:08:32 dalek kernel:  dispatch_io+0xf1/0x150
Jun 14 18:08:32 dalek kernel:  ? __pfx_bio_get_page+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ? __pfx_bio_next_page+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ? __pfx_read_callback+0x10/0x10
Jun 14 18:08:32 dalek kernel:  dm_io+0x169/0x2d0
Jun 14 18:08:32 dalek kernel:  ? __pfx_bio_get_page+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ? __pfx_bio_next_page+0x10/0x10
Jun 14 18:08:32 dalek kernel:  do_reads+0x149/0x230
Jun 14 18:08:32 dalek kernel:  ? __pfx_read_callback+0x10/0x10
Jun 14 18:08:32 dalek kernel:  do_mirror+0x11a/0x2b0
Jun 14 18:08:32 dalek kernel:  process_one_work+0x19e/0x390
Jun 14 18:08:32 dalek kernel:  worker_thread+0x1a6/0x310
Jun 14 18:08:32 dalek kernel:  ? __pfx_worker_thread+0x10/0x10
Jun 14 18:08:32 dalek kernel:  kthread+0xe4/0x120
Jun 14 18:08:32 dalek kernel:  ? __pfx_kthread+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ret_from_fork+0x1a1/0x270
Jun 14 18:08:32 dalek kernel:  ? __pfx_kthread+0x10/0x10
Jun 14 18:08:32 dalek kernel:  ret_from_fork_asm+0x1a/0x30
Jun 14 18:08:32 dalek kernel:  </TASK>
Jun 14 18:08:32 dalek kernel: ---[ end trace 0000000000000000 ]---
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:37 dalek kernel: blk_print_req_error: 241000 callbacks suppressed
Jun 14 18:08:37 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2


-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-14 17:57 Repeatable, raid1+O_DIRECT, hang/warn Dr. David Alan Gilbert
@ 2026-06-15 10:34 ` Thorsten Leemhuis
  2026-06-15 12:50   ` Dr. David Alan Gilbert
  2026-06-15 13:07 ` Zdenek Kabelac
  2026-06-15 15:20 ` Keith Busch
  2 siblings, 1 reply; 22+ messages in thread
From: Thorsten Leemhuis @ 2026-06-15 10:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, linux-block, dm-devel
  Cc: Linux kernel regressions list

On 6/14/26 19:57, Dr. David Alan Gilbert wrote:
>
>   I've got a repeatable raid hang/warn and would appreciate some pointers
> as where to debug.
>   (I've been logging stuff on  https://bugzilla.kernel.org/show_bug.cgi?id=221535 )

Note: not my area of expertise, so I might be sending you totally
off-track with this comment. Feel free to ignore it. But FWIW:

Have you seen these reports?
https://lore.kernel.org/all/2982107.4sosBPzcNG@electra/
https://lore.kernel.org/all/CAC_j7i1R7oy+nRhxEjCTba=DUgn02w9X+p94DCu0aHv5+5tKnQ@mail.gmail.com/

The former led to a fix in the mdraid code that should be in the kernel
version you are using. But in a reply to the latter report the reporter
claimed that this fix is not enough (claiming "this was obvious" and
also using dm), but things then stalled there.

Ciao, Thorsten



* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 10:34 ` Thorsten Leemhuis
@ 2026-06-15 12:50   ` Dr. David Alan Gilbert
  2026-06-15 23:16     ` Vjaceslavs Klimovs
  0 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-15 12:50 UTC (permalink / raw)
  To: Thorsten Leemhuis, kbusch, vklimovs, trnka
  Cc: linux-block, dm-devel, Linux kernel regressions list

* Thorsten Leemhuis (regressions@leemhuis.info) wrote:
> On 6/14/26 19:57, Dr. David Alan Gilbert wrote:
> >
> >   I've got a repeatable raid hang/warn and would appreciate some pointers
> > as where to debug.
> >   (I've been logging stuff on  https://bugzilla.kernel.org/show_bug.cgi?id=221535 )
> 
> Note: not my area of expertise, so I might be sending you totally
> off-track with this comment. Feel free to ignore it. But FWIW:

Hi Thorsten,
  Thanks for the reply - these do seem to be related!
(So copying in Keith, Vjaceslavs, and Tomáš )
(Not my area either).

> Have you seen these reports?
> https://lore.kernel.org/all/2982107.4sosBPzcNG@electra/
> https://lore.kernel.org/all/CAC_j7i1R7oy+nRhxEjCTba=DUgn02w9X+p94DCu0aHv5+5tKnQ@mail.gmail.com/

I hadn't!  Those both look like the problem I was originally trying to debug
when I stumbled into the WARN/BUG/hang with my test program.

> The former led to a fix in the mdraid code that should be in the kernel
> version you are using. But in a reply to the latter report the reporter
> claimed that this fix is not enough (claiming "this was obvious" and
> also using dm), but things then stalled there.

Yeah, I see my tree already has Keith's f7b24c7b41f23.

I think the problem I'm seeing is zero-length requests coming from somewhere.

The WARN I'm seeing in 7.1.0-rc7+ is:

[ 2681.597042] device-mapper: raid1: Mirror read failed from 252:25. Trying alternative device.
[ 2681.631933] ------------[ cut here ]------------
[ 2681.631939] WARNING: block/bio.c:1044 at bio_add_page+0x18b/0x250, CPU#22: kworker/22:0/18929

1039 int bio_add_page(struct bio *bio, struct page *page,
1040                  unsigned int len, unsigned int offset)
1041 {
1042         if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
1043                 return 0;
1044         if (WARN_ON_ONCE(len == 0))
1045                 return 0;

So it's the 'if (WARN_ON_ONCE(len == 0))' check.

and the warn I got on the older 7.0.8 was:
[Sun May 17 17:22:52 2026] WARNING: drivers/scsi/scsi_lib.c:1140 at scsi_alloc_sgtables+0x38a/0x400, CPU#28: kworker/28:1H/3943

which I *think* corresponds to this check (line 1164 in 7.1.0-rc7):
1164         if (WARN_ON_ONCE(!nr_segs))
1165                 return BLK_STS_IOERR;

So it sounds like we need to find where the zero-length requests are coming from.

Thanks again,

Dave

> Ciao, Thorsten
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 12:50   ` Dr. David Alan Gilbert
@ 2026-06-15 23:16     ` Vjaceslavs Klimovs
  2026-06-16  0:06       ` Keith Busch
  2026-06-16 15:55       ` Mikulas Patocka
  0 siblings, 2 replies; 22+ messages in thread
From: Vjaceslavs Klimovs @ 2026-06-15 23:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Thorsten Leemhuis, kbusch, trnka, linux-block, dm-devel,
	Linux kernel regressions list

Hi Dave, all,

I'm one of the original reporters and very much a user, not a block/dm
developer, so please sanity-check all of this.

Your trace looks like what the two earlier reports hit: a read reaching
a leaf device with sectors > 0 but phys_seg 0 (an empty bio). One aside
that may help read the trace: blk_io_trace.error is a __u16, so the
bracketed values on your C lines are errnos as u16 (65514 = -EINVAL,
65531 = -EIO).

The WARN itself is new, the bad bio isn't. bio_add_page() only started
rejecting len == 0 in 643893647cac ("block: reject zero length in
bio_add_page()", v7.1-rc1); on 7.0.8 the same empty bio tripped
scsi_alloc_sgtables()'s !nr_segs instead, which matches what you saw.
That fits your "not a recent regression": the condition is older, v7.1
just made it loud.

For Tomas's and my reports (QEMU O_DIRECT to the LV block device) the
origin looks like 5ff3f74e145a ("block: simplify direct io validity
check", v6.18): blkdev_dio_invalid() now checks only aggregate
ki_pos | count alignment and dropped the per-segment
bdev_iter_is_aligned() walk, so a degenerate or misaligned O_DIRECT no
longer gets -EINVAL at the fops boundary. But your reproducer reads a
file, which goes through the filesystem O_DIRECT path and never calls
blkdev_dio_invalid(), and still makes the empty bio. So it isn't only
that one entry point.

dm-mirror then hangs because Keith's f7b24c7b41f2 only covers md
raid1/raid10; legacy dm-mirror (dm-raid1.c) has no equivalent and
rebuilds the empty read onto the other leg. Note the leg's status isn't
even consistent (your SATA path returns BLK_STS_IOERR, not
BLK_STS_INVAL), so copying that status check into dm-mirror probably
wouldn't catch every case.

For what it's worth, that points me toward rejecting the empty or
misaligned bio once, at submission, with -EINVAL, rather than teaching
each consumer to tolerate it. But you'll know the tradeoffs far better
than I do.

I have a small QEMU + LVM raid1/mirror setup that reproduces the
block-device variant and bisects to 5ff3f74e. Happy to run your file
reproducer with some instrumentation at the dm-mirror read entry
(bi_size vs bio_sectors vs bvec lengths) to see whether the bio is
already empty on arrival or built that way on the retry, and to test
any patch.

Thanks,
Vjaceslavs



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 23:16     ` Vjaceslavs Klimovs
@ 2026-06-16  0:06       ` Keith Busch
  2026-06-16  1:25         ` Vjaceslavs Klimovs
  2026-06-16 12:57         ` Dr. David Alan Gilbert
  2026-06-16 15:55       ` Mikulas Patocka
  1 sibling, 2 replies; 22+ messages in thread
From: Keith Busch @ 2026-06-16  0:06 UTC (permalink / raw)
  To: Vjaceslavs Klimovs
  Cc: Dr. David Alan Gilbert, Thorsten Leemhuis, trnka, linux-block,
	dm-devel, Linux kernel regressions list

On Mon, Jun 15, 2026 at 04:16:12PM -0700, Vjaceslavs Klimovs wrote:
> Your trace looks like what the two earlier reports hit: a read reaching
> a leaf device with sectors > 0 but phys_seg 0 (an empty bio). One aside
> that may help read the trace: blk_io_trace.error is a __u16, so the
> bracketed values on your C lines are errnos as u16 (65514 = -EINVAL,
> 65531 = -EIO).
> 
> The WARN itself is new, the bad bio isn't. bio_add_page() only started
> rejecting len == 0 in 643893647cac ("block: reject zero length in
> bio_add_page()", v7.1-rc1); on 7.0.8 the same empty bio tripped
> scsi_alloc_sgtables()'s !nr_segs instead, which matches what you saw.
> That fits your "not a recent regression": the condition is older, v7.1
> just made it loud.
> 
> For Tomas's and my reports (QEMU O_DIRECT to the LV block device) the
> origin looks like 5ff3f74e145a ("block: simplify direct io validity
> check", v6.18): blkdev_dio_invalid() now checks only aggregate
> ki_pos | count alignment and dropped the per-segment
> bdev_iter_is_aligned() walk, so a degenerate or misaligned O_DIRECT no
> longer gets -EINVAL at the fops boundary. But your reproducer reads a
> file, which goes through the filesystem O_DIRECT path and never calls
> blkdev_dio_invalid(), and still makes the empty bio. So it isn't only
> that one entry point.
> 
> dm-mirror then hangs because Keith's f7b24c7b41f2 only covers md
> raid1/raid10; legacy dm-mirror (dm-raid1.c) has no equivalent and
> rebuilds the empty read onto the other leg. Note the leg's status isn't
> even consistent (your SATA path returns BLK_STS_IOERR, not
> BLK_STS_INVAL), so copying that status check into dm-mirror probably
> wouldn't catch every case.
> 
> For what it's worth, that points me toward rejecting the empty or
> misaligned bio once, at submission, with -EINVAL, rather than teaching
> each consumer to tolerate it. But you'll know the tradeoffs far better
> than I do.
> 
> I have a small QEMU + LVM raid1/mirror setup that reproduces the
> block-device variant and bisects to 5ff3f74e. Happy to run your file
> reproducer with some instrumentation at the dm-mirror read entry
> (bi_size vs bio_sectors vs bvec lengths) to see whether the bio is
> already empty on arrival or built that way on the retry, and to test
> any patch.

Thanks for following up here. I didn't initially see your follow-up
until Thorsten linked it. I apologize for missing that; this feature is
important, so I don't want to see anything regress for it.

There is a known bug fix I think future tests should include:

  https://lore.kernel.org/linux-block/20260612223205.465913-1-kbusch@meta.com/

This likely isn't the fix you're looking for, but including it rules out
conditions that are not important here.

After that, can we try this suggestion and see if the hang goes away?

  https://lore.kernel.org/linux-block/ajBb8tK-0aJBpIgF@kbusch-mbp/

I expect the original test case to still return an error (and I think it
was designed to), but it shouldn't produce the warn or bug splats with a
stuck uninterruptible task.
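
As an aside on the u16 errnos mentioned in the quoted text above, here is
a minimal sketch of how such a value maps back to a negative errno. The
helper name trace_error_to_errno is hypothetical and is not a kernel or
blktrace API; only the two example values (65514 and 65531) come from
this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Map a 16-bit unsigned error field back to a signed errno-style value.
 * Values with the top bit set are treated as negative (two's complement).
 * trace_error_to_errno is a hypothetical helper name, not a kernel API.
 */
static int trace_error_to_errno(uint16_t raw)
{
    int value = raw;

    if (value >= 0x8000)
        value -= 0x10000;
    /* The result always fits the signed 16-bit range. */
    assert(value >= -32768 && value <= 32767);
    return value;
}
```

With this, 65514 decodes to -EINVAL (-22) and 65531 to -EIO (-5), which
matches the reading given in the quoted text.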

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-16  0:06       ` Keith Busch
@ 2026-06-16  1:25         ` Vjaceslavs Klimovs
  2026-06-16 12:57         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 22+ messages in thread
From: Vjaceslavs Klimovs @ 2026-06-16  1:25 UTC (permalink / raw)
  To: Keith Busch
  Cc: Dr. David Alan Gilbert, Thorsten Leemhuis, trnka, linux-block,
	dm-devel, Linux kernel regressions list

Hi Keith,

Thanks. I tested both patches on current mainline
(v7.1-rc7-271-g424280953322) with my QEMU + LVM "--type mirror"
reproducer (virtio-blk, cache=none, aio=native).

With only the "block: check bio split for unaligned bvec" patch, the
hang still reproduces. The WARN fires from a kmirrord worker:

  WARNING: block/bio.c:1044 at bio_add_page+0x108/0x200
  Workqueue: kmirrord do_mirror
  Call Trace:
   bio_add_page+0x108/0x200
   do_region+0x21d/0x270
   dispatch_io+0xf1/0x150
   dm_io+0x136/0x240
   do_reads+0x13e/0x210
   do_mirror+0x117/0x2b0

and the VM then wedges.

With the dm-io.c clone patch applied on top, the WARN and the hang are
both gone. dm-mirror just fails the read instead:

  device-mapper: raid1: Mirror read failed from 252:0. Trying alternative device.
  device-mapper: raid1: All sides of mirror have failed.
  device-mapper: raid1: Read failure on mirror device 252:1.  Failing I/O.

The guest still gets an I/O error, as you expected, but the host stays
up: no splat, no stuck task. For comparison, on the same kernel the
"--type raid1" case boots the guest and reads fine, and the 128 MB
mirror seed write goes through the clone path without trouble, so
normal I/O looks unaffected.

Thanks,
Vjaceslavs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-16  0:06       ` Keith Busch
  2026-06-16  1:25         ` Vjaceslavs Klimovs
@ 2026-06-16 12:57         ` Dr. David Alan Gilbert
  2026-06-16 13:08           ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-16 12:57 UTC (permalink / raw)
  To: Keith Busch
  Cc: Vjaceslavs Klimovs, Thorsten Leemhuis, trnka, linux-block,
	dm-devel, Linux kernel regressions list

* Keith Busch (kbusch@kernel.org) wrote:
> On Mon, Jun 15, 2026 at 04:16:12PM -0700, Vjaceslavs Klimovs wrote:
> > Your trace looks like what the two earlier reports hit: a read reaching
> > a leaf device with sectors > 0 but phys_seg 0 (an empty bio). One aside
> > that may help read the trace: blk_io_trace.error is a __u16, so the
> > bracketed values on your C lines are errnos as u16 (65514 = -EINVAL,
> > 65531 = -EIO).
> > 
> > The WARN itself is new, the bad bio isn't. bio_add_page() only started
> > rejecting len == 0 in 643893647cac ("block: reject zero length in
> > bio_add_page()", v7.1-rc1); on 7.0.8 the same empty bio tripped
> > scsi_alloc_sgtables()'s !nr_segs instead, which matches what you saw.
> > That fits your "not a recent regression": the condition is older, v7.1
> > just made it loud.
> > 
> > For Tomas's and my reports (QEMU O_DIRECT to the LV block device) the
> > origin looks like 5ff3f74e145a ("block: simplify direct io validity
> > check", v6.18): blkdev_dio_invalid() now checks only aggregate
> > ki_pos | count alignment and dropped the per-segment
> > bdev_iter_is_aligned() walk, so a degenerate or misaligned O_DIRECT no
> > longer gets -EINVAL at the fops boundary. But your reproducer reads a
> > file, which goes through the filesystem O_DIRECT path and never calls
> > blkdev_dio_invalid(), and still makes the empty bio. So it isn't only
> > that one entry point.
> > 
> > dm-mirror then hangs because Keith's f7b24c7b41f2 only covers md
> > raid1/raid10; legacy dm-mirror (dm-raid1.c) has no equivalent and
> > rebuilds the empty read onto the other leg. Note the leg's status isn't
> > even consistent (your SATA path returns BLK_STS_IOERR, not
> > BLK_STS_INVAL), so copying that status check into dm-mirror probably
> > wouldn't catch every case.
> > 
> > For what it's worth, that points me toward rejecting the empty or
> > misaligned bio once, at submission, with -EINVAL, rather than teaching
> > each consumer to tolerate it. But you'll know the tradeoffs far better
> > than I do.
> > 
> > I have a small QEMU + LVM raid1/mirror setup that reproduces the
> > block-device variant and bisects to 5ff3f74e. Happy to run your file
> > reproducer with some instrumentation at the dm-mirror read entry
> > (bi_size vs bio_sectors vs bvec lengths) to see whether the bio is
> > already empty on arrival or built that way on the retry, and to test
> > any patch.
> 
> Thanks for following up here. I didn't initially see your follow-up
> until Thorsten linked it. I apologize for missing that, this feature is
> important so I don't want to see anything regress for it.
> 
> There is a known bug fix I think future tests should include:
> 
>   https://lore.kernel.org/linux-block/20260612223205.465913-1-kbusch@meta.com/

> This likely isn't the fix you're looking for, but including it rules out
> conditions that are not important here.
> 
> After that, can we try this suggestion and see if the hang goes away?
> 
>   https://lore.kernel.org/linux-block/ajBb8tK-0aJBpIgF@kbusch-mbp/

With just that one in, the machine survives - thanks!

It does give:

[  505.208354] device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
[  505.239376] device-mapper: raid1: All sides of mirror have failed.
[  505.239389] device-mapper: raid1: Read failure on mirror device 252:25.  Failing I/O.
[  505.239394] device-mapper: raid1: Mirror read failed.

Although as far as I can tell the RAID hasn't errored and is still in sync.

If I turn the test case into a write (just s/pread/pwrite/ ) - the machine
still survives but then it does lose raid sync, and the raid resync
seems to stick until I do a 'lvchange --refresh main/lvol0'
which recovers after having spat out a:

[  865.319527] Buffer I/O error on dev dm-26, logical block 262128, async page read

> I expect the original test case to still return an error (and I think it
> was designed to), but it shouldn't produce the warn or bug splats with a
> stuck uninterruptible task.

It's not clear to me if it was designed to fail or not; I've not had
a chance to rerun the original qemu block tests yet, and I don't know
if old kernels successfully used O_DIRECT in this case.

It still feels that my pwrite case above shouldn't cause a raid de-sync
(especially since a normal user can do it).
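
For anyone wanting to compare the read and write variants side by side,
here is a sketch of a helper that issues one 4096-byte O_DIRECT transfer
with a chosen buffer alignment. It is not the test program from the
start of the thread: the names dio_once, is_aligned and DIO_ALIGN are
hypothetical, and the idea that buffer alignment is the trigger is an
assumption based on the "unaligned bvec" discussion here, since the
original program uses a plain static char array whose alignment is
whatever the compiler picked.

```c
#define _GNU_SOURCE            /* for O_DIRECT; must precede all includes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIO_ALIGN 4096  /* assumed alignment; the real requirement is device dependent */

/* Return 1 if ptr is a multiple of align (align must be a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/*
 * Issue one 4096-byte O_DIRECT read or write at file offset 0.
 * skew == 0 uses a DIO_ALIGN-aligned buffer; a nonzero skew shifts the
 * buffer start by that many bytes to make it deliberately misaligned.
 * Returns the pread/pwrite result, or -1 with errno set on setup failure.
 */
static ssize_t dio_once(const char *path, int do_write, size_t skew)
{
    void *base = NULL;
    ssize_t res;
    int fd, rc, saved;

    if (skew >= DIO_ALIGN) {
        errno = EINVAL;
        return -1;
    }
    fd = open(path, O_RDWR | O_DIRECT | O_CLOEXEC);
    if (fd < 0)
        return -1;
    /* posix_memalign returns an error number instead of setting errno. */
    rc = posix_memalign(&base, DIO_ALIGN, 2 * DIO_ALIGN);
    if (rc != 0) {
        close(fd);
        errno = rc;
        return -1;
    }
    memset(base, 0, 2 * DIO_ALIGN);
    char *buf = (char *)base + skew;
    res = do_write ? pwrite(fd, buf, 4096, 0) : pread(fd, buf, 4096, 0);
    saved = errno;             /* keep the I/O errno across cleanup */
    free(base);
    close(fd);
    errno = saved;
    return res;
}
```

dio_once("/mnt/tmp/testfile", 1, 0) would be the aligned write, and a
nonzero skew such as 64 the misaligned one. Whether the skewed case
reproduces the de-sync on a given kernel is exactly what would need
testing.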

Dave
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-16 12:57         ` Dr. David Alan Gilbert
@ 2026-06-16 13:08           ` Dr. David Alan Gilbert
  2026-06-16 14:04             ` Dr. David Alan Gilbert
  2026-06-16 14:19             ` Keith Busch
  0 siblings, 2 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-16 13:08 UTC (permalink / raw)
  To: Keith Busch, zkabelac
  Cc: Vjaceslavs Klimovs, Thorsten Leemhuis, trnka, linux-block,
	dm-devel, Linux kernel regressions list

* Dr. David Alan Gilbert (dave@treblig.org) wrote:
> It still feels that my pwrite case above shouldn't cause a raid de-sync
> (especially since a normal user can do it).

Just to follow up on that: if I use the modern lvm mode
( lvcreate  -m 1 -L 1G main /dev/sda2 /dev/sdb2 ) rather than
the old mirror, with the same patch, then:

  a) I get no log errors with either read or write
  b) read still gives EIO
  c) write apparently succeeds ?!

Dave

-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-16 13:08           ` Dr. David Alan Gilbert
@ 2026-06-16 14:04             ` Dr. David Alan Gilbert
  2026-06-16 14:19             ` Keith Busch
  1 sibling, 0 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-16 14:04 UTC (permalink / raw)
  To: Keith Busch, zkabelac
  Cc: Vjaceslavs Klimovs, Thorsten Leemhuis, trnka, linux-block,
	dm-devel, Linux kernel regressions list

* Dr. David Alan Gilbert (dave@treblig.org) wrote:
> Just to follow up on that;  if I use the modern lvm mode 
> ( lvcreate  -m 1 -L 1G main /dev/sda2 /dev/sdb2 ) rather than
> the old mirror with the same patch, then:
> 
>   a) I get no log errors with either read or write
>   b) read still gives EIO
>   c) write apparently succeeds ?!

One more confirmation: running qemu's 'make check' during build passes
with no log errors (whether it skipped any tests due to its detection
code I don't know).

Dave

-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-16 13:08           ` Dr. David Alan Gilbert
  2026-06-16 14:04             ` Dr. David Alan Gilbert
@ 2026-06-16 14:19             ` Keith Busch
  2026-06-16 15:55               ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 22+ messages in thread
From: Keith Busch @ 2026-06-16 14:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zkabelac, Vjaceslavs Klimovs, Thorsten Leemhuis, trnka,
	linux-block, dm-devel, Linux kernel regressions list

On Tue, Jun 16, 2026 at 01:08:52PM +0000, Dr. David Alan Gilbert wrote:
> ( lvcreate  -m 1 -L 1G main /dev/sda2 /dev/sdb2 ) rather than
> the old mirror with the same patch, then:
> 
>   a) I get no log errors with either read or write
>   b) read still gives EIO

I've a follow-up patch to handle the error properly. You want to see
EINVAL, not EIO, and that error shouldn't be considered when determining
the raid health. Something like what f7b24c7b41f23b5 does, but it's a
little more complicated in this path since it doesn't see the lower
level error status and just converts everything to EIO.
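
The distinction described above can be sketched as a small userspace
model. This is not the follow-up patch and not kernel code: the enum and
both function names are hypothetical, made up purely to illustrate "an
invalid request should surface as EINVAL and should not count against
the leg".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical completion statuses for this sketch; not the kernel's blk_status_t. */
enum leg_status {
    LEG_OK,
    LEG_INVALID_REQUEST,   /* the request itself was malformed */
    LEG_MEDIA_ERROR,       /* the device failed to service a valid request */
};

/* Errno the caller should see for a given status. */
static int leg_status_to_errno(enum leg_status st)
{
    switch (st) {
    case LEG_OK:
        return 0;
    case LEG_INVALID_REQUEST:
        return -EINVAL;
    case LEG_MEDIA_ERROR:
        return -EIO;
    }
    assert(!"unknown leg_status");
    return -EIO;
}

/* Only a genuine device failure should count against the mirror leg. */
static int leg_status_marks_leg_failed(enum leg_status st)
{
    return st == LEG_MEDIA_ERROR;
}
```

The point of keeping the two functions separate is that a path which
collapses every failure to EIO before this decision loses the
information needed to make it.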

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-16 14:19             ` Keith Busch
@ 2026-06-16 15:55               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-16 15:55 UTC (permalink / raw)
  To: Keith Busch
  Cc: zkabelac, Vjaceslavs Klimovs, Thorsten Leemhuis, trnka,
	linux-block, dm-devel, Linux kernel regressions list

* Keith Busch (kbusch@kernel.org) wrote:
> On Tue, Jun 16, 2026 at 01:08:52PM +0000, Dr. David Alan Gilbert wrote:
> > ( lvcreate  -m 1 -L 1G main /dev/sda2 /dev/sdb2 ) rather than
> > the old mirror with the same patch, then:
> > 
> >   a) I get no log errors with either read or write
> >   b) read still gives EIO
> 
> I've a follow up patch to handle the error properly. You want to see
> EINVAL, not EIO, and that error shouldn't be considered for determining
> the raid health. Something like what f7b24c7b41f23b5 does, but it's a
> little more complicated in this path since it doesn't see the lower
> level error status and just converts everything to EIO.

OK, thanks for your help, and I'll be happy to test that when it's done.

Dave
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 23:16     ` Vjaceslavs Klimovs
  2026-06-16  0:06       ` Keith Busch
@ 2026-06-16 15:55       ` Mikulas Patocka
  2026-06-16 16:05         ` Keith Busch
  1 sibling, 1 reply; 22+ messages in thread
From: Mikulas Patocka @ 2026-06-16 15:55 UTC (permalink / raw)
  To: Vjaceslavs Klimovs
  Cc: Dr. David Alan Gilbert, Thorsten Leemhuis, kbusch, trnka,
	Zdenek Kabelac, linux-block, dm-devel,
	Linux kernel regressions list

Hi


On Mon, 15 Jun 2026, Vjaceslavs Klimovs wrote:

> Hi Dave, all,
> 
> I'm one of the original reporters and very much a user, not a block/dm
> developer, so please sanity-check all of this.
> 
> Your trace looks like what the two earlier reports hit: a read reaching
> a leaf device with sectors > 0 but phys_seg 0 (an empty bio). One aside
> that may help read the trace: blk_io_trace.error is a __u16, so the
> bracketed values on your C lines are errnos as u16 (65514 = -EINVAL,
> 65531 = -EIO).
> 
> The WARN itself is new, the bad bio isn't. bio_add_page() only started
> rejecting len == 0 in 643893647cac ("block: reject zero length in
> bio_add_page()", v7.1-rc1); on 7.0.8 the same empty bio tripped
> scsi_alloc_sgtables()'s !nr_segs instead, which matches what you saw.
> That fits your "not a recent regression": the condition is older, v7.1
> just made it loud.
> 
> For Tomas's and my reports (QEMU O_DIRECT to the LV block device) the
> origin looks like 5ff3f74e145a ("block: simplify direct io validity
> check", v6.18): blkdev_dio_invalid() now checks only aggregate
> ki_pos | count alignment and dropped the per-segment
> bdev_iter_is_aligned() walk, so a degenerate or misaligned O_DIRECT no
> longer gets -EINVAL at the fops boundary. But your reproducer reads a
> file, which goes through the filesystem O_DIRECT path and never calls
> blkdev_dio_invalid(), and still makes the empty bio. So it isn't only
> that one entry point.

I thought that reverting 5ff3f74e145a and re-introducing the alignment 
check in block/fops.c:blkdev_dio_invalid would fix it - but it wouldn't.

The same problem existed even before 5ff3f74e145a, with the pvmove 
command.

Suppose that the administrator needs to move a logical volume from one 
disk to another and uses pvmove. Pvmove inserts a new dm-mirror target 
underneath the logical volume and uses it to copy the data. Now, the 
dm-mirror target crashes whenever it receives a bio with unaligned vectors.

So, I think that the proper way to fix this is to teach dm-mirror/dm-io to 
deal with unaligned bio vectors and handle them properly.

Mikulas



* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-16 15:55       ` Mikulas Patocka
@ 2026-06-16 16:05         ` Keith Busch
  0 siblings, 0 replies; 22+ messages in thread
From: Keith Busch @ 2026-06-16 16:05 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Vjaceslavs Klimovs, Dr. David Alan Gilbert, Thorsten Leemhuis,
	trnka, Zdenek Kabelac, linux-block, dm-devel,
	Linux kernel regressions list

On Tue, Jun 16, 2026 at 05:55:13PM +0200, Mikulas Patocka wrote:
> I thought that reverting 5ff3f74e145a and re-introducing the alignment 
> check in block/fops.c:blkdev_dio_invalid would fix it - but it wouldn't.
> 
> The same problem existed even before 5ff3f74e145a, with the pvmove 
> command.

Also before 5ff3f74e145a, you could still have devices that are
perfectly fine with dword-aligned dma, so sub-sector vectors would have
passed the checks and gone through to dm-raid, which would have
miscounted the remaining sectors.
 
> So, I think that the proper way to fix this is to teach dm-mirror/dm-io to 
> deal with unaligned bio vectors and handle them properly.

The block layer already handles it, so I think all the stacking drivers
need to do is dispatch it and check bi_status.


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-14 17:57 Repeatable, raid1+O_DIRECT, hang/warn Dr. David Alan Gilbert
  2026-06-15 10:34 ` Thorsten Leemhuis
@ 2026-06-15 13:07 ` Zdenek Kabelac
  2026-06-15 13:20   ` Dr. David Alan Gilbert
  2026-06-15 15:20 ` Keith Busch
  2 siblings, 1 reply; 22+ messages in thread
From: Zdenek Kabelac @ 2026-06-15 13:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, linux-block, dm-devel

On 14. 06. 26 at 19:57, Dr. David Alan Gilbert wrote:
> Hi,
>    I've got a repeatable raid hang/warn and would appreciate some pointers
> as where to debug.
>    (I've been logging stuff on  https://bugzilla.kernel.org/show_bug.cgi?id=221535 )
> 
>    This started off as debugging a case where I'd get my RAID1 (on the host)
> getting a reliable 'rescheduling sector'/disk failure while running the qemu block test suite
> during a qemu build, but then I tried to build a smaller discrete
> test, and now I've got a simply triggerable warn and test hang.
> There's no errors from the underlying SATA layer on the storage,
> everything resyncs just fine.
> 
> I've got an existing LVM vg ('main') with two mirrors on sda2, and sdb2
> which are SATA disks.
> 
> # lvcreate --type mirror --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
>

Hi

It's probably worth saying here that '--type mirror' is the OLD (historical)
DM mirror target implementation. This target is no longer under very active
development, as users are supposed to be using the newer (and faster)
md-wrapped '--type raid1'.

So if you use   'lvcreate -m1 ....'   you 'auto-magically' get the newer
raid1 target.

But this obviously doesn't fix the problem of the old mirror target...

Regards

Zdenek



* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 13:07 ` Zdenek Kabelac
@ 2026-06-15 13:20   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-15 13:20 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: linux-block, dm-devel

* Zdenek Kabelac (zkabelac@redhat.com) wrote:
> On 14. 06. 26 at 19:57, Dr. David Alan Gilbert wrote:
> > Hi,
> >    I've got a repeatable raid hang/warn and would appreciate some pointers
> > as where to debug.
> >    (I've been logging stuff on  https://bugzilla.kernel.org/show_bug.cgi?id=221535 )
> > 
> >    This started off as debugging a case where I'd get my RAID1 (on the host)
> > getting a reliable 'rescheduling sector'/disk failure while running the qemu block test suite
> > during a qemu build, but then I tried to build a smaller discrete
> > test, and now I've got a simply triggerable warn and test hang.
> > There's no errors from the underlying SATA layer on the storage,
> > everything resyncs just fine.
> > 
> > I've got an existing LVM vg ('main') with two mirrors on sda2, and sdb2
> > which are SATA disks.
> > 
> > # lvcreate --type mirror --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
> > 
> 
> Hi
> 
> It's probably worth saying here that '--type mirror' is the OLD
> (historical) DM mirror target implementation. This target is no longer
> under very active development, as users are supposed to be using the
> newer (and faster) md-wrapped '--type raid1'.

Ah, that might have been when I split off to using a separate test
LV rather than risking my LV containing actual useful data and had to
try and find the lvcreate command.

> So if you use   'lvcreate -m1 ....'   you 'auto-magically' get the newer
> raid1 target.

Thanks,

Dave

> But this obviously doesn't fix the problem of the old mirror target...
> 
> Regards
> 
> Zdenek
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-14 17:57 Repeatable, raid1+O_DIRECT, hang/warn Dr. David Alan Gilbert
  2026-06-15 10:34 ` Thorsten Leemhuis
  2026-06-15 13:07 ` Zdenek Kabelac
@ 2026-06-15 15:20 ` Keith Busch
  2026-06-15 15:35   ` Keith Busch
  2 siblings, 1 reply; 22+ messages in thread
From: Keith Busch @ 2026-06-15 15:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: linux-block, dm-devel

On Sun, Jun 14, 2026 at 05:57:48PM +0000, Dr. David Alan Gilbert wrote:
> Jun 14 18:08:32 dalek kernel: device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
> Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
> Jun 14 18:08:32 dalek dmeventd[1010]: Primary mirror device 252:24 read failed.
> Jun 14 18:08:32 dalek kernel: WARNING: block/bio.c:1044 at bio_add_page+0x18b/0x250, CPU#15: kworker/15:1/369
> Jun 14 18:08:32 dalek dmeventd[1010]: main-lvol0 is now in-sync.
> Jun 14 18:08:32 dalek kernel: Modules linked in: nft_masq nft_reject_ipv4 act_csum cls_u32 sch_htb nf_nat_tftp nf_conntrack_tftp bridge stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reje>
> Jun 14 18:08:32 dalek kernel:  drm_panel_backlight_quirks gpu_sched drm_suballoc_helper video nvme drm_display_helper nvme_core cec nvme_keyring sp5100_tco nvme_auth wmi serio_raw fuse scsi_dh_alua i2c_dev scsi_dh_rdac scsi_dh_emc
> Jun 14 18:08:32 dalek kernel: CPU: 15 UID: 0 PID: 369 Comm: kworker/15:1 Not tainted 7.1.0-rc7+ #786 PREEMPT(lazy) 
> Jun 14 18:08:32 dalek kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.10 07/13/2020
> Jun 14 18:08:32 dalek kernel: Workqueue: kmirrord do_mirror
> Jun 14 18:08:32 dalek kernel: RIP: 0010:bio_add_page+0x18b/0x250
> Jun 14 18:08:32 dalek kernel: Code: 24 10 4c 8b 04 24 84 c0 0f 85 c9 00 00 00 41 0f b7 40 78 48 8b 74 24 08 8b 4c 24 14 e9 b4 fe ff ff 0f 0b 31 c0 e9 55 d1 af 00 <0f> 0b eb f5 48 8b 7f 08 83 7f 60 05 0f 85 00 ff ff ff 49 8b 3b 4c
> Jun 14 18:08:32 dalek kernel: RSP: 0018:ffffd1fb8176fc10 EFLAGS: 00010246
> Jun 14 18:08:32 dalek kernel: RAX: 0000000000000000 RBX: ffffd1fb8176fd18 RCX: 0000000000000000
> Jun 14 18:08:32 dalek kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8d1a8eb28b00
> Jun 14 18:08:32 dalek kernel: RBP: 0000000000000000 R08: ffffd1fb8176fc38 R09: ffffd1fb8176fc40
> Jun 14 18:08:32 dalek kernel: R10: ffffd1fb8176fc34 R11: 0000000000000000 R12: 0000000000000000
> Jun 14 18:08:32 dalek kernel: R13: ffffd1fb8176fd90 R14: 0000000000000001 R15: ffff8d1a8eb28b00
> Jun 14 18:08:32 dalek kernel: FS:  0000000000000000(0000) GS:ffff8d29d161f000(0000) knlGS:0000000000000000
> Jun 14 18:08:32 dalek kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jun 14 18:08:32 dalek kernel: CR2: 00007f0ddcd7b9d0 CR3: 000000023dcbf000 CR4: 0000000000350ef0
> Jun 14 18:08:32 dalek kernel: Call Trace:
> Jun 14 18:08:32 dalek kernel:  <TASK>
> Jun 14 18:08:32 dalek kernel:  do_region+0x227/0x2a0

I think the problem is that do_region is tracking "remaining" in
sector granularity, but devices can have a dma alignment such that it's
valid to have sub-sector vectors. Rounding the appended length with
to_sector() yields a 0 length subtraction, so "remaining" never shrinks
and the loop runs forever. If we track it in bytes instead of
sectors, then that should fix this observation.


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 15:20 ` Keith Busch
@ 2026-06-15 15:35   ` Keith Busch
  2026-06-15 16:37     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 22+ messages in thread
From: Keith Busch @ 2026-06-15 15:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: linux-block, dm-devel

On Mon, Jun 15, 2026 at 09:20:23AM -0600, Keith Busch wrote:
> On Sun, Jun 14, 2026 at 05:57:48PM +0000, Dr. David Alan Gilbert wrote:
> > Jun 14 18:08:32 dalek kernel: device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
> > Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
> > Jun 14 18:08:32 dalek dmeventd[1010]: Primary mirror device 252:24 read failed.
> > Jun 14 18:08:32 dalek kernel: WARNING: block/bio.c:1044 at bio_add_page+0x18b/0x250, CPU#15: kworker/15:1/369
> > Jun 14 18:08:32 dalek dmeventd[1010]: main-lvol0 is now in-sync.
> > Jun 14 18:08:32 dalek kernel: Modules linked in: nft_masq nft_reject_ipv4 act_csum cls_u32 sch_htb nf_nat_tftp nf_conntrack_tftp bridge stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reje>
> > Jun 14 18:08:32 dalek kernel:  drm_panel_backlight_quirks gpu_sched drm_suballoc_helper video nvme drm_display_helper nvme_core cec nvme_keyring sp5100_tco nvme_auth wmi serio_raw fuse scsi_dh_alua i2c_dev scsi_dh_rdac scsi_dh_emc
> > Jun 14 18:08:32 dalek kernel: CPU: 15 UID: 0 PID: 369 Comm: kworker/15:1 Not tainted 7.1.0-rc7+ #786 PREEMPT(lazy) 
> > Jun 14 18:08:32 dalek kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.10 07/13/2020
> > Jun 14 18:08:32 dalek kernel: Workqueue: kmirrord do_mirror
> > Jun 14 18:08:32 dalek kernel: RIP: 0010:bio_add_page+0x18b/0x250
> > Jun 14 18:08:32 dalek kernel: Code: 24 10 4c 8b 04 24 84 c0 0f 85 c9 00 00 00 41 0f b7 40 78 48 8b 74 24 08 8b 4c 24 14 e9 b4 fe ff ff 0f 0b 31 c0 e9 55 d1 af 00 <0f> 0b eb f5 48 8b 7f 08 83 7f 60 05 0f 85 00 ff ff ff 49 8b 3b 4c
> > Jun 14 18:08:32 dalek kernel: RSP: 0018:ffffd1fb8176fc10 EFLAGS: 00010246
> > Jun 14 18:08:32 dalek kernel: RAX: 0000000000000000 RBX: ffffd1fb8176fd18 RCX: 0000000000000000
> > Jun 14 18:08:32 dalek kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8d1a8eb28b00
> > Jun 14 18:08:32 dalek kernel: RBP: 0000000000000000 R08: ffffd1fb8176fc38 R09: ffffd1fb8176fc40
> > Jun 14 18:08:32 dalek kernel: R10: ffffd1fb8176fc34 R11: 0000000000000000 R12: 0000000000000000
> > Jun 14 18:08:32 dalek kernel: R13: ffffd1fb8176fd90 R14: 0000000000000001 R15: ffff8d1a8eb28b00
> > Jun 14 18:08:32 dalek kernel: FS:  0000000000000000(0000) GS:ffff8d29d161f000(0000) knlGS:0000000000000000
> > Jun 14 18:08:32 dalek kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jun 14 18:08:32 dalek kernel: CR2: 00007f0ddcd7b9d0 CR3: 000000023dcbf000 CR4: 0000000000350ef0
> > Jun 14 18:08:32 dalek kernel: Call Trace:
> > Jun 14 18:08:32 dalek kernel:  <TASK>
> > Jun 14 18:08:32 dalek kernel:  do_region+0x227/0x2a0
> 
> I think the problem is that do_region is tracking "remaining" in
> sector granularity, but devices can have a dma alignment such that it's
> valid to have sub-sector vectors. Rounding the appended length with
> to_sector() yields a 0 length subtraction, so "remaining" never shrinks
> and the loop runs forever. If we track it in bytes instead of
> sectors, then that should fix this observation.

I recreated your observation and this patch below appears to fix the
stuck behavior.

---
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 1db565b376200..d72b9331c2fd1 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -362,19 +362,26 @@ static void do_region(const blk_opf_t opf, unsigned int region,
                        bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;
                        remaining -= num_sectors;
                } else {
-                       while (remaining) {
+                       unsigned long byte_remaining = to_bytes(remaining);
+
+                       while (byte_remaining) {
                                /*
                                 * Try and add as many pages as possible.
                                 */
                                dp->get_page(dp, &page, &len, &offset);
-                               len = min(len, to_bytes(remaining));
+                               len = min(len, byte_remaining);
                                if (!bio_add_page(bio, page, len, offset))
                                        break;

                                offset = 0;
-                               remaining -= to_sector(len);
+                               byte_remaining -= len;
                                dp->next_page(dp);
                        }
+                       remaining = to_sector(byte_remaining);
                }

                atomic_inc(&io->count);
--


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 15:35   ` Keith Busch
@ 2026-06-15 16:37     ` Dr. David Alan Gilbert
  2026-06-15 17:19       ` Keith Busch
  0 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-15 16:37 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, dm-devel

* Keith Busch (kbusch@kernel.org) wrote:
> On Mon, Jun 15, 2026 at 09:20:23AM -0600, Keith Busch wrote:
> > On Sun, Jun 14, 2026 at 05:57:48PM +0000, Dr. David Alan Gilbert wrote:
> > > Jun 14 18:08:32 dalek kernel: device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
> > > Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
> > > Jun 14 18:08:32 dalek dmeventd[1010]: Primary mirror device 252:24 read failed.
> > > Jun 14 18:08:32 dalek kernel: WARNING: block/bio.c:1044 at bio_add_page+0x18b/0x250, CPU#15: kworker/15:1/369
> > > Jun 14 18:08:32 dalek dmeventd[1010]: main-lvol0 is now in-sync.
> > > Jun 14 18:08:32 dalek kernel: Modules linked in: nft_masq nft_reject_ipv4 act_csum cls_u32 sch_htb nf_nat_tftp nf_conntrack_tftp bridge stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reje>
> > > Jun 14 18:08:32 dalek kernel:  drm_panel_backlight_quirks gpu_sched drm_suballoc_helper video nvme drm_display_helper nvme_core cec nvme_keyring sp5100_tco nvme_auth wmi serio_raw fuse scsi_dh_alua i2c_dev scsi_dh_rdac scsi_dh_emc
> > > Jun 14 18:08:32 dalek kernel: CPU: 15 UID: 0 PID: 369 Comm: kworker/15:1 Not tainted 7.1.0-rc7+ #786 PREEMPT(lazy) 
> > > Jun 14 18:08:32 dalek kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.10 07/13/2020
> > > Jun 14 18:08:32 dalek kernel: Workqueue: kmirrord do_mirror
> > > Jun 14 18:08:32 dalek kernel: RIP: 0010:bio_add_page+0x18b/0x250
> > > Jun 14 18:08:32 dalek kernel: Code: 24 10 4c 8b 04 24 84 c0 0f 85 c9 00 00 00 41 0f b7 40 78 48 8b 74 24 08 8b 4c 24 14 e9 b4 fe ff ff 0f 0b 31 c0 e9 55 d1 af 00 <0f> 0b eb f5 48 8b 7f 08 83 7f 60 05 0f 85 00 ff ff ff 49 8b 3b 4c
> > > Jun 14 18:08:32 dalek kernel: RSP: 0018:ffffd1fb8176fc10 EFLAGS: 00010246
> > > Jun 14 18:08:32 dalek kernel: RAX: 0000000000000000 RBX: ffffd1fb8176fd18 RCX: 0000000000000000
> > > Jun 14 18:08:32 dalek kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8d1a8eb28b00
> > > Jun 14 18:08:32 dalek kernel: RBP: 0000000000000000 R08: ffffd1fb8176fc38 R09: ffffd1fb8176fc40
> > > Jun 14 18:08:32 dalek kernel: R10: ffffd1fb8176fc34 R11: 0000000000000000 R12: 0000000000000000
> > > Jun 14 18:08:32 dalek kernel: R13: ffffd1fb8176fd90 R14: 0000000000000001 R15: ffff8d1a8eb28b00
> > > Jun 14 18:08:32 dalek kernel: FS:  0000000000000000(0000) GS:ffff8d29d161f000(0000) knlGS:0000000000000000
> > > Jun 14 18:08:32 dalek kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > Jun 14 18:08:32 dalek kernel: CR2: 00007f0ddcd7b9d0 CR3: 000000023dcbf000 CR4: 0000000000350ef0
> > > Jun 14 18:08:32 dalek kernel: Call Trace:
> > > Jun 14 18:08:32 dalek kernel:  <TASK>
> > > Jun 14 18:08:32 dalek kernel:  do_region+0x227/0x2a0
> > 
> > I think the problem is that do_region is tracking "remaining" in
> > sector granularity, but devices can have a dma alignment such that it's
> > valid to have sub-sector vectors. Rounding the appended length with
> > to_sector() yields a 0 length subtraction, so "remaining" never shrinks
> > and the loop runs forever. If we track it in bytes instead of
> > sectors, then that should fix this observation.
> 
> I recreated your observation and this patch below appears to fix the
> stuck behavior.

Hi Keith,
  Thanks for the patch, alas it doesn't seem to be helping here;
the first warn is still the same, and it still hangs the test process
hard and eventually BUGs at
void blk_mq_end_request(struct request *rq, blk_status_t error)
{
        if (blk_update_request(rq, error, blk_rq_bytes(rq)))
                BUG();

Dave

> ---
> diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
> index 1db565b376200..d72b9331c2fd1 100644
> --- a/drivers/md/dm-io.c
> +++ b/drivers/md/dm-io.c
> @@ -362,19 +362,26 @@ static void do_region(const blk_opf_t opf, unsigned int region,
>                         bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;
>                         remaining -= num_sectors;
>                 } else {
> -                       while (remaining) {
> +                       unsigned long byte_remaining = to_bytes(remaining);
> +
> +                       while (byte_remaining) {
>                                 /*
>                                  * Try and add as many pages as possible.
>                                  */
>                                 dp->get_page(dp, &page, &len, &offset);
> -                               len = min(len, to_bytes(remaining));
> +                               len = min(len, byte_remaining);
>                                 if (!bio_add_page(bio, page, len, offset))
>                                         break;
> 
>                                 offset = 0;
> -                               remaining -= to_sector(len);
> +                               byte_remaining -= len;
>                                 dp->next_page(dp);
>                         }
> +                       remaining = to_sector(byte_remaining);
>                 }
> 
>                 atomic_inc(&io->count);
> --
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 16:37     ` Dr. David Alan Gilbert
@ 2026-06-15 17:19       ` Keith Busch
  2026-06-15 17:42         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 22+ messages in thread
From: Keith Busch @ 2026-06-15 17:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: linux-block, dm-devel

On Mon, Jun 15, 2026 at 04:37:39PM +0000, Dr. David Alan Gilbert wrote:
> Hi Keith,
>   Thanks for the patch, alas it doesn't seem to be helping here;
>  the first warn is still the same
> and it still hangs the test process hard and eventually BUGs at
> 
> void blk_mq_end_request(struct request *rq, blk_status_t error)
> {
>         if (blk_update_request(rq, error, blk_rq_bytes(rq)))
>                 BUG();

Oh, that was not expected.

What is the dma alignment requirement of your backing devices? You can
find the attribute for sda at /sys/block/sda/queue/dma_alignment. I'm
expecting 511, but just want to double check.


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 17:19       ` Keith Busch
@ 2026-06-15 17:42         ` Dr. David Alan Gilbert
  2026-06-15 19:25           ` Keith Busch
  0 siblings, 1 reply; 22+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-15 17:42 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, dm-devel

* Keith Busch (kbusch@kernel.org) wrote:
> On Mon, Jun 15, 2026 at 04:37:39PM +0000, Dr. David Alan Gilbert wrote:
> > Hi Keith,
> >   Thanks for the patch, alas it doesn't seem to be helping here;
> >  the first warn is still the same
> > and it still hangs the test process hard and eventually BUGs at
> > 
> > void blk_mq_end_request(struct request *rq, blk_status_t error)
> > {
> >         if (blk_update_request(rq, error, blk_rq_bytes(rq)))
> >                 BUG();
> 
> Oh, that was not expected.
> 
> What is the dma alignment requirement of your backing devices? You can
> find the attribute for sda at /sys/block/sda/queue/dma_alignment. I'm
> expecting 511, but just want to double check.

Yeh looks like it:
/sys/block/sda/queue/dma_alignment:511
/sys/block/sdb/queue/dma_alignment:511

All of the LVM devices also look like they report 511.

Dave

-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 17:42         ` Dr. David Alan Gilbert
@ 2026-06-15 19:25           ` Keith Busch
  2026-06-15 20:09             ` Keith Busch
  0 siblings, 1 reply; 22+ messages in thread
From: Keith Busch @ 2026-06-15 19:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: linux-block, dm-devel

On Mon, Jun 15, 2026 at 05:42:42PM +0000, Dr. David Alan Gilbert wrote:
> * Keith Busch (kbusch@kernel.org) wrote:
> > On Mon, Jun 15, 2026 at 04:37:39PM +0000, Dr. David Alan Gilbert wrote:
> > > Hi Keith,
> > >   Thanks for the patch, alas it doesn't seem to be helping here;
> > >  the first warn is still the same
> > > and it still hangs the test process hard and eventually BUGs at
> > > 
> > > void blk_mq_end_request(struct request *rq, blk_status_t error)
> > > {
> > >         if (blk_update_request(rq, error, blk_rq_bytes(rq)))
> > >                 BUG();
> > 
> > Oh, that was not expected.
> > 
> > What is the dma alignment requirement of your backing devices? You can
> > find the attribute for sda at /sys/block/sda/queue/dma_alignment. I'm
> > expecting 511, but just want to double check.
> 
> Yeh looks like it:
> /sys/block/sda/queue/dma_alignment:511
> /sys/block/sdb/queue/dma_alignment:511
> 
> all of the lvm also looks like it is.

Thanks for confirming.

I'm struggling to see how you're getting there with your reproducer with
the proposal included. I can see other shortcomings with preadv or
really large preads, but not with a 4k pread. This patch can fix those
other issues:

  https://lore.kernel.org/linux-block/20260612223205.465913-1-kbusch@meta.com/

It is currently staged for upstream, so it hasn't landed yet. Again, I
don't think those conditions apply to what you're seeing, but it's worth
a shot on top of the previous proposal to use byte units instead of
sectors.

In the meantime, since I so far can't reproduce this after including my
previous proposal, I may have to request trying out a debug patch to get
some more visibility on what's happening if that's okay.


* Re: Repeatable, raid1+O_DIRECT, hang/warn
  2026-06-15 19:25           ` Keith Busch
@ 2026-06-15 20:09             ` Keith Busch
  0 siblings, 0 replies; 22+ messages in thread
From: Keith Busch @ 2026-06-15 20:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: linux-block, dm-devel

On Mon, Jun 15, 2026 at 01:25:17PM -0600, Keith Busch wrote:
> In the meantime, since I so far can't reproduce this after including my
> previous proposal, I may have to request trying out a debug patch to get
> some more visibility on what's happening if that's okay.

Going in a different direction here, there's no reason to recreate the
lower level bios from scratch when they originate from an incoming bio.
We can just clone it along with an iterator pointing to the original.

Can you try this one out? It was successful when I ran your reproducer,
and it also cuts out a lot of code, with a performance bonus for large IO.

---
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 1db565b376200..28adfeb58f240 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -170,12 +170,11 @@ struct dpages {
 			 struct page **p, unsigned long *len, unsigned int *offset);
 	void (*next_page)(struct dpages *dp);
 
-	union {
-		unsigned int context_u;
-		struct bvec_iter context_bi;
-	};
+	unsigned int context_u;
 	void *context_ptr;
 
+	struct bio *orig_bio;
+
 	void *vma_invalidate_address;
 	unsigned long vma_invalidate_size;
 };
@@ -210,44 +209,6 @@ static void list_dp_init(struct dpages *dp, struct page_list *pl, unsigned int o
 	dp->context_ptr = pl;
 }
 
-/*
- * Functions for getting the pages from a bvec.
- */
-static void bio_get_page(struct dpages *dp, struct page **p,
-			 unsigned long *len, unsigned int *offset)
-{
-	struct bio_vec bvec = bvec_iter_bvec((struct bio_vec *)dp->context_ptr,
-					     dp->context_bi);
-
-	*p = bvec.bv_page;
-	*len = bvec.bv_len;
-	*offset = bvec.bv_offset;
-
-	/* avoid figuring it out again in bio_next_page() */
-	dp->context_bi.bi_sector = (sector_t)bvec.bv_len;
-}
-
-static void bio_next_page(struct dpages *dp)
-{
-	unsigned int len = (unsigned int)dp->context_bi.bi_sector;
-
-	bvec_iter_advance((struct bio_vec *)dp->context_ptr,
-			  &dp->context_bi, len);
-}
-
-static void bio_dp_init(struct dpages *dp, struct bio *bio)
-{
-	dp->get_page = bio_get_page;
-	dp->next_page = bio_next_page;
-
-	/*
-	 * We just use bvec iterator to retrieve pages, so it is ok to
-	 * access the bvec table directly here
-	 */
-	dp->context_ptr = bio->bi_io_vec;
-	dp->context_bi = bio->bi_iter;
-}
-
 /*
  * Functions for getting the pages from a VMA.
  */
@@ -332,6 +293,21 @@ static void do_region(const blk_opf_t opf, unsigned int region,
 		return;
 	}
 
+	if (dp->orig_bio) {
+		bio = bio_alloc_clone(where->bdev, dp->orig_bio, GFP_NOIO,
+				      &io->client->bios);
+		bio->bi_iter.bi_sector = where->sector;
+		bio->bi_iter.bi_size = where->count << SECTOR_SHIFT;
+		bio->bi_opf = opf;
+		bio->bi_end_io = endio;
+		bio->bi_ioprio = ioprio;
+		store_io_and_region_in_bio(bio, io, region);
+
+		atomic_inc(&io->count);
+		submit_bio(bio);
+		return;
+	}
+
 	/*
 	 * where->count may be zero if op holds a flush and we need to
 	 * send a zero-sized flush.
@@ -468,6 +444,7 @@ static int dp_init(struct dm_io_request *io_req, struct dpages *dp,
 
 	dp->vma_invalidate_address = NULL;
 	dp->vma_invalidate_size = 0;
+	dp->orig_bio = NULL;
 
 	switch (io_req->mem.type) {
 	case DM_IO_PAGE_LIST:
@@ -475,7 +452,11 @@ static int dp_init(struct dm_io_request *io_req, struct dpages *dp,
 		break;
 
 	case DM_IO_BIO:
-		bio_dp_init(dp, io_req->mem.ptr.bio);
+		/*
+		 * The destination bios clone this bio's biovec directly, so
+		 * there are no per-page accessors to set up here.
+		 */
+		dp->orig_bio = io_req->mem.ptr.bio;
 		break;
 
 	case DM_IO_VMA:
-- 


end of thread, other threads:[~2026-06-16 16:05 UTC | newest]

Thread overview: 22+ messages
2026-06-14 17:57 Repeatable, raid1+O_DIRECT, hang/warn Dr. David Alan Gilbert
2026-06-15 10:34 ` Thorsten Leemhuis
2026-06-15 12:50   ` Dr. David Alan Gilbert
2026-06-15 23:16     ` Vjaceslavs Klimovs
2026-06-16  0:06       ` Keith Busch
2026-06-16  1:25         ` Vjaceslavs Klimovs
2026-06-16 12:57         ` Dr. David Alan Gilbert
2026-06-16 13:08           ` Dr. David Alan Gilbert
2026-06-16 14:04             ` Dr. David Alan Gilbert
2026-06-16 14:19             ` Keith Busch
2026-06-16 15:55               ` Dr. David Alan Gilbert
2026-06-16 15:55       ` Mikulas Patocka
2026-06-16 16:05         ` Keith Busch
2026-06-15 13:07 ` Zdenek Kabelac
2026-06-15 13:20   ` Dr. David Alan Gilbert
2026-06-15 15:20 ` Keith Busch
2026-06-15 15:35   ` Keith Busch
2026-06-15 16:37     ` Dr. David Alan Gilbert
2026-06-15 17:19       ` Keith Busch
2026-06-15 17:42         ` Dr. David Alan Gilbert
2026-06-15 19:25           ` Keith Busch
2026-06-15 20:09             ` Keith Busch
