* Repeatable, raid1+O_DIRECT, hang/warn
@ 2026-06-14 17:57 Dr. David Alan Gilbert
0 siblings, 0 replies; only message in thread
From: Dr. David Alan Gilbert @ 2026-06-14 17:57 UTC (permalink / raw)
To: linux-block, dm-devel
Hi,
I've got a repeatable raid hang/warn and would appreciate some pointers
as where to debug.
(I've been logging stuff on https://bugzilla.kernel.org/show_bug.cgi?id=221535 )
This started off as debugging a case where I'd get my RAID1 (on the host)
getting a reliable 'rescheduling sector'/disk failure while running the qemu block test suite
during a qemu build, but then I tried to build a smaller discrete
test, and now I've got a simply triggerable warn and test hang.
There's no errors from the underlying SATA layer on the storage,
everything resyncs just fine.
I've got an existing LVM vg ('main') with two mirrors on sda2, and sdb2
which are SATA disks.
# lvcreate --type mirror --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
# mkfs.ext4 /dev/mapper/main-lvol0
# mount /dev/mapper/main-lvol0 /mnt/tmp/
# chmod a+rwx /mnt/tmp
$ dd if=/dev/zero of=/mnt/tmp/testfile bs=1024k count=1
(I then wait for the IO to stop)
then we've got this little test program:
<--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><-->
#include <errno.h>
#include <fcntl.h>
#include <asm-generic/fcntl.h>
#include <stdio.h>
#include <unistd.h>
const char* path="/mnt/tmp/testfile";
static char buf[8192];
int main()
{
int fd=open(path, O_RDWR|O_DIRECT|O_CLOEXEC);
errno=0;
int res3=pread(fd, buf, 4096, 0);
printf("pread of 4096 said: %d (%m)\n", res3);
}
<--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><--><-->
running that, either hangs or gets a 'pread of 4096 said: -1 (Input/output error)'
when it hangs it's unkillable.
at the moment (on 7.1.0-rc7) this is giving:
Jun 14 18:08:32 dalek kernel: device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
Jun 14 18:08:32 dalek dmeventd[1010]: Primary mirror device 252:24 read failed.
Jun 14 18:08:32 dalek kernel: WARNING: block/bio.c:1044 at bio_add_page+0x18b/0x250, CPU#15: kworker/15:1/369
(full backtrace below)
(Note there is a moan in there about sdb IO error - repeated a lot - but
again, there's no SATA level errors, and the drive is fine on smart, and
I can read the whole of the underlying lvm mirrors, so I don't think it's
physically there).
I did a blktrace, although that gives me a 23G blkparse output, hmm
(I see each event repeated a lot - maybe per thread?)
252,26 15 1 0.000000000 3435 Q RS 264192 + 8 [dbf]
252,26 is /dev/mapper/main-lvol0
252,24 15 1 0.000005501 3435 A RS 264192 + 8 <- (252,26) 264192
252,24 is main-lvol0_mimage_0
252,24 15 2 0.000005761 3435 Q RS 264192 + 8 [dbf]
8,0 15 1 0.000008646 3435 A RS 71634944 + 8 <- (252,24) 264192
so that's sda
8,0 15 2 0.000008787 3435 A RS 73734144 + 8 <- (8,2) 71634944
I guess mapping down from sda2 to sda
8,0 15 3 0.000009037 3435 Q RS 73734144 + 8 [dbf]
8,0 15 4 0.000009809 3435 C RS 73734144 + 8 [65514]
??? Hmm what's the 65514 there?
252,24 15 3 0.000010320 3435 C RS 264192 + 8 [65514]
252,25 15 1 0.000290384 369 Q R 264192 + 8 [kworker/15:1]
252,25 is main-lvol0_mimage_1
and at this point I'm a bit lost as to what I'm looking for.
Hints appreciated!
(I don't believe this is a regression - or at least not recent)
Dave
Jun 14 18:08:32 dalek kernel: device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
Jun 14 18:08:32 dalek dmeventd[1010]: Primary mirror device 252:24 read failed.
Jun 14 18:08:32 dalek kernel: WARNING: block/bio.c:1044 at bio_add_page+0x18b/0x250, CPU#15: kworker/15:1/369
Jun 14 18:08:32 dalek dmeventd[1010]: main-lvol0 is now in-sync.
Jun 14 18:08:32 dalek kernel: Modules linked in: nft_masq nft_reject_ipv4 act_csum cls_u32 sch_htb nf_nat_tftp nf_conntrack_tftp bridge stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reje>
Jun 14 18:08:32 dalek kernel: drm_panel_backlight_quirks gpu_sched drm_suballoc_helper video nvme drm_display_helper nvme_core cec nvme_keyring sp5100_tco nvme_auth wmi serio_raw fuse scsi_dh_alua i2c_dev scsi_dh_rdac scsi_dh_emc
Jun 14 18:08:32 dalek kernel: CPU: 15 UID: 0 PID: 369 Comm: kworker/15:1 Not tainted 7.1.0-rc7+ #786 PREEMPT(lazy)
Jun 14 18:08:32 dalek kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.10 07/13/2020
Jun 14 18:08:32 dalek kernel: Workqueue: kmirrord do_mirror
Jun 14 18:08:32 dalek kernel: RIP: 0010:bio_add_page+0x18b/0x250
Jun 14 18:08:32 dalek kernel: Code: 24 10 4c 8b 04 24 84 c0 0f 85 c9 00 00 00 41 0f b7 40 78 48 8b 74 24 08 8b 4c 24 14 e9 b4 fe ff ff 0f 0b 31 c0 e9 55 d1 af 00 <0f> 0b eb f5 48 8b 7f 08 83 7f 60 05 0f 85 00 ff ff ff 49 8b 3b 4c
Jun 14 18:08:32 dalek kernel: RSP: 0018:ffffd1fb8176fc10 EFLAGS: 00010246
Jun 14 18:08:32 dalek kernel: RAX: 0000000000000000 RBX: ffffd1fb8176fd18 RCX: 0000000000000000
Jun 14 18:08:32 dalek kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8d1a8eb28b00
Jun 14 18:08:32 dalek kernel: RBP: 0000000000000000 R08: ffffd1fb8176fc38 R09: ffffd1fb8176fc40
Jun 14 18:08:32 dalek kernel: R10: ffffd1fb8176fc34 R11: 0000000000000000 R12: 0000000000000000
Jun 14 18:08:32 dalek kernel: R13: ffffd1fb8176fd90 R14: 0000000000000001 R15: ffff8d1a8eb28b00
Jun 14 18:08:32 dalek kernel: FS: 0000000000000000(0000) GS:ffff8d29d161f000(0000) knlGS:0000000000000000
Jun 14 18:08:32 dalek kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 18:08:32 dalek kernel: CR2: 00007f0ddcd7b9d0 CR3: 000000023dcbf000 CR4: 0000000000350ef0
Jun 14 18:08:32 dalek kernel: Call Trace:
Jun 14 18:08:32 dalek kernel: <TASK>
Jun 14 18:08:32 dalek kernel: do_region+0x227/0x2a0
Jun 14 18:08:32 dalek kernel: dispatch_io+0xf1/0x150
Jun 14 18:08:32 dalek kernel: ? __pfx_bio_get_page+0x10/0x10
Jun 14 18:08:32 dalek kernel: ? __pfx_bio_next_page+0x10/0x10
Jun 14 18:08:32 dalek kernel: ? __pfx_read_callback+0x10/0x10
Jun 14 18:08:32 dalek kernel: dm_io+0x169/0x2d0
Jun 14 18:08:32 dalek kernel: ? __pfx_bio_get_page+0x10/0x10
Jun 14 18:08:32 dalek kernel: ? __pfx_bio_next_page+0x10/0x10
Jun 14 18:08:32 dalek kernel: do_reads+0x149/0x230
Jun 14 18:08:32 dalek kernel: ? __pfx_read_callback+0x10/0x10
Jun 14 18:08:32 dalek kernel: do_mirror+0x11a/0x2b0
Jun 14 18:08:32 dalek kernel: process_one_work+0x19e/0x390
Jun 14 18:08:32 dalek kernel: worker_thread+0x1a6/0x310
Jun 14 18:08:32 dalek kernel: ? __pfx_worker_thread+0x10/0x10
Jun 14 18:08:32 dalek kernel: kthread+0xe4/0x120
Jun 14 18:08:32 dalek kernel: ? __pfx_kthread+0x10/0x10
Jun 14 18:08:32 dalek kernel: ret_from_fork+0x1a1/0x270
Jun 14 18:08:32 dalek kernel: ? __pfx_kthread+0x10/0x10
Jun 14 18:08:32 dalek kernel: ret_from_fork_asm+0x1a/0x30
Jun 14 18:08:32 dalek kernel: </TASK>
Jun 14 18:08:32 dalek kernel: ---[ end trace 0000000000000000 ]---
Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
Jun 14 18:08:32 dalek kernel: WARNING: drivers/scsi/scsi_lib.c:1164 at scsi_alloc_sgtables+0x38a/0x400, CPU#15: kworker/15:1/369
Jun 14 18:08:32 dalek kernel: Modules linked in: nft_masq nft_reject_ipv4 act_csum cls_u32 sch_htb nf_nat_tftp nf_conntrack_tftp bridge stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reje>
Jun 14 18:08:32 dalek kernel: drm_panel_backlight_quirks gpu_sched drm_suballoc_helper video nvme drm_display_helper nvme_core cec nvme_keyring sp5100_tco nvme_auth wmi serio_raw fuse scsi_dh_alua i2c_dev scsi_dh_rdac scsi_dh_emc
Jun 14 18:08:32 dalek kernel: CPU: 15 UID: 0 PID: 369 Comm: kworker/15:1 Tainted: G W 7.1.0-rc7+ #786 PREEMPT(lazy)
Jun 14 18:08:32 dalek kernel: Tainted: [W]=WARN
Jun 14 18:08:32 dalek kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.10 07/13/2020
Jun 14 18:08:32 dalek kernel: Workqueue: kmirrord do_mirror
Jun 14 18:08:32 dalek kernel: RIP: 0010:scsi_alloc_sgtables+0x38a/0x400
Jun 14 18:08:32 dalek kernel: Code: 8b 3d ba 2d a9 01 e9 d1 fd ff ff 48 8b 75 00 48 8d bb f0 fe ff ff e8 15 b7 b0 ff 48 89 ab e0 00 00 00 89 45 08 e9 30 ff ff ff <0f> 0b 4c 8b 6c 24 30 b8 0a 00 00 00 e9 21 ff ff ff b8 09 00 00 00
Jun 14 18:08:32 dalek kernel: RSP: 0018:ffffd1fb8176f7f0 EFLAGS: 00010246
Jun 14 18:08:32 dalek kernel: RAX: 0000000000000000 RBX: ffff8d1aedad0110 RCX: 0000000000000009
Jun 14 18:08:32 dalek kernel: RDX: 0000000000000000 RSI: ffffffff99c15960 RDI: ffff8d1aedad0110
Jun 14 18:08:32 dalek kernel: RBP: ffff8d1a93d17000 R08: ffff8d1aedad0110 R09: ffff8d1a818fa800
Jun 14 18:08:32 dalek kernel: R10: 7020676e69736961 R11: 0000000000000000 R12: 0000000000000000
Jun 14 18:08:32 dalek kernel: R13: 0000000000000000 R14: ffff8d1a93394000 R15: ffff8d1a93d17000
Jun 14 18:08:32 dalek kernel: FS: 0000000000000000(0000) GS:ffff8d29d161f000(0000) knlGS:0000000000000000
Jun 14 18:08:32 dalek kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 18:08:32 dalek kernel: CR2: 00007f0ddcd7b9d0 CR3: 000000023dcbf000 CR4: 0000000000350ef0
Jun 14 18:08:32 dalek kernel: Call Trace:
Jun 14 18:08:32 dalek kernel: <TASK>
Jun 14 18:08:32 dalek kernel: ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel: sd_setup_read_write_cmnd+0x9d/0x740
Jun 14 18:08:32 dalek kernel: ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel: scsi_queue_rq+0x4d2/0x890
Jun 14 18:08:32 dalek kernel: blk_mq_dispatch_rq_list+0x241/0x530
Jun 14 18:08:32 dalek kernel: ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel: ? sbitmap_get+0x61/0x100
Jun 14 18:08:32 dalek kernel: __blk_mq_do_dispatch_sched+0x330/0x340
Jun 14 18:08:32 dalek kernel: __blk_mq_sched_dispatch_requests+0x143/0x180
Jun 14 18:08:32 dalek kernel: blk_mq_sched_dispatch_requests+0x2d/0x70
Jun 14 18:08:32 dalek kernel: blk_mq_run_hw_queue+0x2bf/0x350
Jun 14 18:08:32 dalek kernel: ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel: blk_mq_dispatch_list+0x172/0x350
Jun 14 18:08:32 dalek kernel: blk_mq_flush_plug_list+0x51/0x1a0
Jun 14 18:08:32 dalek kernel: ? blk_mq_submit_bio+0x71c/0x9f0
Jun 14 18:08:32 dalek kernel: __blk_flush_plug+0x112/0x180
Jun 14 18:08:32 dalek kernel: ? srso_return_thunk+0x5/0x5f
Jun 14 18:08:32 dalek kernel: __submit_bio+0x19c/0x260
Jun 14 18:08:32 dalek kernel: __submit_bio_noacct+0x8e/0x210
Jun 14 18:08:32 dalek kernel: do_region+0x14c/0x2a0
Jun 14 18:08:32 dalek kernel: dispatch_io+0xf1/0x150
Jun 14 18:08:32 dalek kernel: ? __pfx_bio_get_page+0x10/0x10
Jun 14 18:08:32 dalek kernel: ? __pfx_bio_next_page+0x10/0x10
Jun 14 18:08:32 dalek kernel: ? __pfx_read_callback+0x10/0x10
Jun 14 18:08:32 dalek kernel: dm_io+0x169/0x2d0
Jun 14 18:08:32 dalek kernel: ? __pfx_bio_get_page+0x10/0x10
Jun 14 18:08:32 dalek kernel: ? __pfx_bio_next_page+0x10/0x10
Jun 14 18:08:32 dalek kernel: do_reads+0x149/0x230
Jun 14 18:08:32 dalek kernel: ? __pfx_read_callback+0x10/0x10
Jun 14 18:08:32 dalek kernel: do_mirror+0x11a/0x2b0
Jun 14 18:08:32 dalek kernel: process_one_work+0x19e/0x390
Jun 14 18:08:32 dalek kernel: worker_thread+0x1a6/0x310
Jun 14 18:08:32 dalek kernel: ? __pfx_worker_thread+0x10/0x10
Jun 14 18:08:32 dalek kernel: kthread+0xe4/0x120
Jun 14 18:08:32 dalek kernel: ? __pfx_kthread+0x10/0x10
Jun 14 18:08:32 dalek kernel: ret_from_fork+0x1a1/0x270
Jun 14 18:08:32 dalek kernel: ? __pfx_kthread+0x10/0x10
Jun 14 18:08:32 dalek kernel: ret_from_fork_asm+0x1a/0x30
Jun 14 18:08:32 dalek kernel: </TASK>
Jun 14 18:08:32 dalek kernel: ---[ end trace 0000000000000000 ]---
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:32 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
Jun 14 18:08:37 dalek kernel: blk_print_req_error: 241000 callbacks suppressed
Jun 14 18:08:37 dalek kernel: I/O error, dev sdb, sector 50606087 op 0x0:(READ) flags 0x0 phys_seg 0 prio class 2
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2026-06-14 18:19 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-14 17:57 Repeatable, raid1+O_DIRECT, hang/warn Dr. David Alan Gilbert
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox