From: Keith Busch <kbusch@kernel.org>
To: "Dr. David Alan Gilbert" <linux@treblig.org>
Cc: linux-block@vger.kernel.org, dm-devel@lists.linux.dev
Subject: Re: Repeatable, raid1+O_DIRECT, hang/warn
Date: Mon, 15 Jun 2026 09:35:46 -0600
Message-ID: <ajAb0m9cNraQn2Pw@kbusch-mbp>
In-Reply-To: <ajAYN_mmjzYBAimV@kbusch-mbp>
On Mon, Jun 15, 2026 at 09:20:23AM -0600, Keith Busch wrote:
> On Sun, Jun 14, 2026 at 05:57:48PM +0000, Dr. David Alan Gilbert wrote:
> > Jun 14 18:08:32 dalek kernel: device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
> > Jun 14 18:08:32 dalek kernel: ------------[ cut here ]------------
> > Jun 14 18:08:32 dalek dmeventd[1010]: Primary mirror device 252:24 read failed.
> > Jun 14 18:08:32 dalek kernel: WARNING: block/bio.c:1044 at bio_add_page+0x18b/0x250, CPU#15: kworker/15:1/369
> > Jun 14 18:08:32 dalek dmeventd[1010]: main-lvol0 is now in-sync.
> > Jun 14 18:08:32 dalek kernel: Modules linked in: nft_masq nft_reject_ipv4 act_csum cls_u32 sch_htb nf_nat_tftp nf_conntrack_tftp bridge stp llc rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reje>
> > Jun 14 18:08:32 dalek kernel: drm_panel_backlight_quirks gpu_sched drm_suballoc_helper video nvme drm_display_helper nvme_core cec nvme_keyring sp5100_tco nvme_auth wmi serio_raw fuse scsi_dh_alua i2c_dev scsi_dh_rdac scsi_dh_emc
> > Jun 14 18:08:32 dalek kernel: CPU: 15 UID: 0 PID: 369 Comm: kworker/15:1 Not tainted 7.1.0-rc7+ #786 PREEMPT(lazy)
> > Jun 14 18:08:32 dalek kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.10 07/13/2020
> > Jun 14 18:08:32 dalek kernel: Workqueue: kmirrord do_mirror
> > Jun 14 18:08:32 dalek kernel: RIP: 0010:bio_add_page+0x18b/0x250
> > Jun 14 18:08:32 dalek kernel: Code: 24 10 4c 8b 04 24 84 c0 0f 85 c9 00 00 00 41 0f b7 40 78 48 8b 74 24 08 8b 4c 24 14 e9 b4 fe ff ff 0f 0b 31 c0 e9 55 d1 af 00 <0f> 0b eb f5 48 8b 7f 08 83 7f 60 05 0f 85 00 ff ff ff 49 8b 3b 4c
> > Jun 14 18:08:32 dalek kernel: RSP: 0018:ffffd1fb8176fc10 EFLAGS: 00010246
> > Jun 14 18:08:32 dalek kernel: RAX: 0000000000000000 RBX: ffffd1fb8176fd18 RCX: 0000000000000000
> > Jun 14 18:08:32 dalek kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8d1a8eb28b00
> > Jun 14 18:08:32 dalek kernel: RBP: 0000000000000000 R08: ffffd1fb8176fc38 R09: ffffd1fb8176fc40
> > Jun 14 18:08:32 dalek kernel: R10: ffffd1fb8176fc34 R11: 0000000000000000 R12: 0000000000000000
> > Jun 14 18:08:32 dalek kernel: R13: ffffd1fb8176fd90 R14: 0000000000000001 R15: ffff8d1a8eb28b00
> > Jun 14 18:08:32 dalek kernel: FS: 0000000000000000(0000) GS:ffff8d29d161f000(0000) knlGS:0000000000000000
> > Jun 14 18:08:32 dalek kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Jun 14 18:08:32 dalek kernel: CR2: 00007f0ddcd7b9d0 CR3: 000000023dcbf000 CR4: 0000000000350ef0
> > Jun 14 18:08:32 dalek kernel: Call Trace:
> > Jun 14 18:08:32 dalek kernel: <TASK>
> > Jun 14 18:08:32 dalek kernel: do_region+0x227/0x2a0
>
> I think the problem is that do_region is tracking the "remaining" in
> sector granularity, but devices can have dma alignment such that it's
> valid to have sub-sector vectors. Rounding the length appended
> to_sectors() creates a 0 length subtraction, so the loop thinks no
> progress is made and loops forever. If we track it in bytes instead of
> sectors, then that should fix this observation.
I reproduced your observation, and the patch below appears to fix the
hang.
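
For anyone following along, the rounding can be seen in a small
userspace sketch. This is not the dm-io code: to_sector() and to_bytes()
are re-implemented here as plain shifts, the two iters_*() helpers and
the max_iters cap are made up for illustration, and the model assumes
every vector returned by get_page() has the same length and that
bio_add_page() never fails.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Userspace stand-ins for the kernel helpers of the same name. */
static inline uint64_t to_sector(unsigned long n)
{
	return n >> SECTOR_SHIFT;
}

static inline unsigned long to_bytes(uint64_t n)
{
	return (unsigned long)(n << SECTOR_SHIFT);
}

static inline unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model of the old loop: progress is tracked in sectors.
 * Returns the number of iterations needed to consume "remaining"
 * sectors, or -1 if the loop has not finished after max_iters passes.
 */
static long iters_sector_tracking(uint64_t remaining, unsigned long vec_len,
				  long max_iters)
{
	long n = 0;

	assert(vec_len > 0);
	while (remaining) {
		unsigned long len = min_ul(vec_len, to_bytes(remaining));

		if (n++ >= max_iters)
			return -1;
		/* A sub-sector len rounds down to 0 sectors here. */
		remaining -= to_sector(len);
	}
	return n;
}

/* Hypothetical model of the patched loop: progress is tracked in bytes. */
static long iters_byte_tracking(uint64_t remaining, unsigned long vec_len,
				long max_iters)
{
	unsigned long byte_remaining = to_bytes(remaining);
	long n = 0;

	assert(vec_len > 0);
	while (byte_remaining) {
		unsigned long len = min_ul(vec_len, byte_remaining);

		if (n++ >= max_iters)
			return -1;
		byte_remaining -= len;
	}
	return n;
}
```

With 8 sectors remaining and 256-byte vectors,
iters_sector_tracking(8, 256, 1000) gives up and returns -1 because
to_sector(256) is 0, while iters_byte_tracking(8, 256, 1000) finishes
in 16 iterations.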
---
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 1db565b376200..d72b9331c2fd1 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -362,19 +362,26 @@ static void do_region(const blk_opf_t opf, unsigned int region,
 			bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;
 			remaining -= num_sectors;
 		} else {
-			while (remaining) {
+			unsigned long byte_remaining = to_bytes(remaining);
+
+			while (byte_remaining) {
 				/*
 				 * Try and add as many pages as possible.
 				 */
 				dp->get_page(dp, &page, &len, &offset);
-				len = min(len, to_bytes(remaining));
+				len = min(len, byte_remaining);
 				if (!bio_add_page(bio, page, len, offset))
 					break;
 				offset = 0;
-				remaining -= to_sector(len);
+				byte_remaining -= len;
 				dp->next_page(dp);
 			}
+			remaining = to_sector(byte_remaining);
 		}
 		atomic_inc(&io->count);
--