public inbox for linux-kernel@vger.kernel.org
* [REGRESSION][BISECTED] Spurious raid1 device failure triggered by qemu direct IO on 6.18+
@ 2026-04-15 12:18 Tomáš Trnka
From: Tomáš Trnka @ 2026-04-15 12:18 UTC (permalink / raw)
  To: Jens Axboe, linux-kernel; +Cc: regressions, linux-block, Keith Busch

Since 6.18, booting a VM backed by a raid1 LVM LV makes that LV immediately 
eject one of its devices. This is apparently caused by a failing direct IO 
read issued by QEMU. I have bisected the issue to the commit below and 
confirmed that reverting it (together with its dependencies 
9eab1d4e0d15b633adc170c458c51e8be3b1c553 and 
b475272f03ca5d0c437c8f899ff229b21010ec83) on top of 6.19.11 fixes the issue.

commit 5ff3f74e145adc79b49668adb8de276446acf6be
Author: Keith Busch <kbusch@kernel.org>
Date:   Wed Aug 27 07:12:54 2025 -0700

    block: simplify direct io validity check
    
    The block layer checks all the segments for validity later, so no need
    for an early check. Just reduce it to a simple position and total length
    check, and defer the more invasive segment checks to the block layer.
    
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

The issue looks like this:

md/raid1:mdX: dm-17: rescheduling sector 0
md/raid1:mdX: redirecting sector 0 to other mirror: dm-17
(snipped 9 repeats of the preceding two lines)
md/raid1:mdX: dm-17: Raid device exceeded read_error threshold [cur 21:max 20]
md/raid1:mdX: dm-17: Failing raid device
md/raid1:mdX: Disk failure on dm-17, disabling device.
md/raid1:mdX: Operation continuing on 1 devices.

There's absolutely nothing wrong with the hardware; the issue persists even 
when I move the mirrors to a different pair of PVs (SAS HDD vs. SATA SSD).

The following command is enough to trigger the issue:

/usr/bin/qemu-system-x86_64 -blockdev '{"driver":"host_device","filename":"/dev/vg_mintaka/lv_test","aio":"native","node-name":"libvirt-1-storage","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false}}'

According to the blktrace below, this seems to be an ordinary direct IO read 
of sectors 0-7, but I can't reproduce the issue by emulating such a read with 
dd.
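
For the record, my dd emulation attempt was roughly the following: a single 
aligned 4 KiB O_DIRECT read of the same sectors (device path taken from the 
qemu command below). dd issues this as one contiguous aligned buffer, which 
may well be why it does not hit whatever the qemu request triggers:

```shell
# Read sectors 0-7 of the LV with O_DIRECT, one aligned 4 KiB buffer
dd if=/dev/vg_mintaka/lv_test of=/dev/null bs=4096 count=1 iflag=direct
```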

The beginning of blktrace for dm-20 (mirror LV):

252,20   0        3     0.050436097 17815  Q  RS 0 + 8 [qemu-system-x86]
252,20   4        1     0.053590884 17179  C  RS 0 + 8 [65531]
252,20   9        1     0.071942534 17843  Q  RS 0 + 1 [worker]
252,20  10        1     0.077792770 10803  C  RS 0 + 1 [0]

for dm-17 (one of the legs of the raid1 mirror):

252,17   0        3     0.050441207 17815  Q  RS 0 + 8 [qemu-system-x86]
252,17   0        4     0.050465318 17815  C  RS 0 + 8 [65514]
252,17   4        1     0.050491948 17179  Q  RS 0 + 8 [mdX_raid1]
252,17  12        1     0.050695772 12662  C  RS 0 + 8 [0]

for sda1 that holds that leg (raid1 LV on dm-crypt on sda1; bfq messages 
snipped):

  8,0    0        7     0.050453828 17815  A  RS 902334464 + 8 <- (252,5) 902301696
  8,0    0        8     0.050453988 17815  A  RS 902336512 + 8 <- (8,1) 902334464
  8,1    0        9     0.050454158 17815  Q  RS 902336512 + 8 [qemu-system-x86]
  8,1    0       10     0.050455058 17815  C  RS 902336512 + 8 [65514]
  8,0    4        1     0.050490699 17179  A  RS 902334464 + 8 <- (252,5) 902301696
  8,0    4        2     0.050490849 17179  A  RS 902336512 + 8 <- (8,1) 902334464
  8,1    4        3     0.050491009 17179  Q  RS 902336512 + 8 [mdX_raid1]
  8,1    4        4     0.050498089 17179  G  RS 902336512 + 8 [mdX_raid1]
  8,1    4        5     0.050500129 17179  P   N [mdX_raid1]
  8,1    4        6     0.050500939 17179 UT   N [mdX_raid1] 1
  8,1    4        7     0.050507439 17179  I  RS 902336512 + 8 [mdX_raid1]
  8,1    4        8     0.050531999   387  D  RS 902336512 + 8 [kworker/4:1H]
  8,1   15        1     0.050668902     0  C  RS 902336512 + 8 [0]

for sdb1 (backing the other leg of the mirror):

  8,16   4        1     0.053558754 17179  A  RS 902334464 + 8 <- (252,4) 902301696
  8,16   4        2     0.053558884 17179  A  RS 902336512 + 8 <- (8,17) 902334464
  8,17   4        3     0.053559024 17179  Q  RS 902336512 + 8 [mdX_raid1]
  8,17   4        4     0.053559364 17179  C  RS 902336512 + 8 [65514]
  8,17   4        0     0.053570484 17179 1,0  m   N bfq [bfq_limit_depth] wr_busy 0 sync 1 depth 48
  8,17   4        5     0.053578104   387  D  FN [kworker/4:1H]
  8,17  15        1     0.053647696 17192  C  FN 0 [0]
  8,17  15        2     0.053815039   567  D  FN [kworker/15:1H]
  8,17  15        3     0.053872560 17192  C  FN 0 [0]

Full logs can be downloaded from:
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/dmesg.log
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/mapped-devs.lst
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/blktrace.tar.gz

lsblk output (from a different boot, minors might not match mapped-devs.lst):

https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/lsblk-t.out
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/lsblk.out

I can share any other info or logs, test patches, or poke around with ftrace 
or systemtap as needed.

#regzbot introduced: 5ff3f74e145adc79b49668adb8de276446acf6be

Best regards,

Tomáš
-- 
Tomáš Trnka
Software for Chemistry & Materials B.V.
De Boelelaan 1109
1081 HV Amsterdam, The Netherlands
https://www.scm.com




Thread overview: 5+ messages
2026-04-15 12:18 [REGRESSION][BISECTED] Spurious raid1 device failure triggered by qemu direct IO on 6.18+ Tomáš Trnka
2026-04-15 15:20 ` Keith Busch
2026-04-15 15:52   ` Keith Busch
2026-04-15 22:59     ` Keith Busch
2026-04-16 10:13     ` Tomáš Trnka
