From: "Tomáš Trnka" <trnka@scm.com>
To: Jens Axboe <axboe@kernel.dk>, linux-kernel@vger.kernel.org
Cc: regressions@lists.linux.dev, linux-block@vger.kernel.org,
Keith Busch <kbusch@kernel.org>
Subject: [REGRESSION][BISECTED] Spurious raid1 device failure triggered by qemu direct IO on 6.18+
Date: Wed, 15 Apr 2026 14:18:59 +0200
Message-ID: <2982107.4sosBPzcNG@electra>
Since 6.18, booting a VM backed by a raid1 LVM LV makes that LV
immediately eject one of its devices. This is apparently caused by a direct
I/O read issued by QEMU failing. I have bisected the issue to the following
commit and confirmed that reverting it (plus its dependencies
9eab1d4e0d15b633adc170c458c51e8be3b1c553 and
b475272f03ca5d0c437c8f899ff229b21010ec83) on top of 6.19.11 fixes the issue.
commit 5ff3f74e145adc79b49668adb8de276446acf6be
Author: Keith Busch <kbusch@kernel.org>
Date: Wed Aug 27 07:12:54 2025 -0700
block: simplify direct io validity check
The block layer checks all the segments for validity later, so no need
for an early check. Just reduce it to a simple position and total length
check, and defer the more invasive segment checks to the block layer.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The issue looks like this:
md/raid1:mdX: dm-17: rescheduling sector 0
md/raid1:mdX: redirecting sector 0 to other mirror: dm-17
(snipped 9 repeats of the preceding two lines)
md/raid1:mdX: dm-17: Raid device exceeded read_error threshold [cur 21:max 20]
md/raid1:mdX: dm-17: Failing raid device
md/raid1:mdX: Disk failure on dm-17, disabling device.
md/raid1:mdX: Operation continuing on 1 devices.
There's absolutely nothing wrong with the hardware; the issue persists even
when I move the mirrors to a different pair of PVs (SAS HDD vs. SATA SSD).
The following command is enough to trigger the issue:
/usr/bin/qemu-system-x86_64 -blockdev '{"driver":"host_device","filename":"/dev/vg_mintaka/lv_test","aio":"native","node-name":"libvirt-1-storage","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false}}'
According to blktrace below, this seems to be an ordinary direct IO read of
sectors 0-7, but I can't reproduce the issue emulating such a read with dd.
The beginning of blktrace for dm-20 (mirror LV):
252,20 0 3 0.050436097 17815 Q RS 0 + 8 [qemu-system-x86]
252,20 4 1 0.053590884 17179 C RS 0 + 8 [65531]
252,20 9 1 0.071942534 17843 Q RS 0 + 1 [worker]
252,20 10 1 0.077792770 10803 C RS 0 + 1 [0]
for dm-17 (one of the legs of the raid1 mirror):
252,17 0 3 0.050441207 17815 Q RS 0 + 8 [qemu-system-x86]
252,17 0 4 0.050465318 17815 C RS 0 + 8 [65514]
252,17 4 1 0.050491948 17179 Q RS 0 + 8 [mdX_raid1]
252,17 12 1 0.050695772 12662 C RS 0 + 8 [0]
for sda1 that holds that leg (raid1 LV on dm-crypt on sda1; bfq messages
snipped):
8,0 0 7 0.050453828 17815 A RS 902334464 + 8 <- (252,5) 902301696
8,0 0 8 0.050453988 17815 A RS 902336512 + 8 <- (8,1) 902334464
8,1 0 9 0.050454158 17815 Q RS 902336512 + 8 [qemu-system-x86]
8,1 0 10 0.050455058 17815 C RS 902336512 + 8 [65514]
8,0 4 1 0.050490699 17179 A RS 902334464 + 8 <- (252,5) 902301696
8,0 4 2 0.050490849 17179 A RS 902336512 + 8 <- (8,1) 902334464
8,1 4 3 0.050491009 17179 Q RS 902336512 + 8 [mdX_raid1]
8,1 4 4 0.050498089 17179 G RS 902336512 + 8 [mdX_raid1]
8,1 4 5 0.050500129 17179 P N [mdX_raid1]
8,1 4 6 0.050500939 17179 UT N [mdX_raid1] 1
8,1 4 7 0.050507439 17179 I RS 902336512 + 8 [mdX_raid1]
8,1 4 8 0.050531999 387 D RS 902336512 + 8 [kworker/4:1H]
8,1 15 1 0.050668902 0 C RS 902336512 + 8 [0]
for sdb1 (backing the other leg of the mirror):
8,16 4 1 0.053558754 17179 A RS 902334464 + 8 <- (252,4) 902301696
8,16 4 2 0.053558884 17179 A RS 902336512 + 8 <- (8,17) 902334464
8,17 4 3 0.053559024 17179 Q RS 902336512 + 8 [mdX_raid1]
8,17 4 4 0.053559364 17179 C RS 902336512 + 8 [65514]
8,17 4 0 0.053570484 17179 1,0 m N bfq [bfq_limit_depth] wr_busy 0 sync 1 depth 48
8,17 4 5 0.053578104 387 D FN [kworker/4:1H]
8,17 15 1 0.053647696 17192 C FN 0 [0]
8,17 15 2 0.053815039 567 D FN [kworker/15:1H]
8,17 15 3 0.053872560 17192 C FN 0 [0]
Full logs can be downloaded from:
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/dmesg.log
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/mapped-devs.lst
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/blktrace.tar.gz
lsblk output (from a different boot, minors might not match mapped-devs.lst):
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/lsblk-t.out
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/lsblk.out
I can share any other info or logs, test patches, or poke around with ftrace
or systemtap as needed.
#regzbot introduced: 5ff3f74e145adc79b49668adb8de276446acf6be
Best regards,
Tomáš
--
Tomáš Trnka
Software for Chemistry & Materials B.V.
De Boelelaan 1109
1081 HV Amsterdam, The Netherlands
https://www.scm.com
Thread overview: 5+ messages
2026-04-15 12:18 Tomáš Trnka [this message]
2026-04-15 15:20 ` [REGRESSION][BISECTED] Spurious raid1 device failure triggered by qemu direct IO on 6.18+ Keith Busch
2026-04-15 15:52 ` Keith Busch
2026-04-15 22:59 ` Keith Busch
2026-04-16 10:13 ` Tomáš Trnka