From: "Tomáš Trnka" <trnka@scm.com>
To: Jens Axboe <axboe@kernel.dk>, linux-kernel@vger.kernel.org
Cc: regressions@lists.linux.dev, linux-block@vger.kernel.org,
	Keith Busch <kbusch@kernel.org>
Subject: [REGRESSION][BISECTED] Spurious raid1 device failure triggered by qemu direct IO on 6.18+
Date: Wed, 15 Apr 2026 14:18:59 +0200
Message-ID: <2982107.4sosBPzcNG@electra>

Since 6.18, booting a VM backed by a raid1 LVM LV makes that LV immediately
eject one of its devices. This apparently happens because a direct IO read
issued by QEMU fails. I have bisected the issue to the commit below and
confirmed that reverting it (together with its dependencies
9eab1d4e0d15b633adc170c458c51e8be3b1c553 and
b475272f03ca5d0c437c8f899ff229b21010ec83) on top of 6.19.11 fixes the issue.

commit 5ff3f74e145adc79b49668adb8de276446acf6be
Author: Keith Busch <kbusch@kernel.org>
Date:   Wed Aug 27 07:12:54 2025 -0700

    block: simplify direct io validity check
    
    The block layer checks all the segments for validity later, so no need
    for an early check. Just reduce it to a simple position and total length
    check, and defer the more invasive segment checks to the block layer.
    
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
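
For context, my reading of that change (paraphrased below, this is not the
actual kernel code) is that the early check now only validates the starting
position and the total length against the logical block size, and leaves the
per-segment alignment checks to the block layer:

    /*
     * Rough paraphrase of the "simple position and total length check"
     * described in the commit message above; my interpretation only,
     * not the real code.
     */
    if ((pos | iov_iter_count(iter)) & (bdev_logical_block_size(bdev) - 1))
            return -EINVAL;
    /* Per-segment address/length alignment is only verified later,
     * when the block layer builds and splits the bio. */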

The issue looks like this:

md/raid1:mdX: dm-17: rescheduling sector 0
md/raid1:mdX: redirecting sector 0 to other mirror: dm-17
(snipped 9 repeats of the preceding two lines)
md/raid1:mdX: dm-17: Raid device exceeded read_error threshold [cur 21:max 20]
md/raid1:mdX: dm-17: Failing raid device
md/raid1:mdX: Disk failure on dm-17, disabling device.
md/raid1:mdX: Operation continuing on 1 devices.

There's absolutely nothing wrong with the HW; the issue persists even when I
move the mirrors to a different pair of PVs (SAS HDD vs. SATA SSD).

The following command is enough to trigger the issue:

/usr/bin/qemu-system-x86_64 -blockdev '{"driver":"host_device","filename":"/dev/vg_mintaka/lv_test","aio":"native","node-name":"libvirt-1-storage","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false}}'

According to the blktrace below, this seems to be an ordinary direct IO read
of sectors 0-7, but I can't reproduce the issue by emulating such a read with
dd.
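
A dd emulation of that read would look roughly like this (a single 4 KiB
O_DIRECT read covering sectors 0-7; shown only as an illustration, not my
exact invocation):

dd if=/dev/vg_mintaka/lv_test of=/dev/null bs=4096 count=1 iflag=direct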

The beginning of blktrace for dm-20 (mirror LV):

252,20   0        3     0.050436097 17815  Q  RS 0 + 8 [qemu-system-x86]
252,20   4        1     0.053590884 17179  C  RS 0 + 8 [65531]
252,20   9        1     0.071942534 17843  Q  RS 0 + 1 [worker]
252,20  10        1     0.077792770 10803  C  RS 0 + 1 [0]

for dm-17 (one of the legs of the raid1 mirror):

252,17   0        3     0.050441207 17815  Q  RS 0 + 8 [qemu-system-x86]
252,17   0        4     0.050465318 17815  C  RS 0 + 8 [65514]
252,17   4        1     0.050491948 17179  Q  RS 0 + 8 [mdX_raid1]
252,17  12        1     0.050695772 12662  C  RS 0 + 8 [0]

for sda1, which holds that leg (the raid1 LV sits on dm-crypt on sda1; bfq
messages snipped):

  8,0    0        7     0.050453828 17815  A  RS 902334464 + 8 <- (252,5) 902301696
  8,0    0        8     0.050453988 17815  A  RS 902336512 + 8 <- (8,1) 902334464
  8,1    0        9     0.050454158 17815  Q  RS 902336512 + 8 [qemu-system-x86]
  8,1    0       10     0.050455058 17815  C  RS 902336512 + 8 [65514]
  8,0    4        1     0.050490699 17179  A  RS 902334464 + 8 <- (252,5) 902301696
  8,0    4        2     0.050490849 17179  A  RS 902336512 + 8 <- (8,1) 902334464
  8,1    4        3     0.050491009 17179  Q  RS 902336512 + 8 [mdX_raid1]
  8,1    4        4     0.050498089 17179  G  RS 902336512 + 8 [mdX_raid1]
  8,1    4        5     0.050500129 17179  P   N [mdX_raid1]
  8,1    4        6     0.050500939 17179 UT   N [mdX_raid1] 1
  8,1    4        7     0.050507439 17179  I  RS 902336512 + 8 [mdX_raid1]
  8,1    4        8     0.050531999   387  D  RS 902336512 + 8 [kworker/4:1H]
  8,1   15        1     0.050668902     0  C  RS 902336512 + 8 [0]

for sdb1 (backing the other leg of the mirror):

  8,16   4        1     0.053558754 17179  A  RS 902334464 + 8 <- (252,4) 902301696
  8,16   4        2     0.053558884 17179  A  RS 902336512 + 8 <- (8,17) 902334464
  8,17   4        3     0.053559024 17179  Q  RS 902336512 + 8 [mdX_raid1]
  8,17   4        4     0.053559364 17179  C  RS 902336512 + 8 [65514]
  8,17   4        0     0.053570484 17179 1,0  m   N bfq [bfq_limit_depth] wr_busy 0 sync 1 depth 48
  8,17   4        5     0.053578104   387  D  FN [kworker/4:1H]
  8,17  15        1     0.053647696 17192  C  FN 0 [0]
  8,17  15        2     0.053815039   567  D  FN [kworker/15:1H]
  8,17  15        3     0.053872560 17192  C  FN 0 [0]

Full logs can be downloaded from:
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/dmesg.log
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/mapped-devs.lst
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/blktrace.tar.gz

lsblk output (from a different boot, minors might not match mapped-devs.lst):

https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/lsblk-t.out
https://is.muni.cz/de/ttrnka/qemu-dio-raid1-fail/lsblk.out

I can share any other info or logs, test patches, or poke around with ftrace 
or systemtap as needed.

#regzbot introduced: 5ff3f74e145adc79b49668adb8de276446acf6be

Best regards,

Tomáš
-- 
Tomáš Trnka
Software for Chemistry & Materials B.V.
De Boelelaan 1109
1081 HV Amsterdam, The Netherlands
https://www.scm.com



