From: Keith Busch <kbusch@kernel.org>
To: "Tomás Trnka" <trnka@scm.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-kernel@vger.kernel.org, regressions@lists.linux.dev,
linux-block@vger.kernel.org
Subject: Re: [REGRESSION][BISECTED] Spurious raid1 device failure triggered by qemu direct IO on 6.18+
Date: Wed, 15 Apr 2026 09:20:06 -0600 [thread overview]
Message-ID: <ad-sprBWPA20iola@kbusch-mbp> (raw)
In-Reply-To: <2982107.4sosBPzcNG@electra>
On Wed, Apr 15, 2026 at 02:18:59PM +0200, Tomás Trnka wrote:
> Since 6.18, booting a VM that is backed by a raid1 LVM LV makes that LV
> immediately eject one of the devices. This is apparently because of a direct
> IO read by QEMU failing. I have bisected the issue to the following commit and
> confirmed that reverting that commit (plus dependencies
> 9eab1d4e0d15b633adc170c458c51e8be3b1c553 and
> b475272f03ca5d0c437c8f899ff229b21010ec83) on top of 6.19.11 fixes the issue.
>
> commit 5ff3f74e145adc79b49668adb8de276446acf6be
> Author: Keith Busch <kbusch@kernel.org>
> Date: Wed Aug 27 07:12:54 2025 -0700
>
> block: simplify direct io validity check
>
> The block layer checks all the segments for validity later, so no need
> for an early check. Just reduce it to a simple position and total length
> check, and defer the more invasive segment checks to the block layer.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> The issue looks like this:
>
> md/raid1:mdX: dm-17: rescheduling sector 0
> md/raid1:mdX: redirecting sector 0 to other mirror: dm-17
> (snipped 9 repeats of the preceding two lines)
> md/raid1:mdX: dm-17: Raid device exceeded read_error threshold [cur 21:max 20]
> md/raid1:mdX: dm-17: Failing raid device
> md/raid1:mdX: Disk failure on dm-17, disabling device.
> md/raid1:mdX: Operation continuing on 1 devices.
>
> There's absolutely nothing wrong with the HW, the issue persists even when I
> move the mirrors to a different pair of PVs (SAS HDD vs SATA SSD).
Thanks for the notice and the logs. Sounds likek something is getting
sent to the block layer that can't form a viable IO. The commit you
identified should have still error'ed though, just much earlier in the
call stack.
Anyway, I'll take a look if there's something I missed handling with the
stacking setup you're describing.
next prev parent reply other threads:[~2026-04-15 15:20 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-15 12:18 [REGRESSION][BISECTED] Spurious raid1 device failure triggered by qemu direct IO on 6.18+ Tomáš Trnka
2026-04-15 15:20 ` Keith Busch [this message]
2026-04-15 15:52 ` Keith Busch
2026-04-15 22:59 ` Keith Busch
2026-04-16 10:13 ` Tomáš Trnka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ad-sprBWPA20iola@kbusch-mbp \
--to=kbusch@kernel.org \
--cc=axboe@kernel.dk \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=regressions@lists.linux.dev \
--cc=trnka@scm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.