From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Keith Busch <kbusch@kernel.org>, zkabelac@redhat.com
Cc: Vjaceslavs Klimovs <vklimovs@gmail.com>,
Thorsten Leemhuis <regressions@leemhuis.info>,
trnka@scm.com, linux-block@vger.kernel.org,
dm-devel@lists.linux.dev,
Linux kernel regressions list <regressions@lists.linux.dev>
Subject: Re: Repeatable, raid1+O_DIRECT, hang/warn
Date: Tue, 16 Jun 2026 13:08:52 +0000
Message-ID: <ajFK5NXkxd6jU5zu@gallifrey>
In-Reply-To: <ajFISH9bvyWjLOM6@gallifrey>

* Dr. David Alan Gilbert (dave@treblig.org) wrote:
> * Keith Busch (kbusch@kernel.org) wrote:
> > On Mon, Jun 15, 2026 at 04:16:12PM -0700, Vjaceslavs Klimovs wrote:
> > > Your trace looks like what the two earlier reports hit: a read reaching
> > > a leaf device with sectors > 0 but phys_seg 0 (an empty bio). One aside
> > > that may help read the trace: blk_io_trace.error is a __u16, so the
> > > bracketed values on your C lines are errnos as u16 (65514 = -EINVAL,
> > > 65531 = -EIO).
> > >
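A minimal sketch of that u16-to-errno reading, in case it helps when
scanning the bracketed values on C lines. The helper name
decode_u16_error is made up for illustration and is not part of
blktrace; the errno names come from Python's errno module, so they
assume a Linux host.

```python
import errno


def decode_u16_error(value):
    """Reinterpret an unsigned 16-bit trace value as a signed errno.

    Returns (signed_value, errno_name). Hypothetical helper, only for
    reading the numbers by hand; not an existing blktrace interface.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned value")
    # Two's complement: values with the top bit set are negative.
    signed = value - 0x10000 if value & 0x8000 else value
    if signed >= 0:
        return signed, "none"
    return signed, errno.errorcode.get(-signed, "unknown")


for raw in (65514, 65531):
    signed, name = decode_u16_error(raw)
    print(f"{raw} -> {signed} ({name})")
```

On Linux this prints "65514 -> -22 (EINVAL)" and "65531 -> -5 (EIO)",
matching the two values mentioned above.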
> > > The WARN itself is new, the bad bio isn't. bio_add_page() only started
> > > rejecting len == 0 in 643893647cac ("block: reject zero length in
> > > bio_add_page()", v7.1-rc1); on 7.0.8 the same empty bio tripped
> > > scsi_alloc_sgtables()'s !nr_segs instead, which matches what you saw.
> > > That fits your "not a recent regression": the condition is older, v7.1
> > > just made it loud.
> > >
> > > For Tomas's and my reports (QEMU O_DIRECT to the LV block device) the
> > > origin looks like 5ff3f74e145a ("block: simplify direct io validity
> > > check", v6.18): blkdev_dio_invalid() now checks only aggregate
> > > ki_pos | count alignment and dropped the per-segment
> > > bdev_iter_is_aligned() walk, so a degenerate or misaligned O_DIRECT no
> > > longer gets -EINVAL at the fops boundary. But your reproducer reads a
> > > file, which goes through the filesystem O_DIRECT path and never calls
> > > blkdev_dio_invalid(), and still makes the empty bio. So it isn't only
> > > that one entry point.
> > >
> > > dm-mirror then hangs because Keith's f7b24c7b41f2 only covers md
> > > raid1/raid10; legacy dm-mirror (dm-raid1.c) has no equivalent and
> > > rebuilds the empty read onto the other leg. Note the leg's status isn't
> > > even consistent (your SATA path returns BLK_STS_IOERR, not
> > > BLK_STS_INVAL), so copying that status check into dm-mirror probably
> > > wouldn't catch every case.
> > >
> > > For what it's worth, that points me toward rejecting the empty or
> > > misaligned bio once, at submission, with -EINVAL, rather than teaching
> > > each consumer to tolerate it. But you'll know the tradeoffs far better
> > > than I do.
> > >
> > > I have a small QEMU + LVM raid1/mirror setup that reproduces the
> > > block-device variant and bisects to 5ff3f74e. Happy to run your file
> > > reproducer with some instrumentation at the dm-mirror read entry
> > > (bi_size vs bio_sectors vs bvec lengths) to see whether the bio is
> > > already empty on arrival or built that way on the retry, and to test
> > > any patch.
> >
> > Thanks for following up here. I didn't initially see your follow-up
> > until Thorsten linked it. I apologize for missing that, this feature is
> > important so I don't want to see anything regress for it.
> >
> > There is a known bug fix I think future tests should include:
> >
> > https://lore.kernel.org/linux-block/20260612223205.465913-1-kbusch@meta.com/
> >
> > This likely isn't the fix you're looking for, but including it rules out
> > conditions that are not important here.
> >
> > After that, can we try this suggestion and see if the hang goes away?
> >
> > https://lore.kernel.org/linux-block/ajBb8tK-0aJBpIgF@kbusch-mbp/
>
> With just that one in, the machine survives - thanks!
>
> It does give:
>
> [ 505.208354] device-mapper: raid1: Mirror read failed from 252:24. Trying alternative device.
> [ 505.239376] device-mapper: raid1: All sides of mirror have failed.
> [ 505.239389] device-mapper: raid1: Read failure on mirror device 252:25. Failing I/O.
> [ 505.239394] device-mapper: raid1: Mirror read failed.
>
> Although as far as I can tell the RAID hasn't errored and is still in sync.
>
> If I turn the test case into a write (just s/pread/pwrite/ ) - the machine
> still survives but then it does lose raid sync, and the raid resync
> seems to stick until I do a 'lvchange --refresh main/lvol0'
> which recovers after having spat out a:
>
> [ 865.319527] Buffer I/O error on dev dm-26, logical block 262128, async page read
>
> > I expect the original test case to still return an error (and I think it
> > was designed to), but it shouldn't produce the warn or bug splats with a
> > stuck uninterruptible task.
>
> It's not clear to me if it was designed to fail or not; I've not had
> a chance to rerun the original qemu block tests yet, and I don't know
> if old kernels successfully used O_DIRECT in this case.
>
> It still feels that my pwrite case above shouldn't cause a raid de-sync
> (especially since a normal user can do it).

Just to follow up on that: if I use the modern LVM mode
( lvcreate -m 1 -L 1G main /dev/sda2 /dev/sdb2 ) rather than
the old mirror, with the same patch, then:

  a) I get no log errors with either read or write
  b) read still gives EIO
  c) write apparently succeeds ?!

Dave

> Dave
> --
> -----Open up your eyes, open up your mind, open up your code -------
> / Dr. David Alan Gilbert | Running GNU/Linux | Happy \
> \ dave @ treblig.org | | In Hex /
> \ _________________________|_____ http://www.treblig.org |_______/
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/