* [PATCH 1/2] dm-io: clone the source bio instead of copying its biovec
@ 2026-06-16 15:05 Keith Busch
2026-06-16 15:05 ` [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors Keith Busch
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Keith Busch @ 2026-06-16 15:05 UTC (permalink / raw)
To: dm-devel
Cc: linux-block, mpatocka, Keith Busch, Dr. David Alan Gilbert,
Vjaceslavs Klimovs
From: Keith Busch <kbusch@kernel.org>
For DM_IO_BIO requests, do_region() built each destination bio by walking
the source bio's biovec and re-adding the pages one at a time, tracking
the remaining transfer in sectors. The vector lengths are byte granular
and need not be sector aligned (e.g. a misaligned O_DIRECT buffer split
across pages), so the sector-based accounting could lose a sub-sector
fragment: to_sector() truncated the remainder and the outer loop spun
forever submitting empty bios, hanging the I/O.
There is no need to rebuild the biovec at all. The destination reads into
(or writes from) exactly the same pages as the source bio, so the bio can
simply clone the source's biovec with bio_alloc_clone() and remap it to
the target device. The clone inherits the source's iterator and alignment,
and the block layer splits it to the target's limits on submission, so the
whole region maps to a single cloned bio with no manual page copying or
sector accounting.
This removes the per-page copy path (and its open-coded bvec dpages
helpers) for bio-backed I/O and fixes the hang on misaligned direct I/O to
a dm-mirror device. Page-list, vma and kmem sources keep the existing copy
path.
Fixes: 7eac33186957 ("iomap: simplify direct io validity check")
Fixes: 5ff3f74e145a ("block: simplify direct io validity check")
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Reported-by: Vjaceslavs Klimovs <vklimovs@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/md/dm-io.c | 67 +++++++++++++++++-----------------------------
1 file changed, 24 insertions(+), 43 deletions(-)
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 1db565b376200..28adfeb58f240 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -170,12 +170,11 @@ struct dpages {
struct page **p, unsigned long *len, unsigned int *offset);
void (*next_page)(struct dpages *dp);
- union {
- unsigned int context_u;
- struct bvec_iter context_bi;
- };
+ unsigned int context_u;
void *context_ptr;
+ struct bio *orig_bio;
+
void *vma_invalidate_address;
unsigned long vma_invalidate_size;
};
@@ -210,44 +209,6 @@ static void list_dp_init(struct dpages *dp, struct page_list *pl, unsigned int o
dp->context_ptr = pl;
}
-/*
- * Functions for getting the pages from a bvec.
- */
-static void bio_get_page(struct dpages *dp, struct page **p,
- unsigned long *len, unsigned int *offset)
-{
- struct bio_vec bvec = bvec_iter_bvec((struct bio_vec *)dp->context_ptr,
- dp->context_bi);
-
- *p = bvec.bv_page;
- *len = bvec.bv_len;
- *offset = bvec.bv_offset;
-
- /* avoid figuring it out again in bio_next_page() */
- dp->context_bi.bi_sector = (sector_t)bvec.bv_len;
-}
-
-static void bio_next_page(struct dpages *dp)
-{
- unsigned int len = (unsigned int)dp->context_bi.bi_sector;
-
- bvec_iter_advance((struct bio_vec *)dp->context_ptr,
- &dp->context_bi, len);
-}
-
-static void bio_dp_init(struct dpages *dp, struct bio *bio)
-{
- dp->get_page = bio_get_page;
- dp->next_page = bio_next_page;
-
- /*
- * We just use bvec iterator to retrieve pages, so it is ok to
- * access the bvec table directly here
- */
- dp->context_ptr = bio->bi_io_vec;
- dp->context_bi = bio->bi_iter;
-}
-
/*
* Functions for getting the pages from a VMA.
*/
@@ -332,6 +293,21 @@ static void do_region(const blk_opf_t opf, unsigned int region,
return;
}
+ if (dp->orig_bio) {
+ bio = bio_alloc_clone(where->bdev, dp->orig_bio, GFP_NOIO,
+ &io->client->bios);
+ bio->bi_iter.bi_sector = where->sector;
+ bio->bi_iter.bi_size = where->count << SECTOR_SHIFT;
+ bio->bi_opf = opf;
+ bio->bi_end_io = endio;
+ bio->bi_ioprio = ioprio;
+ store_io_and_region_in_bio(bio, io, region);
+
+ atomic_inc(&io->count);
+ submit_bio(bio);
+ return;
+ }
+
/*
* where->count may be zero if op holds a flush and we need to
* send a zero-sized flush.
@@ -468,6 +444,7 @@ static int dp_init(struct dm_io_request *io_req, struct dpages *dp,
dp->vma_invalidate_address = NULL;
dp->vma_invalidate_size = 0;
+ dp->orig_bio = NULL;
switch (io_req->mem.type) {
case DM_IO_PAGE_LIST:
@@ -475,7 +452,11 @@ static int dp_init(struct dm_io_request *io_req, struct dpages *dp,
break;
case DM_IO_BIO:
- bio_dp_init(dp, io_req->mem.ptr.bio);
+ /*
+ * The destination bios clone this bio's biovec directly, so
+ * there are no per-page accessors to set up here.
+ */
+ dp->orig_bio = io_req->mem.ptr.bio;
break;
case DM_IO_VMA:
--
2.52.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-16 15:05 [PATCH 1/2] dm-io: clone the source bio instead of copying its biovec Keith Busch
@ 2026-06-16 15:05 ` Keith Busch
2026-06-16 17:54 ` Dr. David Alan Gilbert
2026-06-16 15:40 ` Keith Busch
2026-06-16 15:58 ` Keith Busch
2 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2026-06-16 15:05 UTC (permalink / raw)
To: dm-devel
Cc: linux-block, mpatocka, Keith Busch, Dr. David Alan Gilbert,
Vjaceslavs Klimovs
From: Keith Busch <kbusch@kernel.org>
BLK_STS_INVAL indicates the I/O request itself was invalid (for example a
misaligned direct I/O), not that the device has failed. dm-raid1 treated
any read or write completion error as a device failure: it failed the
mirror leg, retried on the alternatives - which fail identically - and
eventually returned EIO while spuriously degrading the array.
Since commit 5ff3f74e145a ("block: simplify direct io validity check") the
direct I/O path no longer rejects misaligned buffers up front, so an
invalid bio now reaches the lower block layers, which fail it with
BLK_STS_INVAL. dm-io collapses the block status into a per-region error
bit before invoking the completion callback, so record BLK_STS_INVAL on
the originating bio and have the dm-raid1 read, write and end_io paths
propagate it instead of failing the device.
This mirrors the raid1/raid10 fix in commit f7b24c7b41f23
("md/raid1,raid10: don't fail devices for invalid IO errors") for the
device-mapper mirror target.
Fixes: 7eac33186957 ("iomap: simplify direct io validity check")
Fixes: 5ff3f74e145a ("block: simplify direct io validity check")
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Reported-by: Vjaceslavs Klimovs <vklimovs@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/md/dm-io.c | 14 +++++++++++++-
drivers/md/dm-raid1.c | 28 +++++++++++++++++++++++++++-
2 files changed, 40 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 28adfeb58f240..f382e9f9be059 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -37,6 +37,7 @@ struct io {
struct dm_io_client *client;
io_notify_fn callback;
void *context;
+ struct bio *orig_bio;
void *vma_invalidate_address;
unsigned long vma_invalidate_size;
} __aligned(DM_IO_MAX_REGIONS);
@@ -132,8 +133,18 @@ static void complete_io(struct io *io)
static void dec_count(struct io *io, unsigned int region, blk_status_t error)
{
- if (error)
+ if (error) {
set_bit(region, &io->error_bits);
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying
+ * device (e.g. a misaligned direct I/O), which is a caller error
+ * rather than a device failure. Record it on the original bio so
+ * bio-based targets can propagate it instead of treating it as a
+ * media error and failing the device.
+ */
+ if (error == BLK_STS_INVAL && io->orig_bio)
+ io->orig_bio->bi_status = error;
+ }
if (atomic_dec_and_test(&io->count))
complete_io(io);
@@ -398,6 +409,7 @@ static void async_io(struct dm_io_client *client, unsigned int num_regions,
io->client = client;
io->callback = fn;
io->context = context;
+ io->orig_bio = dp->orig_bio;
io->vma_invalidate_address = dp->vma_invalidate_address;
io->vma_invalidate_size = dp->vma_invalidate_size;
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index de5c00704e69c..022ad791c2957 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -524,6 +524,17 @@ static void read_callback(unsigned long error, void *context)
return;
}
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying device,
+ * e.g. a misaligned direct I/O. That is a caller error, not a device
+ * failure, so propagate it rather than failing the mirror and retrying
+ * on the other legs, which would fail the same way.
+ */
+ if (bio->bi_status == BLK_STS_INVAL) {
+ bio_endio(bio);
+ return;
+ }
+
fail_mirror(m, DM_RAID1_READ_ERROR);
if (likely(default_ok(m)) || mirror_available(m->ms, bio)) {
@@ -622,6 +633,16 @@ static void write_callback(unsigned long error, void *context)
return;
}
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying device,
+ * e.g. a misaligned direct I/O. Propagate the error without degrading
+ * the array.
+ */
+ if (bio->bi_status == BLK_STS_INVAL) {
+ bio_endio(bio);
+ return;
+ }
+
/*
* If the bio is discard, return an error, but do not
* degrade the array.
@@ -1262,7 +1283,12 @@ static int mirror_end_io(struct dm_target *ti, struct bio *bio,
return DM_ENDIO_DONE;
}
- if (*error == BLK_STS_NOTSUPP)
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying device,
+ * e.g. a misaligned direct I/O. Propagate it rather than failing the
+ * mirror and retrying, which would fail the same way on every leg.
+ */
+ if (*error == BLK_STS_NOTSUPP || *error == BLK_STS_INVAL)
goto out;
if (bio->bi_opf & REQ_RAHEAD)
--
2.52.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-16 15:05 [PATCH 1/2] dm-io: clone the source bio instead of copying its biovec Keith Busch
2026-06-16 15:05 ` [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors Keith Busch
@ 2026-06-16 15:40 ` Keith Busch
2026-06-16 15:58 ` Keith Busch
2 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2026-06-16 15:40 UTC (permalink / raw)
To: dm-devel; +Cc: linux-block, mpatocka, kbusch, linux, vklimovs
From: Keith Busch <kbusch@kernel.org>
BLK_STS_INVAL indicates the I/O request itself was invalid (for example a
misaligned direct I/O), not that the device has failed. dm-raid1 treated
any read or write completion error as a device failure: it failed the
mirror leg, retried on the alternatives - which fail identically - and
eventually returned EIO while spuriously degrading the array.
Since commit 5ff3f74e145a ("block: simplify direct io validity check") the
direct I/O path no longer rejects misaligned buffers up front, so an
invalid bio now reaches the lower block layers, which fail it with
BLK_STS_INVAL. dm-io collapses the block status into a per-region error
bit before invoking the completion callback, so record BLK_STS_INVAL on
the originating bio and have the dm-raid1 read, write and end_io paths
propagate it instead of failing the device.
This mirrors the raid1/raid10 fix in commit f7b24c7b41f23
("md/raid1,raid10: don't fail devices for invalid IO errors") for the
device-mapper mirror target.
Fixes: 7eac33186957 ("iomap: simplify direct io validity check")
Fixes: 5ff3f74e145a ("block: simplify direct io validity check")
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Reported-by: Vjaceslavs Klimovs <vklimovs@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
drivers/md/dm-io.c | 14 +++++++++++++-
drivers/md/dm-raid1.c | 28 +++++++++++++++++++++++++++-
2 files changed, 40 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 28adfeb58f240..f382e9f9be059 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -37,6 +37,7 @@ struct io {
struct dm_io_client *client;
io_notify_fn callback;
void *context;
+ struct bio *orig_bio;
void *vma_invalidate_address;
unsigned long vma_invalidate_size;
} __aligned(DM_IO_MAX_REGIONS);
@@ -132,8 +133,18 @@ static void complete_io(struct io *io)
static void dec_count(struct io *io, unsigned int region, blk_status_t error)
{
- if (error)
+ if (error) {
set_bit(region, &io->error_bits);
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying
+ * device (e.g. a misaligned direct I/O), which is a caller error
+ * rather than a device failure. Record it on the original bio so
+ * bio-based targets can propagate it instead of treating it as a
+ * media error and failing the device.
+ */
+ if (error == BLK_STS_INVAL && io->orig_bio)
+ io->orig_bio->bi_status = error;
+ }
if (atomic_dec_and_test(&io->count))
complete_io(io);
@@ -398,6 +409,7 @@ static void async_io(struct dm_io_client *client, unsigned int num_regions,
io->client = client;
io->callback = fn;
io->context = context;
+ io->orig_bio = dp->orig_bio;
io->vma_invalidate_address = dp->vma_invalidate_address;
io->vma_invalidate_size = dp->vma_invalidate_size;
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index de5c00704e69c..022ad791c2957 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -524,6 +524,17 @@ static void read_callback(unsigned long error, void *context)
return;
}
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying device,
+ * e.g. a misaligned direct I/O. That is a caller error, not a device
+ * failure, so propagate it rather than failing the mirror and retrying
+ * on the other legs, which would fail the same way.
+ */
+ if (bio->bi_status == BLK_STS_INVAL) {
+ bio_endio(bio);
+ return;
+ }
+
fail_mirror(m, DM_RAID1_READ_ERROR);
if (likely(default_ok(m)) || mirror_available(m->ms, bio)) {
@@ -622,6 +633,16 @@ static void write_callback(unsigned long error, void *context)
return;
}
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying device,
+ * e.g. a misaligned direct I/O. Propagate the error without degrading
+ * the array.
+ */
+ if (bio->bi_status == BLK_STS_INVAL) {
+ bio_endio(bio);
+ return;
+ }
+
/*
* If the bio is discard, return an error, but do not
* degrade the array.
@@ -1262,7 +1283,12 @@ static int mirror_end_io(struct dm_target *ti, struct bio *bio,
return DM_ENDIO_DONE;
}
- if (*error == BLK_STS_NOTSUPP)
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying device,
+ * e.g. a misaligned direct I/O. Propagate it rather than failing the
+ * mirror and retrying, which would fail the same way on every leg.
+ */
+ if (*error == BLK_STS_NOTSUPP || *error == BLK_STS_INVAL)
goto out;
if (bio->bi_opf & REQ_RAHEAD)
--
2.52.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-16 15:05 [PATCH 1/2] dm-io: clone the source bio instead of copying its biovec Keith Busch
2026-06-16 15:05 ` [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors Keith Busch
2026-06-16 15:40 ` Keith Busch
@ 2026-06-16 15:58 ` Keith Busch
2 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2026-06-16 15:58 UTC (permalink / raw)
To: Keith Busch
Cc: dm-devel, linux-block, mpatocka, Dr. David Alan Gilbert,
Vjaceslavs Klimovs
BLK_STS_INVAL indicates the I/O request itself was invalid (for example a
misaligned direct I/O), not that the device has failed. dm-raid1 treated
any read or write completion error as a device failure: it failed the
mirror leg, retried on the alternatives - which fail identically - and
eventually returned EIO while spuriously degrading the array.
Since commit 5ff3f74e145a ("block: simplify direct io validity check") the
direct I/O path no longer rejects misaligned buffers up front, so an
invalid bio now reaches the lower block layers, which fail it with
BLK_STS_INVAL. dm-io collapses the block status into a per-region error
bit before invoking the completion callback, so record BLK_STS_INVAL on
the originating bio and have the dm-raid1 read, write and end_io paths
propagate it instead of failing the device.
This mirrors the raid1/raid10 fix in commit f7b24c7b41f23
("md/raid1,raid10: don't fail devices for invalid IO errors") for the
device-mapper mirror target.
Fixes: 7eac33186957 ("iomap: simplify direct io validity check")
Fixes: 5ff3f74e145a ("block: simplify direct io validity check")
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Reported-by: Vjaceslavs Klimovs <vklimovs@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
Resending patch 2/2 from a different machine. For some reason, only 1/2
is getting through with git-send-email, so manually replying to the
thread with the missing second patch.
drivers/md/dm-io.c | 14 +++++++++++++-
drivers/md/dm-raid1.c | 28 +++++++++++++++++++++++++++-
2 files changed, 40 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 28adfeb58f240..f382e9f9be059 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -37,6 +37,7 @@ struct io {
struct dm_io_client *client;
io_notify_fn callback;
void *context;
+ struct bio *orig_bio;
void *vma_invalidate_address;
unsigned long vma_invalidate_size;
} __aligned(DM_IO_MAX_REGIONS);
@@ -132,8 +133,18 @@ static void complete_io(struct io *io)
static void dec_count(struct io *io, unsigned int region, blk_status_t error)
{
- if (error)
+ if (error) {
set_bit(region, &io->error_bits);
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying
+ * device (e.g. a misaligned direct I/O), which is a caller error
+ * rather than a device failure. Record it on the original bio so
+ * bio-based targets can propagate it instead of treating it as a
+ * media error and failing the device.
+ */
+ if (error == BLK_STS_INVAL && io->orig_bio)
+ io->orig_bio->bi_status = error;
+ }
if (atomic_dec_and_test(&io->count))
complete_io(io);
@@ -398,6 +409,7 @@ static void async_io(struct dm_io_client *client, unsigned int num_regions,
io->client = client;
io->callback = fn;
io->context = context;
+ io->orig_bio = dp->orig_bio;
io->vma_invalidate_address = dp->vma_invalidate_address;
io->vma_invalidate_size = dp->vma_invalidate_size;
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index de5c00704e69c..022ad791c2957 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -524,6 +524,17 @@ static void read_callback(unsigned long error, void *context)
return;
}
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying device,
+ * e.g. a misaligned direct I/O. That is a caller error, not a device
+ * failure, so propagate it rather than failing the mirror and retrying
+ * on the other legs, which would fail the same way.
+ */
+ if (bio->bi_status == BLK_STS_INVAL) {
+ bio_endio(bio);
+ return;
+ }
+
fail_mirror(m, DM_RAID1_READ_ERROR);
if (likely(default_ok(m)) || mirror_available(m->ms, bio)) {
@@ -622,6 +633,16 @@ static void write_callback(unsigned long error, void *context)
return;
}
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying device,
+ * e.g. a misaligned direct I/O. Propagate the error without degrading
+ * the array.
+ */
+ if (bio->bi_status == BLK_STS_INVAL) {
+ bio_endio(bio);
+ return;
+ }
+
/*
* If the bio is discard, return an error, but do not
* degrade the array.
@@ -1262,7 +1283,12 @@ static int mirror_end_io(struct dm_target *ti, struct bio *bio,
return DM_ENDIO_DONE;
}
- if (*error == BLK_STS_NOTSUPP)
+ /*
+ * BLK_STS_INVAL means the bio was not valid for the underlying device,
+ * e.g. a misaligned direct I/O. Propagate it rather than failing the
+ * mirror and retrying, which would fail the same way on every leg.
+ */
+ if (*error == BLK_STS_NOTSUPP || *error == BLK_STS_INVAL)
goto out;
if (bio->bi_opf & REQ_RAHEAD)
--
2.52.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-16 15:05 ` [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors Keith Busch
@ 2026-06-16 17:54 ` Dr. David Alan Gilbert
2026-06-16 18:48 ` Keith Busch
0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-16 17:54 UTC (permalink / raw)
To: Keith Busch
Cc: dm-devel, linux-block, mpatocka, Keith Busch, Vjaceslavs Klimovs
* Keith Busch (kbusch@meta.com) wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> BLK_STS_INVAL indicates the I/O request itself was invalid (for example a
> misaligned direct I/O), not that the device has failed. dm-raid1 treated
> any read or write completion error as a device failure: it failed the
> mirror leg, retried on the alternatives - which fail identically - and
> eventually returned EIO while spuriously degrading the array.
>
> Since commit 5ff3f74e145a ("block: simplify direct io validity check") the
> direct I/O path no longer rejects misaligned buffers up front, so an
> invalid bio now reaches the lower block layers, which fail it with
> BLK_STS_INVAL. dm-io collapses the block status into a per-region error
> bit before invoking the completion callback, so record BLK_STS_INVAL on
> the originating bio and have the dm-raid1 read, write and end_io paths
> propagate it instead of failing the device.
>
> This mirrors the raid1/raid10 fix in commit f7b24c7b41f23
> ("md/raid1,raid10: don't fail devices for invalid IO errors") for the
> device-mapper mirror target.
>
> Fixes: 7eac33186957 ("iomap: simplify direct io validity check")
> Fixes: 5ff3f74e145a ("block: simplify direct io validity check")
> Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
> Reported-by: Vjaceslavs Klimovs <vklimovs@gmail.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
OK, for this pair I think would be fair for a Tested-by me as well;
they certainly resolve the hang and the WARN/BUGs.
I still see the errors as EIO on my tests, and on the older mirror type
get the stuck resync on write, and on the newer mirror I see the write
apparently succeed (did it really?)
I suggest given the BUG/WARN and the number of people who've tripped over
this, and it's triggerable as a normal user, that it's a candidate for stable.
Thanks again,
Dave
> ---
> drivers/md/dm-io.c | 14 +++++++++++++-
> drivers/md/dm-raid1.c | 28 +++++++++++++++++++++++++++-
> 2 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
> index 28adfeb58f240..f382e9f9be059 100644
> --- a/drivers/md/dm-io.c
> +++ b/drivers/md/dm-io.c
> @@ -37,6 +37,7 @@ struct io {
> struct dm_io_client *client;
> io_notify_fn callback;
> void *context;
> + struct bio *orig_bio;
> void *vma_invalidate_address;
> unsigned long vma_invalidate_size;
> } __aligned(DM_IO_MAX_REGIONS);
> @@ -132,8 +133,18 @@ static void complete_io(struct io *io)
>
> static void dec_count(struct io *io, unsigned int region, blk_status_t error)
> {
> - if (error)
> + if (error) {
> set_bit(region, &io->error_bits);
> + /*
> + * BLK_STS_INVAL means the bio was not valid for the underlying
> + * device (e.g. a misaligned direct I/O), which is a caller error
> + * rather than a device failure. Record it on the original bio so
> + * bio-based targets can propagate it instead of treating it as a
> + * media error and failing the device.
> + */
> + if (error == BLK_STS_INVAL && io->orig_bio)
> + io->orig_bio->bi_status = error;
> + }
>
> if (atomic_dec_and_test(&io->count))
> complete_io(io);
> @@ -398,6 +409,7 @@ static void async_io(struct dm_io_client *client, unsigned int num_regions,
> io->client = client;
> io->callback = fn;
> io->context = context;
> + io->orig_bio = dp->orig_bio;
>
> io->vma_invalidate_address = dp->vma_invalidate_address;
> io->vma_invalidate_size = dp->vma_invalidate_size;
> diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
> index de5c00704e69c..022ad791c2957 100644
> --- a/drivers/md/dm-raid1.c
> +++ b/drivers/md/dm-raid1.c
> @@ -524,6 +524,17 @@ static void read_callback(unsigned long error, void *context)
> return;
> }
>
> + /*
> + * BLK_STS_INVAL means the bio was not valid for the underlying device,
> + * e.g. a misaligned direct I/O. That is a caller error, not a device
> + * failure, so propagate it rather than failing the mirror and retrying
> + * on the other legs, which would fail the same way.
> + */
> + if (bio->bi_status == BLK_STS_INVAL) {
> + bio_endio(bio);
> + return;
> + }
> +
> fail_mirror(m, DM_RAID1_READ_ERROR);
>
> if (likely(default_ok(m)) || mirror_available(m->ms, bio)) {
> @@ -622,6 +633,16 @@ static void write_callback(unsigned long error, void *context)
> return;
> }
>
> + /*
> + * BLK_STS_INVAL means the bio was not valid for the underlying device,
> + * e.g. a misaligned direct I/O. Propagate the error without degrading
> + * the array.
> + */
> + if (bio->bi_status == BLK_STS_INVAL) {
> + bio_endio(bio);
> + return;
> + }
> +
> /*
> * If the bio is discard, return an error, but do not
> * degrade the array.
> @@ -1262,7 +1283,12 @@ static int mirror_end_io(struct dm_target *ti, struct bio *bio,
> return DM_ENDIO_DONE;
> }
>
> - if (*error == BLK_STS_NOTSUPP)
> + /*
> + * BLK_STS_INVAL means the bio was not valid for the underlying device,
> + * e.g. a misaligned direct I/O. Propagate it rather than failing the
> + * mirror and retrying, which would fail the same way on every leg.
> + */
> + if (*error == BLK_STS_NOTSUPP || *error == BLK_STS_INVAL)
> goto out;
>
> if (bio->bi_opf & REQ_RAHEAD)
> --
> 2.52.0
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-16 17:54 ` Dr. David Alan Gilbert
@ 2026-06-16 18:48 ` Keith Busch
2026-06-16 20:09 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2026-06-16 18:48 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
On Tue, Jun 16, 2026 at 05:54:28PM +0000, Dr. David Alan Gilbert wrote:
> OK, for this pair I think would be fair for a Tested-by me as well;
> they certainly resolve the hang and the WARN/BUGs.
> I still see the errors as EIO on my tests, and on the older mirror type
Could you share your reproducer? I'm just using the original recipe you
sent here:
https://lore.kernel.org/linux-block/ai7rnH20IYeSmY8s@gallifrey/
And I'm seeing EINVAL instead EIO.
> get the stuck resync on write, and on the newer mirror I see the write
> apparently succeed (did it really?)
There was a time when ext4 used to fallback to buffered io for writes
but not for reads, but looks like that was fixed since 6.18, so should
be returning error.
I tried testing it with a modification to your original read test, and
it is still failing with EINVAL for me:
pread of 4096 said: -1 (Invalid argument)
pwrite of 4096 said: -1 (Invalid argument)
---
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
const char* path="/mnt/tmp/testfile";
static char buf[8192];
int main()
{
int fd=open(path, O_RDWR|O_DIRECT|O_CLOEXEC);
errno=0;
int res3=pread(fd, buf, 4096, 0);
printf("pread of 4096 said: %d (%m)\n", res3);
res3=pwrite(fd, buf, 4096, 0);
printf("pwrite of 4096 said: %d (%m)\n", res3);
}
--
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-16 18:48 ` Keith Busch
@ 2026-06-16 20:09 ` Dr. David Alan Gilbert
2026-06-16 23:45 ` Keith Busch
2026-06-17 15:08 ` Keith Busch
0 siblings, 2 replies; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-16 20:09 UTC (permalink / raw)
To: Keith Busch
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
* Keith Busch (kbusch@kernel.org) wrote:
> On Tue, Jun 16, 2026 at 05:54:28PM +0000, Dr. David Alan Gilbert wrote:
> > OK, for this pair I think would be fair for a Tested-by me as well;
> > they certainly resolve the hang and the WARN/BUGs.
> > I still see the errors as EIO on my tests, and on the older mirror type
>
> Could you share your reproducer? I'm just using the original recipe you
> sent here:
>
> https://lore.kernel.org/linux-block/ai7rnH20IYeSmY8s@gallifrey/
>
> And I'm seeing EINVAL instead EIO.
Interesting; I've got your:
dm-raid1: don't fail the mirror for invalid I/O errors
For DM_IO_BIO requests, do_region() built each destination bio by walking..
ontop of e21ee273e6fa3879aec9a27251cfce98156e07c4 which is just before 7.1
I've not your https://lore.kernel.org/linux-block/20260612223205.465913-1-kbusch@meta.com/
root@dalek:/home/dg# lvcreate --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
root@dalek:/home/dg# mkfs.ext4 /dev/mapper/main-lvol0
root@dalek:/home/dg# mount /dev/mapper/main-lvol0 /mnt/tmp/
root@dalek:/home/dg# chmod a+rwx /mnt/tmp
dg@dalek:~$ dd if=/dev/zero of=/mnt/tmp/testfile bs=1024k count=1
my two tests are separate tests:
{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}
dg@dalek:~$ cat dbf.c
#include <errno.h>
#include <fcntl.h>
#include <asm-generic/fcntl.h>
#include <stdio.h>
#include <unistd.h>
const char* path="/mnt/tmp/testfile";
static char buf[8192];
int main()
{
int fd=open(path, O_RDWR|O_DIRECT|O_CLOEXEC);
errno=0;
int res3=pread(fd, buf, 4096, 0);
printf("pread of 4096 said: %d (%m)\n", res3);
}
{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}
dg@dalek:~$ cat dbf-write.c
#include <errno.h>
#include <fcntl.h>
#include <asm-generic/fcntl.h>
#include <stdio.h>
#include <unistd.h>
const char* path="/mnt/tmp/testfile";
static char buf[8192];
int main()
{
int fd=open(path, O_RDWR|O_DIRECT|O_CLOEXEC);
errno=0;
int res3=pwrite(fd, buf, 4096, 0);
printf("pwrite of 4096 said: %d (%m)\n", res3);
}
{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}{--}
> > get the stuck resync on write, and on the newer mirror I see the write
> > apparently succeed (did it really?)
>
> There was a time when ext4 used to fallback to buffered io for writes
> but not for reads, but looks like that was fixed since 6.18, so should
> be returning error.
>
> I tried testing it with a modification to your original read test, and
> it is still failing with EINVAL for me:
>
> pread of 4096 said: -1 (Invalid argument)
> pwrite of 4096 said: -1 (Invalid argument)
Your double test gives me:
dg@dalek:~$ ./dbf-joint
pread of 4096 said: -1 (Input/output error)
pwrite of 4096 said: 4096 (Input/output error)
> ---
> #define _GNU_SOURCE
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
>
> const char* path="/mnt/tmp/testfile";
> static char buf[8192];
>
> int main()
> {
> int fd=open(path, O_RDWR|O_DIRECT|O_CLOEXEC);
>
> errno=0;
> int res3=pread(fd, buf, 4096, 0);
> printf("pread of 4096 said: %d (%m)\n", res3);
errno=0;
> res3=pwrite(fd, buf, 4096, 0);
> printf("pwrite of 4096 said: %d (%m)\n", res3);
> }
Dave
> --
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-16 20:09 ` Dr. David Alan Gilbert
@ 2026-06-16 23:45 ` Keith Busch
2026-06-16 23:47 ` Dr. David Alan Gilbert
2026-06-17 15:08 ` Keith Busch
1 sibling, 1 reply; 15+ messages in thread
From: Keith Busch @ 2026-06-16 23:45 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
On Tue, Jun 16, 2026 at 08:09:18PM +0000, Dr. David Alan Gilbert wrote:
> Interesting; I've got your:
> dm-raid1: don't fail the mirror for invalid I/O errors
> For DM_IO_BIO requests, do_region() built each destination bio by walking..
> ontop of e21ee273e6fa3879aec9a27251cfce98156e07c4 which is just before 7.1
> I've not your https://lore.kernel.org/linux-block/20260612223205.465913-1-kbusch@meta.com/
>
> root@dalek:/home/dg# lvcreate --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
> root@dalek:/home/dg# mkfs.ext4 /dev/mapper/main-lvol0
> root@dalek:/home/dg# mount /dev/mapper/main-lvol0 /mnt/tmp/
> root@dalek:/home/dg# chmod a+rwx /mnt/tmp
>
> dg@dalek:~$ dd if=/dev/zero of=/mnt/tmp/testfile bs=1024k count=1
>
> my two tests are separate tests:
Goodness, I'm struggling here. Unpatched, I have no problem recreating
the issues you've described, but everything I've tried with the
proposals included gets the expected results. I'm running out of ideas
on replicating your results with hardware I have, but it's getting late,
so I'll try to have new ideas tomorrow.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-16 23:45 ` Keith Busch
@ 2026-06-16 23:47 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-16 23:47 UTC (permalink / raw)
To: Keith Busch
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
* Keith Busch (kbusch@kernel.org) wrote:
> On Tue, Jun 16, 2026 at 08:09:18PM +0000, Dr. David Alan Gilbert wrote:
> > Interesting; I've got your:
> > dm-raid1: don't fail the mirror for invalid I/O errors
> > For DM_IO_BIO requests, do_region() built each destination bio by walking..
> > ontop of e21ee273e6fa3879aec9a27251cfce98156e07c4 which is just before 7.1
> > I've not your https://lore.kernel.org/linux-block/20260612223205.465913-1-kbusch@meta.com/
> >
> > root@dalek:/home/dg# lvcreate --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
> > root@dalek:/home/dg# mkfs.ext4 /dev/mapper/main-lvol0
> > root@dalek:/home/dg# mount /dev/mapper/main-lvol0 /mnt/tmp/
> > root@dalek:/home/dg# chmod a+rwx /mnt/tmp
> >
> > dg@dalek:~$ dd if=/dev/zero of=/mnt/tmp/testfile bs=1024k count=1
> >
> > my two tests are separate tests:
>
> Goodness, I'm struggling here. Unpatched, I have no problem recreating
> the issues you've described, but everything I've tried with the
> proposals included gets the expected results. I'm running out of ideas
> on replicating your results with hardware I have, but it's getting late,
> so I'll try to have new ideas tomorrow.
OK, no problem - let me know if there's any useful diags I can gather;
would blktrace or function tracing or something help?
Sleep tight!
Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-16 20:09 ` Dr. David Alan Gilbert
2026-06-16 23:45 ` Keith Busch
@ 2026-06-17 15:08 ` Keith Busch
2026-06-17 15:33 ` Dr. David Alan Gilbert
1 sibling, 1 reply; 15+ messages in thread
From: Keith Busch @ 2026-06-17 15:08 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
On Tue, Jun 16, 2026 at 08:09:18PM +0000, Dr. David Alan Gilbert wrote:
> root@dalek:/home/dg# lvcreate --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
So this is a subtle difference from your original report which ran
lvcreate a little differently:
# lvcreate --type mirror --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
This patch series address problems with the original report with the
"--type mirror" parameter, which uses dm-raid1.c instead of md/raid1.c.
Knowing that detail makes this a trivial matter to fix now, so I'll send
a separate patch for that. But this series should be good to go for the
original issue on the legacy dm mirror.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-17 15:08 ` Keith Busch
@ 2026-06-17 15:33 ` Dr. David Alan Gilbert
2026-06-17 16:21 ` Keith Busch
0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-17 15:33 UTC (permalink / raw)
To: Keith Busch
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
* Keith Busch (kbusch@kernel.org) wrote:
> On Tue, Jun 16, 2026 at 08:09:18PM +0000, Dr. David Alan Gilbert wrote:
> > root@dalek:/home/dg# lvcreate --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
>
> So this is a subtle difference from your original report which ran
> lvcreate a little differently:
>
> # lvcreate --type mirror --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
>
> This patch series address problems with the original report with the
> "--type mirror" parameter, which uses dm-raid1.c instead of md/raid1.c.
Ah OK.
(I think I think I did say that somewhere, hmm ajFK5NXkxd6jU5zu@gallifrey ? )
> Knowing that detail makes this a trivial matter to fix now, so I'll send
> a separate patch for that. But this series should be good to go for the
> original issue on the legacy dm mirror.
Great! Thanks again,
Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-17 15:33 ` Dr. David Alan Gilbert
@ 2026-06-17 16:21 ` Keith Busch
2026-06-17 16:44 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2026-06-17 16:21 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
On Wed, Jun 17, 2026 at 03:33:55PM +0000, Dr. David Alan Gilbert wrote:
> * Keith Busch (kbusch@kernel.org) wrote:
> > On Tue, Jun 16, 2026 at 08:09:18PM +0000, Dr. David Alan Gilbert wrote:
> > > root@dalek:/home/dg# lvcreate --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
> >
> > So this is a subtle difference from your original report which ran
> > lvcreate a little differently:
> >
> > # lvcreate --type mirror --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
> >
> > This patch series address problems with the original report with the
> > "--type mirror" parameter, which uses dm-raid1.c instead of md/raid1.c.
>
> Ah OK.
> (I think I think I did say that somewhere, hmm ajFK5NXkxd6jU5zu@gallifrey ? )
I see. This will fix that setup:
---
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5b9368bd9e700..17a5f0d98aacc 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -322,7 +322,9 @@ static void call_bio_endio(struct r1bio *r1_bio)
{
struct bio *bio = r1_bio->master_bio;
- if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
+ if (test_bit(R1BIO_Invalid, &r1_bio->state))
+ bio->bi_status = BLK_STS_INVAL;
+ else if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
bio->bi_status = BLK_STS_IOERR;
bio_endio(bio);
@@ -403,6 +405,8 @@ static void raid1_end_read_request(struct bio *bio)
;
} else if (!raid1_should_handle_error(bio)) {
uptodate = 1;
+ if (bio->bi_status == BLK_STS_INVAL)
+ set_bit(R1BIO_Invalid, &r1_bio->state);
} else {
/* If all other devices have failed, we want to return
* the error upwards rather than fail the last device.
@@ -519,6 +523,14 @@ static void raid1_end_write_request(struct bio *bio)
*/
r1_bio->bios[mirror] = NULL;
to_put = bio;
+ /*
+ * An invalid I/O (e.g. a misaligned bio rejected by the lower
+ * device) was ignored above rather than faulting the device.
+ * It is not a successful write, though, so report the error to
+ * the caller instead of completing the master bio as uptodate.
+ */
+ if (bio->bi_status == BLK_STS_INVAL)
+ set_bit(R1BIO_Invalid, &r1_bio->state);
/*
* Do not set R1BIO_Uptodate if the current device is
* rebuilding or Faulty. This is because we cannot use
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index c98d43a7ae993..21e837db5b25e 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -184,6 +184,12 @@ enum r1bio_state {
R1BIO_MadeGood,
R1BIO_WriteError,
R1BIO_FailFast,
+/* An invalid I/O (e.g. a bio rejected by the lower device because it does
+ * not meet that device's dma_alignment) is not a device failure. Report
+ * the error to the caller without faulting the device or retrying, and do
+ * not complete a write as if it had succeeded.
+ */
+ R1BIO_Invalid,
};
static inline int sector_to_idx(sector_t sector)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index cee5a253a281d..3cee9612be26d 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -323,7 +323,9 @@ static void raid_end_bio_io(struct r10bio *r10_bio)
struct r10conf *conf = r10_bio->mddev->private;
if (!test_and_set_bit(R10BIO_Returned, &r10_bio->state)) {
- if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
+ if (test_bit(R10BIO_Invalid, &r10_bio->state))
+ bio->bi_status = BLK_STS_INVAL;
+ else if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
bio->bi_status = BLK_STS_IOERR;
bio_endio(bio);
}
@@ -403,6 +405,8 @@ static void raid10_end_read_request(struct bio *bio)
set_bit(R10BIO_Uptodate, &r10_bio->state);
} else if (!raid1_should_handle_error(bio)) {
uptodate = 1;
+ if (bio->bi_status == BLK_STS_INVAL)
+ set_bit(R10BIO_Invalid, &r10_bio->state);
} else {
/* If all other devices that store this block have
* failed, we want to return the error upwards rather
@@ -523,6 +527,8 @@ static void raid10_end_write_request(struct bio *bio)
* before rdev->recovery_offset, but for simplicity we don't
* check this here.
*/
+ if (bio->bi_status == BLK_STS_INVAL)
+ set_bit(R10BIO_Invalid, &r10_bio->state);
if (test_bit(In_sync, &rdev->flags) &&
!test_bit(Faulty, &rdev->flags))
set_bit(R10BIO_Uptodate, &r10_bio->state);
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index ec79d87fb92f6..a1adad3acafe1 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -175,5 +175,11 @@ enum r10bio_state {
/* failfast devices did receive failfast requests. */
R10BIO_FailFast,
R10BIO_Discard,
+/* An invalid I/O (e.g. a bio rejected by the lower device because it does not
+ * meet that device's queue_limits) is not a device failure. Report the error
+ * to the caller without faulting the device or retrying, and do not complete a
+ * write as if it had succeeded.
+ */
+ R10BIO_Invalid,
};
#endif
--
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-17 16:21 ` Keith Busch
@ 2026-06-17 16:44 ` Dr. David Alan Gilbert
2026-06-17 16:54 ` Keith Busch
0 siblings, 1 reply; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-17 16:44 UTC (permalink / raw)
To: Keith Busch
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
* Keith Busch (kbusch@kernel.org) wrote:
> On Wed, Jun 17, 2026 at 03:33:55PM +0000, Dr. David Alan Gilbert wrote:
> > * Keith Busch (kbusch@kernel.org) wrote:
> > > On Tue, Jun 16, 2026 at 08:09:18PM +0000, Dr. David Alan Gilbert wrote:
> > > > root@dalek:/home/dg# lvcreate --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
> > >
> > > So this is a subtle difference from your original report which ran
> > > lvcreate a little differently:
> > >
> > > # lvcreate --type mirror --mirrors 1 -L 1G main /dev/sda2 /dev/sdb2
> > >
> > > This patch series address problems with the original report with the
> > > "--type mirror" parameter, which uses dm-raid1.c instead of md/raid1.c.
> >
> > Ah OK.
> > (I think I think I did say that somewhere, hmm ajFK5NXkxd6jU5zu@gallifrey ? )
>
> I see. This will fix that setup:
And it does;
dg@dalek:~$ ./dbf
pread of 4096 said: -1 (Invalid argument)
dg@dalek:~$ ./dbf-write
pwrite of 4096 said: -1 (Invalid argument)
dg@dalek:~$ ./dbf-joint
pread of 4096 said: -1 (Invalid argument)
pwrite of 4096 said: -1 (Invalid argument)
and the log is clean.
Tested-by: Dr. David Alan Gilbert <linux@treblig.org>
(It's a bit scary you're having to go around quite
a few places and make similar fixes; I assume there
are others that do similar things).
Thanks again,
Dave
>
> ---
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 5b9368bd9e700..17a5f0d98aacc 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -322,7 +322,9 @@ static void call_bio_endio(struct r1bio *r1_bio)
> {
> struct bio *bio = r1_bio->master_bio;
>
> - if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> + if (test_bit(R1BIO_Invalid, &r1_bio->state))
> + bio->bi_status = BLK_STS_INVAL;
> + else if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> bio->bi_status = BLK_STS_IOERR;
>
> bio_endio(bio);
> @@ -403,6 +405,8 @@ static void raid1_end_read_request(struct bio *bio)
> ;
> } else if (!raid1_should_handle_error(bio)) {
> uptodate = 1;
> + if (bio->bi_status == BLK_STS_INVAL)
> + set_bit(R1BIO_Invalid, &r1_bio->state);
> } else {
> /* If all other devices have failed, we want to return
> * the error upwards rather than fail the last device.
> @@ -519,6 +523,14 @@ static void raid1_end_write_request(struct bio *bio)
> */
> r1_bio->bios[mirror] = NULL;
> to_put = bio;
> + /*
> + * An invalid I/O (e.g. a misaligned bio rejected by the lower
> + * device) was ignored above rather than faulting the device.
> + * It is not a successful write, though, so report the error to
> + * the caller instead of completing the master bio as uptodate.
> + */
> + if (bio->bi_status == BLK_STS_INVAL)
> + set_bit(R1BIO_Invalid, &r1_bio->state);
> /*
> * Do not set R1BIO_Uptodate if the current device is
> * rebuilding or Faulty. This is because we cannot use
> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> index c98d43a7ae993..21e837db5b25e 100644
> --- a/drivers/md/raid1.h
> +++ b/drivers/md/raid1.h
> @@ -184,6 +184,12 @@ enum r1bio_state {
> R1BIO_MadeGood,
> R1BIO_WriteError,
> R1BIO_FailFast,
> +/* An invalid I/O (e.g. a bio rejected by the lower device because it does
> + * not meet that device's dma_alignment) is not a device failure. Report
> + * the error to the caller without faulting the device or retrying, and do
> + * not complete a write as if it had succeeded.
> + */
> + R1BIO_Invalid,
> };
>
> static inline int sector_to_idx(sector_t sector)
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index cee5a253a281d..3cee9612be26d 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -323,7 +323,9 @@ static void raid_end_bio_io(struct r10bio *r10_bio)
> struct r10conf *conf = r10_bio->mddev->private;
>
> if (!test_and_set_bit(R10BIO_Returned, &r10_bio->state)) {
> - if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
> + if (test_bit(R10BIO_Invalid, &r10_bio->state))
> + bio->bi_status = BLK_STS_INVAL;
> + else if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
> bio->bi_status = BLK_STS_IOERR;
> bio_endio(bio);
> }
> @@ -403,6 +405,8 @@ static void raid10_end_read_request(struct bio *bio)
> set_bit(R10BIO_Uptodate, &r10_bio->state);
> } else if (!raid1_should_handle_error(bio)) {
> uptodate = 1;
> + if (bio->bi_status == BLK_STS_INVAL)
> + set_bit(R10BIO_Invalid, &r10_bio->state);
> } else {
> /* If all other devices that store this block have
> * failed, we want to return the error upwards rather
> @@ -523,6 +527,8 @@ static void raid10_end_write_request(struct bio *bio)
> * before rdev->recovery_offset, but for simplicity we don't
> * check this here.
> */
> + if (bio->bi_status == BLK_STS_INVAL)
> + set_bit(R10BIO_Invalid, &r10_bio->state);
> if (test_bit(In_sync, &rdev->flags) &&
> !test_bit(Faulty, &rdev->flags))
> set_bit(R10BIO_Uptodate, &r10_bio->state);
> diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
> index ec79d87fb92f6..a1adad3acafe1 100644
> --- a/drivers/md/raid10.h
> +++ b/drivers/md/raid10.h
> @@ -175,5 +175,11 @@ enum r10bio_state {
> /* failfast devices did receive failfast requests. */
> R10BIO_FailFast,
> R10BIO_Discard,
> +/* An invalid I/O (e.g. a bio rejected by the lower device because it does not
> + * meet that device's queue_limits) is not a device failure. Report the error
> + * to the caller without faulting the device or retrying, and do not complete a
> + * write as if it had succeeded.
> + */
> + R10BIO_Invalid,
> };
> #endif
> --
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-17 16:44 ` Dr. David Alan Gilbert
@ 2026-06-17 16:54 ` Keith Busch
2026-06-17 16:59 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2026-06-17 16:54 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
On Wed, Jun 17, 2026 at 04:44:35PM +0000, Dr. David Alan Gilbert wrote:
> (It's a bit scary you're having to go around quite
> a few places and make similar fixes; I assume there
> are others that do similar things).
Yes, I understand that. I'm looking into a common way to validate this.
The md raid doesn't have this problem because they always call
bio_split_to_limits() first, but that's not an optimal thing to do for
dm raid in the normal read/write path, so perhaps a common checker needs
to happen generically in the block layer. Yeah, I know I removed the
previous higher level validation ... I'll try find something less costly
than what we had before.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors
2026-06-17 16:54 ` Keith Busch
@ 2026-06-17 16:59 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2026-06-17 16:59 UTC (permalink / raw)
To: Keith Busch, regressions
Cc: Keith Busch, dm-devel, linux-block, mpatocka, Vjaceslavs Klimovs
* Keith Busch (kbusch@kernel.org) wrote:
> On Wed, Jun 17, 2026 at 04:44:35PM +0000, Dr. David Alan Gilbert wrote:
> > (It's a bit scary you're having to go around quite
> > a few places and make similar fixes; I assume there
> > are others that do similar things).
>
> Yes, I understand that. I'm looking into a common way to validate this.
> The md raid doesn't have this problem because they always call
> bio_split_to_limits() first, but that's not an optimal thing to do for
> dm raid in the normal read/write path, so perhaps a common checker needs
> to happen generically in the block layer. Yeah, I know I removed the
> previous higher level validation ... I'll try find something less costly
> than what we had before.
OK, thanks again
(and to Thomas for gluing my query to those other two which got this
moving!)
Dave.
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2026-06-17 16:59 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-16 15:05 [PATCH 1/2] dm-io: clone the source bio instead of copying its biovec Keith Busch
2026-06-16 15:05 ` [PATCH 2/2] dm-raid1: don't fail the mirror for invalid I/O errors Keith Busch
2026-06-16 17:54 ` Dr. David Alan Gilbert
2026-06-16 18:48 ` Keith Busch
2026-06-16 20:09 ` Dr. David Alan Gilbert
2026-06-16 23:45 ` Keith Busch
2026-06-16 23:47 ` Dr. David Alan Gilbert
2026-06-17 15:08 ` Keith Busch
2026-06-17 15:33 ` Dr. David Alan Gilbert
2026-06-17 16:21 ` Keith Busch
2026-06-17 16:44 ` Dr. David Alan Gilbert
2026-06-17 16:54 ` Keith Busch
2026-06-17 16:59 ` Dr. David Alan Gilbert
2026-06-16 15:40 ` Keith Busch
2026-06-16 15:58 ` Keith Busch
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.