* [PATCH v2 1/3] md/raid1,raid10: fix deadlock in read error recovery path
2026-05-01 11:46 [PATCH v2 0/3] md/raid1,raid10: fix deadlock and bio accounting in read error path Abd-Alrhman Masalkhi
@ 2026-05-01 11:46 ` Abd-Alrhman Masalkhi
2026-05-01 11:46 ` [PATCH v2 2/3] md/raid1,raid10: fix error-path detection with md_cloned_bio() Abd-Alrhman Masalkhi
2026-05-01 11:46 ` [PATCH v2 3/3] md/raid1,raid10: fix bio accounting for split md cloned bios Abd-Alrhman Masalkhi
2 siblings, 0 replies; 4+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-05-01 11:46 UTC (permalink / raw)
To: song, yukuai, xni, neilb, shli
Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi
raid1d and raid10d may resubmit a split md cloned bio while handling
a read error. In this case, resubmitting the bio can lead to a deadlock
if the array is suspended before md_handle_request() acquires an
active_io reference via percpu_ref_tryget_live().
Since the cloned bio already holds an active_io reference,
trying to acquire another reference via percpu_ref_tryget_live()
can lead to a deadlock while the array is suspended.
Fix this by using percpu_ref_get() for md cloned bios.
Fixes: bb2a9acefaf9 ("md/raid1: switch to use md_account_bio() for io accounting")
Fixes: 820455238366 ("md/raid10: switch to use md_account_bio() for io accounting")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
Changes in v2:
- Use md_cloned_bio() consistently to detect cloned bios.
- Recognize that raid10 has the same issue and fix it in this series
- Allow splitting bios.
- Handle md cloned bios explicitly in md_handle_request()
- Link v1: https://lore.kernel.org/linux-raid/20260427103446.300378-1-abd.masalkhi@gmail.com/
Please let me know if I should add a Suggested-by tag for Yu Kuai,
as the solution approach was suggested during review.
Link to Yu Kuai' email: https://lore.kernel.org/linux-raid/m2lde74dtw.fsf@gmail.com/T/#m714020a38b60fc5f84b9a24f0c46acbe5d7342d6
Thanks
Abd-alrhman
---
drivers/md/md.c | 25 ++++++++++++++++---------
drivers/md/md.h | 5 +++++
2 files changed, 21 insertions(+), 9 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index e926aef9ec43..96db1e7850e9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -396,17 +396,24 @@ static bool is_suspended(struct mddev *mddev, struct bio *bio)
bool md_handle_request(struct mddev *mddev, struct bio *bio)
{
check_suspended:
- if (is_suspended(mddev, bio)) {
- /* Bail out if REQ_NOWAIT is set for the bio */
- if (bio->bi_opf & REQ_NOWAIT) {
- bio_wouldblock_error(bio);
- return true;
+ if (unlikely(md_cloned_bio(mddev, bio))) {
+ /*
+ * This bio is an MD cloned bio and already holds an
+ * active_io reference, so percpu_ref_get() is safe here.
+ */
+ percpu_ref_get(&mddev->active_io);
+ } else {
+ if (is_suspended(mddev, bio)) {
+ /* Bail out if REQ_NOWAIT is set for the bio */
+ if (bio->bi_opf & REQ_NOWAIT) {
+ bio_wouldblock_error(bio);
+ return true;
+ }
+ wait_event(mddev->sb_wait, !is_suspended(mddev, bio));
}
- wait_event(mddev->sb_wait, !is_suspended(mddev, bio));
+ if (!percpu_ref_tryget_live(&mddev->active_io))
+ goto check_suspended;
}
- if (!percpu_ref_tryget_live(&mddev->active_io))
- goto check_suspended;
-
if (!mddev->pers->make_request(mddev, bio)) {
percpu_ref_put(&mddev->active_io);
if (mddev_is_dm(mddev) && mddev->pers->prepare_suspend)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 3bfbee595156..e44074d30cf9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -1038,6 +1038,11 @@ void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
extern const struct block_device_operations md_fops;
+static inline bool md_cloned_bio(struct mddev *mddev, struct bio *bio)
+{
+ return bio->bi_pool == &mddev->io_clone_set;
+}
+
/*
* MD devices can be used undeneath by DM, in which case ->gendisk is NULL.
*/
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH v2 2/3] md/raid1,raid10: fix error-path detection with md_cloned_bio()
2026-05-01 11:46 [PATCH v2 0/3] md/raid1,raid10: fix deadlock and bio accounting in read error path Abd-Alrhman Masalkhi
2026-05-01 11:46 ` [PATCH v2 1/3] md/raid1,raid10: fix deadlock in read error recovery path Abd-Alrhman Masalkhi
@ 2026-05-01 11:46 ` Abd-Alrhman Masalkhi
2026-05-01 11:46 ` [PATCH v2 3/3] md/raid1,raid10: fix bio accounting for split md cloned bios Abd-Alrhman Masalkhi
2 siblings, 0 replies; 4+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-05-01 11:46 UTC (permalink / raw)
To: song, yukuai, xni, neilb, shli
Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi
Detect the error path using md_cloned_bio() instead of relying
on r1_bio in raid1 or r10_bio->read_slot in raid10, which may be
NULL or -1 after splitting and resubmitting a failed bio.
As a result, the error path may not be recognized and memory
allocations can incorrectly use GFP_NOIO instead of
(GFP_NOIO | __GFP_HIGH), which can lead to a deadlock under
memory pressure.
Fixes: 689389a06ce7 ("md/raid1: simplify handle_read_error().")
Fixes: 545250f24809 ("md/raid10: simplify handle_read_error()")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
This patch depends on patch 1.
Changes in v2:
- New patch.
---
drivers/md/raid1.c | 13 ++++++++++---
drivers/md/raid10.c | 20 ++++++++++++++------
2 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cc9914bd15c1..c52ecd38c163 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1321,11 +1321,18 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
bool r1bio_existed = !!r1_bio;
/*
- * If r1_bio is set, we are blocking the raid1d thread
- * so there is a tiny risk of deadlock. So ask for
+ * An md cloned bio indicates we are in the error path.
+ * This is more reliable than checking r1_bio, which might
+ * be NULL even in the error path if a failed bio was split.
+ */
+ bool err_path = md_cloned_bio(mddev, bio);
+
+ /*
+ * If we are in the error path, we are blocking the raid1d
+ * thread so there is a tiny risk of deadlock. So ask for
* emergency memory if needed.
*/
- gfp_t gfp = r1_bio ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
+ gfp_t gfp = err_path ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
/*
* Still need barrier for READ in case that whole
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3a591e60a144..8c6fc398260e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1155,7 +1155,20 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
char b[BDEVNAME_SIZE];
int slot = r10_bio->read_slot;
struct md_rdev *err_rdev = NULL;
- gfp_t gfp = GFP_NOIO;
+
+ /*
+ * An md cloned bio indicates we are in the error path.
+ * This is more reliable than checking slot, which might
+ * be -1 even in the error path if a failed bio was split.
+ */
+ bool err_path = md_cloned_bio(mddev, bio);
+
+ /*
+ * If we are in the error path, we are blocking the raid10d
+ * thread so there is a tiny risk of deadlock. So ask for
+ * emergency memory if needed.
+ */
+ gfp_t gfp = err_path ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
if (slot >= 0 && r10_bio->devs[slot].rdev) {
/*
@@ -1166,11 +1179,6 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
* we lose the device name in error messages.
*/
int disk;
- /*
- * As we are blocking raid10, it is a little safer to
- * use __GFP_HIGH.
- */
- gfp = GFP_NOIO | __GFP_HIGH;
disk = r10_bio->devs[slot].devnum;
err_rdev = conf->mirrors[disk].rdev;
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* [PATCH v2 3/3] md/raid1,raid10: fix bio accounting for split md cloned bios
2026-05-01 11:46 [PATCH v2 0/3] md/raid1,raid10: fix deadlock and bio accounting in read error path Abd-Alrhman Masalkhi
2026-05-01 11:46 ` [PATCH v2 1/3] md/raid1,raid10: fix deadlock in read error recovery path Abd-Alrhman Masalkhi
2026-05-01 11:46 ` [PATCH v2 2/3] md/raid1,raid10: fix error-path detection with md_cloned_bio() Abd-Alrhman Masalkhi
@ 2026-05-01 11:46 ` Abd-Alrhman Masalkhi
2 siblings, 0 replies; 4+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-05-01 11:46 UTC (permalink / raw)
To: song, yukuai, xni, neilb, shli
Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi
Use md_cloned_bio() to control bio accounting instead of relying
on r1bio_existed in raid1 or the io_accounting flag in raid10.
The previous logic does not reliably reflect whether a bio is an
md cloned bio. When a failed bio is split and resubmitted via
bio_submit_split_bioset() on the error path, this can lead to either
double accounting for md cloned bios, or missing accounting for bios
returned from bio_submit_split_bioset()
Fix this by using md_cloned_bio() to detect md cloned bios and
skip accounting accordingly.
Fixes: bb2a9acefaf9 ("md/raid1: switch to use md_account_bio() for io accounting")
Fixes: 820455238366 ("md/raid10: switch to use md_account_bio() for io accounting")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
This patch depends on patch 1.
Changes in v2:
- New patch.
---
drivers/md/raid1.c | 2 +-
drivers/md/raid10.c | 8 ++++----
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c52ecd38c163..dfaf34141325 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1396,7 +1396,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
}
r1_bio->read_disk = rdisk;
- if (!r1bio_existed) {
+ if (likely(!md_cloned_bio(mddev, bio))) {
md_account_bio(mddev, &bio);
r1_bio->master_bio = bio;
}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8c6fc398260e..93af7bbc9005 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1146,7 +1146,7 @@ static bool regular_request_wait(struct mddev *mddev, struct r10conf *conf,
}
static void raid10_read_request(struct mddev *mddev, struct bio *bio,
- struct r10bio *r10_bio, bool io_accounting)
+ struct r10bio *r10_bio)
{
struct r10conf *conf = mddev->private;
struct bio *read_bio;
@@ -1226,7 +1226,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
}
slot = r10_bio->read_slot;
- if (io_accounting) {
+ if (likely(!md_cloned_bio(mddev, bio))) {
md_account_bio(mddev, &bio);
r10_bio->master_bio = bio;
}
@@ -1552,7 +1552,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
conf->geo.raid_disks);
if (bio_data_dir(bio) == READ)
- raid10_read_request(mddev, bio, r10_bio, true);
+ raid10_read_request(mddev, bio, r10_bio);
else
raid10_write_request(mddev, bio, r10_bio);
}
@@ -2872,7 +2872,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
rdev_dec_pending(rdev, mddev);
r10_bio->state = 0;
- raid10_read_request(mddev, r10_bio->master_bio, r10_bio, false);
+ raid10_read_request(mddev, r10_bio->master_bio, r10_bio);
/*
* allow_barrier after re-submit to ensure no sync io
* can be issued while regular io pending.
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread