public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] md/raid1,raid10: fix deadlock and bio accounting in read error path
@ 2026-05-01 11:46 Abd-Alrhman Masalkhi
  2026-05-01 11:46 ` [PATCH v2 1/3] md/raid1,raid10: fix deadlock in read error recovery path Abd-Alrhman Masalkhi
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-05-01 11:46 UTC (permalink / raw)
  To: song, yukuai, xni, neilb, shli
  Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi

Hi,

This series revisits an issue in the read error recovery path for
raid1 and raid10 when bios are split and resubmitted.

In v1, the approach was to avoid splitting bios in the raid1 thread
to prevent recursion and potential deadlocks. However, this was not
ideal and too restrictive.

As suggested by Yu Kuai, this series instead handles the problem in
md_handle_request() and allows bio splitting.

Link to Yu Kuai' email: https://lore.kernel.org/linux-raid/m2lde74dtw.fsf@gmail.com/T/#m714020a38b60fc5f84b9a24f0c46acbe5d7342d6

This series fixes the above by:
 - handling md cloned bios explicitly in md_handle_request()
 - using md_cloned_bio() to reliably detect the error path
 - fixing bio accounting to avoid double/missing accounting

Changes in v2:
 - Split fixes into separate patches for clarity.
 - Use md_cloned_bio() consistently to detect cloned bios.
 - Recognize that raid10 has the same issue and fix it in this series	
 - Allow splitting bios.
 - Handle md cloned bios explicitly in md_handle_request()
 - Link v1: https://lore.kernel.org/linux-raid/20260427103446.300378-1-abd.masalkhi@gmail.com/

Abd-Alrhman Masalkhi (3):
  md/raid1,raid10: fix deadlock in read error recovery path
  md/raid1,raid10: fix error-path detection with md_cloned_bio()
  md/raid1,raid10: fix bio accounting for split md cloned bios

 drivers/md/md.c     | 25 ++++++++++++++++---------
 drivers/md/md.h     |  5 +++++
 drivers/md/raid1.c  | 15 +++++++++++----
 drivers/md/raid10.c | 28 ++++++++++++++++++----------
 4 files changed, 50 insertions(+), 23 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 1/3] md/raid1,raid10: fix deadlock in read error recovery path
  2026-05-01 11:46 [PATCH v2 0/3] md/raid1,raid10: fix deadlock and bio accounting in read error path Abd-Alrhman Masalkhi
@ 2026-05-01 11:46 ` Abd-Alrhman Masalkhi
  2026-05-01 11:46 ` [PATCH v2 2/3] md/raid1,raid10: fix error-path detection with md_cloned_bio() Abd-Alrhman Masalkhi
  2026-05-01 11:46 ` [PATCH v2 3/3] md/raid1,raid10: fix bio accounting for split md cloned bios Abd-Alrhman Masalkhi
  2 siblings, 0 replies; 4+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-05-01 11:46 UTC (permalink / raw)
  To: song, yukuai, xni, neilb, shli
  Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi

raid1d and raid10d may resubmit a split md cloned bio while handling
a read error. In this case, resubmitting the bio can lead to a deadlock
if the array is suspended before md_handle_request() acquires an
active_io reference via percpu_ref_tryget_live().

Since the cloned bio already holds an active_io reference,
trying to acquire another reference via percpu_ref_tryget_live()
can lead to a deadlock while the array is suspended.

Fix this by using percpu_ref_get() for md cloned bios.

Fixes: bb2a9acefaf9 ("md/raid1: switch to use md_account_bio() for io accounting")
Fixes: 820455238366 ("md/raid10: switch to use md_account_bio() for io accounting")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
Changes in v2:
 - Use md_cloned_bio() consistently to detect cloned bios.
 - Recognize that raid10 has the same issue and fix it in this series	
 - Allow splitting bios.
 - Handle md cloned bios explicitly in md_handle_request()
 - Link v1: https://lore.kernel.org/linux-raid/20260427103446.300378-1-abd.masalkhi@gmail.com/

Please let me know if I should add a Suggested-by tag for Yu Kuai,
as the solution approach was suggested during review.

Link to Yu Kuai' email: https://lore.kernel.org/linux-raid/m2lde74dtw.fsf@gmail.com/T/#m714020a38b60fc5f84b9a24f0c46acbe5d7342d6

Thanks
Abd-alrhman
---
 drivers/md/md.c | 25 ++++++++++++++++---------
 drivers/md/md.h |  5 +++++
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e926aef9ec43..96db1e7850e9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -396,17 +396,24 @@ static bool is_suspended(struct mddev *mddev, struct bio *bio)
 bool md_handle_request(struct mddev *mddev, struct bio *bio)
 {
 check_suspended:
-	if (is_suspended(mddev, bio)) {
-		/* Bail out if REQ_NOWAIT is set for the bio */
-		if (bio->bi_opf & REQ_NOWAIT) {
-			bio_wouldblock_error(bio);
-			return true;
+	if (unlikely(md_cloned_bio(mddev, bio))) {
+		/*
+		 * This bio is an MD cloned bio and already holds an
+		 * active_io reference, so percpu_ref_get() is safe here.
+		 */
+		percpu_ref_get(&mddev->active_io);
+	} else {
+		if (is_suspended(mddev, bio)) {
+			/* Bail out if REQ_NOWAIT is set for the bio */
+			if (bio->bi_opf & REQ_NOWAIT) {
+				bio_wouldblock_error(bio);
+				return true;
+			}
+			wait_event(mddev->sb_wait, !is_suspended(mddev, bio));
 		}
-		wait_event(mddev->sb_wait, !is_suspended(mddev, bio));
+		if (!percpu_ref_tryget_live(&mddev->active_io))
+			goto check_suspended;
 	}
-	if (!percpu_ref_tryget_live(&mddev->active_io))
-		goto check_suspended;
-
 	if (!mddev->pers->make_request(mddev, bio)) {
 		percpu_ref_put(&mddev->active_io);
 		if (mddev_is_dm(mddev) && mddev->pers->prepare_suspend)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 3bfbee595156..e44074d30cf9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -1038,6 +1038,11 @@ void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
 
 extern const struct block_device_operations md_fops;
 
+static inline bool md_cloned_bio(struct mddev *mddev, struct bio *bio)
+{
+	return bio->bi_pool == &mddev->io_clone_set;
+}
+
 /*
  * MD devices can be used undeneath by DM, in which case ->gendisk is NULL.
  */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 2/3] md/raid1,raid10: fix error-path detection with md_cloned_bio()
  2026-05-01 11:46 [PATCH v2 0/3] md/raid1,raid10: fix deadlock and bio accounting in read error path Abd-Alrhman Masalkhi
  2026-05-01 11:46 ` [PATCH v2 1/3] md/raid1,raid10: fix deadlock in read error recovery path Abd-Alrhman Masalkhi
@ 2026-05-01 11:46 ` Abd-Alrhman Masalkhi
  2026-05-01 11:46 ` [PATCH v2 3/3] md/raid1,raid10: fix bio accounting for split md cloned bios Abd-Alrhman Masalkhi
  2 siblings, 0 replies; 4+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-05-01 11:46 UTC (permalink / raw)
  To: song, yukuai, xni, neilb, shli
  Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi

Detect the error path using md_cloned_bio() instead of relying
on r1_bio in raid1 or r10_bio->read_slot in raid10, which may be
NULL or -1 after splitting and resubmitting a failed bio.

As a result, the error path may not be recognized and memory
allocations can incorrectly use GFP_NOIO instead of
(GFP_NOIO | __GFP_HIGH), which can lead to a deadlock under
memory pressure.

Fixes: 689389a06ce7 ("md/raid1: simplify handle_read_error().")
Fixes: 545250f24809 ("md/raid10: simplify handle_read_error()")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
This patch depends on patch 1.

Changes in v2:
 - New patch.
---
 drivers/md/raid1.c  | 13 ++++++++++---
 drivers/md/raid10.c | 20 ++++++++++++++------
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cc9914bd15c1..c52ecd38c163 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1321,11 +1321,18 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 	bool r1bio_existed = !!r1_bio;
 
 	/*
-	 * If r1_bio is set, we are blocking the raid1d thread
-	 * so there is a tiny risk of deadlock.  So ask for
+	 * An md cloned bio indicates we are in the error path.
+	 * This is more reliable than checking r1_bio, which might
+	 * be NULL even in the error path if a failed bio was split.
+	 */
+	bool err_path = md_cloned_bio(mddev, bio);
+
+	/*
+	 * If we are in the error path, we are blocking the raid1d
+	 * thread so there is a tiny risk of deadlock.  So ask for
 	 * emergency memory if needed.
 	 */
-	gfp_t gfp = r1_bio ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
+	gfp_t gfp = err_path ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
 
 	/*
 	 * Still need barrier for READ in case that whole
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3a591e60a144..8c6fc398260e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1155,7 +1155,20 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	char b[BDEVNAME_SIZE];
 	int slot = r10_bio->read_slot;
 	struct md_rdev *err_rdev = NULL;
-	gfp_t gfp = GFP_NOIO;
+
+	/*
+	 * An md cloned bio indicates we are in the error path.
+	 * This is more reliable than checking slot, which might
+	 * be -1 even in the error path if a failed bio was split.
+	 */
+	bool err_path = md_cloned_bio(mddev, bio);
+
+	/*
+	 * If we are in the error path, we are blocking the raid10d
+	 * thread so there is a tiny risk of deadlock.  So ask for
+	 * emergency memory if needed.
+	 */
+	gfp_t gfp = err_path ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
 
 	if (slot >= 0 && r10_bio->devs[slot].rdev) {
 		/*
@@ -1166,11 +1179,6 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 		 * we lose the device name in error messages.
 		 */
 		int disk;
-		/*
-		 * As we are blocking raid10, it is a little safer to
-		 * use __GFP_HIGH.
-		 */
-		gfp = GFP_NOIO | __GFP_HIGH;
 
 		disk = r10_bio->devs[slot].devnum;
 		err_rdev = conf->mirrors[disk].rdev;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 3/3] md/raid1,raid10: fix bio accounting for split md cloned bios
  2026-05-01 11:46 [PATCH v2 0/3] md/raid1,raid10: fix deadlock and bio accounting in read error path Abd-Alrhman Masalkhi
  2026-05-01 11:46 ` [PATCH v2 1/3] md/raid1,raid10: fix deadlock in read error recovery path Abd-Alrhman Masalkhi
  2026-05-01 11:46 ` [PATCH v2 2/3] md/raid1,raid10: fix error-path detection with md_cloned_bio() Abd-Alrhman Masalkhi
@ 2026-05-01 11:46 ` Abd-Alrhman Masalkhi
  2 siblings, 0 replies; 4+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-05-01 11:46 UTC (permalink / raw)
  To: song, yukuai, xni, neilb, shli
  Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi

Use md_cloned_bio() to control bio accounting instead of relying
on r1bio_existed in raid1 or the io_accounting flag in raid10.

The previous logic does not reliably reflect whether a bio is an
md cloned bio. When a failed bio is split and resubmitted via
bio_submit_split_bioset() on the error path, this can lead to either
double accounting for md cloned bios, or missing accounting for bios
returned from bio_submit_split_bioset()

Fix this by using md_cloned_bio() to detect md cloned bios and
skip accounting accordingly.

Fixes: bb2a9acefaf9 ("md/raid1: switch to use md_account_bio() for io accounting")
Fixes: 820455238366 ("md/raid10: switch to use md_account_bio() for io accounting")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
This patch depends on patch 1.

Changes in v2:
 - New patch.
---
 drivers/md/raid1.c  | 2 +-
 drivers/md/raid10.c | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c52ecd38c163..dfaf34141325 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1396,7 +1396,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 	}
 
 	r1_bio->read_disk = rdisk;
-	if (!r1bio_existed) {
+	if (likely(!md_cloned_bio(mddev, bio))) {
 		md_account_bio(mddev, &bio);
 		r1_bio->master_bio = bio;
 	}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8c6fc398260e..93af7bbc9005 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1146,7 +1146,7 @@ static bool regular_request_wait(struct mddev *mddev, struct r10conf *conf,
 }
 
 static void raid10_read_request(struct mddev *mddev, struct bio *bio,
-				struct r10bio *r10_bio, bool io_accounting)
+				struct r10bio *r10_bio)
 {
 	struct r10conf *conf = mddev->private;
 	struct bio *read_bio;
@@ -1226,7 +1226,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	}
 	slot = r10_bio->read_slot;
 
-	if (io_accounting) {
+	if (likely(!md_cloned_bio(mddev, bio))) {
 		md_account_bio(mddev, &bio);
 		r10_bio->master_bio = bio;
 	}
@@ -1552,7 +1552,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 			conf->geo.raid_disks);
 
 	if (bio_data_dir(bio) == READ)
-		raid10_read_request(mddev, bio, r10_bio, true);
+		raid10_read_request(mddev, bio, r10_bio);
 	else
 		raid10_write_request(mddev, bio, r10_bio);
 }
@@ -2872,7 +2872,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
 
 	rdev_dec_pending(rdev, mddev);
 	r10_bio->state = 0;
-	raid10_read_request(mddev, r10_bio->master_bio, r10_bio, false);
+	raid10_read_request(mddev, r10_bio->master_bio, r10_bio);
 	/*
 	 * allow_barrier after re-submit to ensure no sync io
 	 * can be issued while regular io pending.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-01 11:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-01 11:46 [PATCH v2 0/3] md/raid1,raid10: fix deadlock and bio accounting in read error path Abd-Alrhman Masalkhi
2026-05-01 11:46 ` [PATCH v2 1/3] md/raid1,raid10: fix deadlock in read error recovery path Abd-Alrhman Masalkhi
2026-05-01 11:46 ` [PATCH v2 2/3] md/raid1,raid10: fix error-path detection with md_cloned_bio() Abd-Alrhman Masalkhi
2026-05-01 11:46 ` [PATCH v2 3/3] md/raid1,raid10: fix bio accounting for split md cloned bios Abd-Alrhman Masalkhi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox