[PATCH v2 0/4] md/raid1,raid10: fix write-path reference leaks and clean up error handling

Linux RAID subsystem development

* [PATCH v2 0/4] md/raid1,raid10: fix write-path reference leaks and clean up error handling
@ 2026-06-13 18:28 Abd-Alrhman Masalkhi
  2026-06-13 18:28 ` [PATCH v2 1/4] md/raid1: fix writes_pending and barrier reference leaks on write failures Abd-Alrhman Masalkhi
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-06-13 18:28 UTC (permalink / raw)
  To: song, yukuai, magiclinan, xiao, axboe, hare, john.g.garry,
	martin.petersen, vverma
  Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi

Hi,

This series fixes several write-path failure handling issues in raid1 and
raid10 and then follows up with a cleanup of raid1_write_request().

The first two patches fix writes_pending leaks caused by failure paths
that complete bios without reaching the normal write completion path.
The raid1 fix also addresses a barrier reference leak when
wait_blocked_rdev() fails after wait_barrier() succeeds.
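
For illustration only, the accounting described above can be modelled in
user space. The sketch below is not kernel code: struct model_array and
every model_* function are hypothetical stand-ins, with one counter for
the md_write_start()/md_write_end() pair and one for the
wait_barrier()/allow_barrier() pair. It assumes, as a simplification,
that the normal completion path drops both references.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_array {
	int writes_pending;	/* models md_write_start()/md_write_end() */
	int barrier_refs;	/* models wait_barrier()/allow_barrier() */
};

enum model_fail { MODEL_OK, MODEL_FAIL_BARRIER, MODEL_FAIL_BLOCKED_RDEV };

/* Returns true only if the write was queued, mirroring the new bool return. */
static bool model_write_request(struct model_array *a, enum model_fail f)
{
	if (f == MODEL_FAIL_BARRIER)
		return false;		/* barrier was never taken */

	a->barrier_refs++;		/* wait_barrier() succeeded */

	if (f == MODEL_FAIL_BLOCKED_RDEV) {
		a->barrier_refs--;	/* the added allow_barrier() */
		return false;
	}
	return true;			/* completion drops both references */
}

/* Assumed simplification of the normal write completion path. */
static void model_write_complete(struct model_array *a)
{
	assert(a->barrier_refs > 0 && a->writes_pending > 0);
	a->barrier_refs--;
	a->writes_pending--;
}

static void model_make_request(struct model_array *a, enum model_fail f)
{
	a->writes_pending++;		/* md_write_start() */
	if (!model_write_request(a, f))
		a->writes_pending--;	/* md_write_end() on failure */
}
```

In this model both counters return to zero after either failure mode,
and after a queued write once model_write_complete() has run. Dropping
either of the two decrements on the failure paths leaves a counter
permanently above zero, which is the kind of leak the series targets.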

The third patch fixes additional writes_pending and barrier reference
leaks in raid10 discard handling.

The final patch simplifies raid1_write_request() error handling.

Patches:
md/raid1: fix writes_pending and barrier reference leaks on write failures
md/raid10: fix writes_pending leak on write request failures
md/raid10: fix writes_pending and barrier reference leaks on discard failures
md/raid1: simplify raid1_write_request() error handling

Changes in v2:
 - fix writes_pending leaks in addition to the barrier reference leaks
 - add raid10 fixes for analogous write and discard failure paths
 - add a follow-up cleanup patch to simplify raid1_write_request()
   error handling
 - Link v1: https://lore.kernel.org/linux-raid/20260611132500.763528-1-abd.masalkhi@gmail.com/

Thanks,
Abd-Alrhman

Abd-Alrhman Masalkhi (4):
  md/raid1: fix writes_pending and barrier reference leaks on write
    failures
  md/raid10: fix writes_pending leak on write request failures
  md/raid10: fix writes_pending and barrier reference leaks on discard
    failures
  md/raid1: simplify raid1_write_request() error handling

 drivers/md/raid1.c  | 74 ++++++++++++++++++++++++---------------------
 drivers/md/raid10.c | 28 ++++++++++++-----
 2 files changed, 60 insertions(+), 42 deletions(-)

-- 
2.43.0



* [PATCH v2 1/4] md/raid1: fix writes_pending and barrier reference leaks on write failures
  2026-06-13 18:28 [PATCH v2 0/4] md/raid1,raid10: fix write-path reference leaks and clean up error handling Abd-Alrhman Masalkhi
@ 2026-06-13 18:28 ` Abd-Alrhman Masalkhi
  2026-06-13 18:28 ` [PATCH v2 2/4] md/raid10: fix writes_pending leak on write request failures Abd-Alrhman Masalkhi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-06-13 18:28 UTC (permalink / raw)
  To: song, yukuai, magiclinan, xiao, axboe, hare, john.g.garry,
	martin.petersen, vverma
  Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi, sashiko-bot

raid1_make_request() acquires a writes_pending reference with
md_write_start() before calling raid1_write_request(). Several failure
paths in raid1_write_request() complete the bio and return without
reaching the normal write completion path, causing the corresponding
md_write_end() to be skipped.

Make raid1_write_request() return a status indicating whether the write
request was successfully queued. This allows raid1_make_request() to
call md_write_end() when raid1_write_request() fails.

Additionally, if wait_blocked_rdev() fails after wait_barrier()
succeeds, the associated barrier reference is not released.

Call allow_barrier() before returning from that path to keep the barrier
accounting balanced.

Fixes: b1a7ad8b5c4f ("md/raid1: Handle bio_split() errors")
Fixes: f2a38abf5f1c ("md/raid1: Atomic write support")
Fixes: 5aa705039c4f ("md: raid1 add nowait support")
Reported-by: sashiko-bot <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260611083514.754922-1-abd.masalkhi@gmail.com?part=1
Closes: https://sashiko.dev/#/patchset/20260611132500.763528-1-abd.masalkhi@gmail.com?part=1
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
Changes in v2:
 - fix writes_pending leaks in addition to the barrier reference leak.
 - make raid1_write_request() return whether the write was successfully
   queued so raid1_make_request() can release writes_pending references
   on failure paths.
 - Link v1: https://lore.kernel.org/linux-raid/20260611132500.763528-1-abd.masalkhi@gmail.com/
---
 drivers/md/raid1.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b1ed4cc6ade4..f0e1c7125972 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1501,7 +1501,7 @@ static void raid1_start_write_behind(struct mddev *mddev, struct r1bio *r1_bio,
 
 }
 
-static void raid1_write_request(struct mddev *mddev, struct bio *bio,
+static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 				int max_write_sectors)
 {
 	struct r1conf *conf = mddev->private;
@@ -1512,6 +1512,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	int max_sectors;
 	bool write_behind = false;
 	bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
+	sector_t sector = bio->bi_iter.bi_sector;
 
 	if (mddev_is_clustered(mddev) &&
 	    mddev->cluster_ops->area_resyncing(mddev, WRITE,
@@ -1519,7 +1520,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 
 		if (bio->bi_opf & REQ_NOWAIT) {
 			bio_wouldblock_error(bio);
-			return;
+			return false;
 		}
 		wait_event_idle(conf->wait_barrier,
 				!mddev->cluster_ops->area_resyncing(mddev, WRITE,
@@ -1535,12 +1536,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	if (!wait_barrier(conf, bio->bi_iter.bi_sector,
 				bio->bi_opf & REQ_NOWAIT)) {
 		bio_wouldblock_error(bio);
-		return;
+		return false;
 	}
 
 	if (!wait_blocked_rdev(mddev, bio)) {
 		bio_wouldblock_error(bio);
-		return;
+		allow_barrier(conf, sector);
+		return false;
 	}
 
 	r1_bio = alloc_r1bio(mddev, bio);
@@ -1699,7 +1701,8 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 
 	/* In case raid1d snuck in to freeze_array */
 	wake_up_barrier(conf);
-	return;
+	return true;
+
 err_handle:
 	for (k = 0; k < i; k++) {
 		if (r1_bio->bios[k]) {
@@ -1709,6 +1712,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	}
 
 	raid_end_bio_io(r1_bio);
+	return false;
 }
 
 static bool raid1_make_request(struct mddev *mddev, struct bio *bio)
@@ -1732,8 +1736,9 @@ static bool raid1_make_request(struct mddev *mddev, struct bio *bio)
 	if (bio_data_dir(bio) == READ)
 		raid1_read_request(mddev, bio, sectors, NULL);
 	else {
-		md_write_start(mddev,bio);
-		raid1_write_request(mddev, bio, sectors);
+		md_write_start(mddev, bio);
+		if (!raid1_write_request(mddev, bio, sectors))
+			md_write_end(mddev);
 	}
 	return true;
 }
-- 
2.43.0



* [PATCH v2 2/4] md/raid10: fix writes_pending leak on write request failures
  2026-06-13 18:28 [PATCH v2 0/4] md/raid1,raid10: fix write-path reference leaks and clean up error handling Abd-Alrhman Masalkhi
  2026-06-13 18:28 ` [PATCH v2 1/4] md/raid1: fix writes_pending and barrier reference leaks on write failures Abd-Alrhman Masalkhi
@ 2026-06-13 18:28 ` Abd-Alrhman Masalkhi
  2026-06-13 18:40   ` sashiko-bot
  2026-06-13 18:28 ` [PATCH v2 3/4] md/raid10: fix writes_pending and barrier reference leaks on discard failures Abd-Alrhman Masalkhi
  2026-06-13 18:28 ` [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling Abd-Alrhman Masalkhi
  3 siblings, 1 reply; 7+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-06-13 18:28 UTC (permalink / raw)
  To: song, yukuai, magiclinan, xiao, axboe, hare, john.g.garry,
	martin.petersen, vverma
  Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi

raid10_make_request() acquires a writes_pending reference with
md_write_start() before dispatching write requests. Several failure
paths in raid10_write_request() complete the bio and return without
reaching the normal write completion path, causing the corresponding
md_write_end() to be skipped.

Make raid10_write_request() return a status indicating whether the write
request was successfully queued. This allows raid10_make_request() to
release the writes_pending reference with md_write_end() when a write
request fails.

Fixes: 4cf58d952909 ("md/raid10: Handle bio_split() errors")
Fixes: c9aa889b035f ("md: raid10 add nowait support")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
Changes in v2:
 - new patch.
---
 drivers/md/raid10.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5bd7698e0a1b..5ad1b0c6207a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1349,7 +1349,7 @@ static void wait_blocked_dev(struct mddev *mddev, struct r10bio *r10_bio)
 	}
 }
 
-static void raid10_write_request(struct mddev *mddev, struct bio *bio,
+static bool raid10_write_request(struct mddev *mddev, struct bio *bio,
 				 struct r10bio *r10_bio)
 {
 	struct r10conf *conf = mddev->private;
@@ -1365,7 +1365,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 		/* Bail out if REQ_NOWAIT is set for the bio */
 		if (bio->bi_opf & REQ_NOWAIT) {
 			bio_wouldblock_error(bio);
-			return;
+			return false;
 		}
 		for (;;) {
 			prepare_to_wait(&conf->wait_barrier,
@@ -1381,7 +1381,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 	sectors = r10_bio->sectors;
 	if (!regular_request_wait(mddev, conf, bio, sectors)) {
 		free_r10bio(r10_bio);
-		return;
+		return false;
 	}
 
 	if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
@@ -1398,7 +1398,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 		if (bio->bi_opf & REQ_NOWAIT) {
 			allow_barrier(conf);
 			bio_wouldblock_error(bio);
-			return;
+			return false;
 		}
 		mddev_add_trace_msg(conf->mddev,
 			"raid10 wait reshape metadata");
@@ -1514,7 +1514,8 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 			raid10_write_one_disk(mddev, r10_bio, bio, true, i);
 	}
 	one_write_done(r10_bio);
-	return;
+	return true;
+
 err_handle:
 	for (k = 0;  k < i; k++) {
 		int d = r10_bio->devs[k].devnum;
@@ -1532,10 +1533,12 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 	}
 
 	raid_end_bio_io(r10_bio);
+	return false;
 }
 
-static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
+static bool __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 {
+	bool ret;
 	struct r10conf *conf = mddev->private;
 	struct r10bio *r10_bio;
 
@@ -1551,10 +1554,13 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 	memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) *
 			conf->geo.raid_disks);
 
+	ret = true;
 	if (bio_data_dir(bio) == READ)
 		raid10_read_request(mddev, bio, r10_bio);
 	else
-		raid10_write_request(mddev, bio, r10_bio);
+		ret = raid10_write_request(mddev, bio, r10_bio);
+
+	return ret;
 }
 
 static void raid_end_discard_bio(struct r10bio *r10bio)
@@ -1900,7 +1906,8 @@ static bool raid10_make_request(struct mddev *mddev, struct bio *bio)
 		sectors = chunk_sects -
 			(bio->bi_iter.bi_sector &
 			 (chunk_sects - 1));
-	__make_request(mddev, bio, sectors);
+	if (!__make_request(mddev, bio, sectors))
+		md_write_end(mddev);
 
 	/* In case raid10d snuck in to freeze_array */
 	wake_up_barrier(conf);
-- 
2.43.0



* [PATCH v2 3/4] md/raid10: fix writes_pending and barrier reference leaks on discard failures
  2026-06-13 18:28 [PATCH v2 0/4] md/raid1,raid10: fix write-path reference leaks and clean up error handling Abd-Alrhman Masalkhi
  2026-06-13 18:28 ` [PATCH v2 1/4] md/raid1: fix writes_pending and barrier reference leaks on write failures Abd-Alrhman Masalkhi
  2026-06-13 18:28 ` [PATCH v2 2/4] md/raid10: fix writes_pending leak on write request failures Abd-Alrhman Masalkhi
@ 2026-06-13 18:28 ` Abd-Alrhman Masalkhi
  2026-06-13 18:28 ` [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling Abd-Alrhman Masalkhi
  3 siblings, 0 replies; 7+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-06-13 18:28 UTC (permalink / raw)
  To: song, yukuai, magiclinan, xiao, axboe, hare, john.g.garry,
	martin.petersen, vverma
  Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi

raid10_make_request() acquires a writes_pending reference with
md_write_start() before calling raid10_handle_discard(). Several failure
paths in raid10_handle_discard() complete the bio and return without
releasing the corresponding reference, causing md_write_end() to be
skipped.

Call md_write_end() before returning from these failure paths to keep
writes_pending accounting balanced.

Additionally, discard split allocation failures can occur after
wait_barrier() succeeds. Those paths return without calling
allow_barrier(), leaking the associated barrier reference.

Release the barrier before returning from those paths.

Fixes: c9aa889b035f ("md: raid10 add nowait support")
Fixes: 4cf58d952909 ("md/raid10: Handle bio_split() errors")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
Changes in v2:
 - new patch.
---
 drivers/md/raid10.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5ad1b0c6207a..aacf160ee9f2 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1639,6 +1639,7 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 
 	if (!wait_barrier(conf, bio->bi_opf & REQ_NOWAIT)) {
 		bio_wouldblock_error(bio);
+		md_write_end(mddev);
 		return 0;
 	}
 
@@ -1681,6 +1682,8 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 		if (IS_ERR(split)) {
 			bio->bi_status = errno_to_blk_status(PTR_ERR(split));
 			bio_endio(bio);
+			md_write_end(mddev);
+			allow_barrier(conf);
 			return 0;
 		}
 
@@ -1698,6 +1701,8 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 		if (IS_ERR(split)) {
 			bio->bi_status = errno_to_blk_status(PTR_ERR(split));
 			bio_endio(bio);
+			md_write_end(mddev);
+			allow_barrier(conf);
 			return 0;
 		}
 
-- 
2.43.0



* [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling
  2026-06-13 18:28 [PATCH v2 0/4] md/raid1,raid10: fix write-path reference leaks and clean up error handling Abd-Alrhman Masalkhi
                   ` (2 preceding siblings ...)
  2026-06-13 18:28 ` [PATCH v2 3/4] md/raid10: fix writes_pending and barrier reference leaks on discard failures Abd-Alrhman Masalkhi
@ 2026-06-13 18:28 ` Abd-Alrhman Masalkhi
  2026-06-13 18:47   ` sashiko-bot
  3 siblings, 1 reply; 7+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-06-13 18:28 UTC (permalink / raw)
  To: song, yukuai, magiclinan, xiao, axboe, hare, john.g.garry,
	martin.petersen, vverma
  Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi

raid1_write_request() increments rdev->nr_pending before checking the
badblocks and then immediately decrements it again when a device is
skipped. Move the increment to after the checks succeed so the
reference accounting is easier to follow.

Consolidate the failure paths so that each error label releases exactly
the resources acquired up to that point. err_dec_pending drops pending
references and frees the r1bio, while err_allow_barrier handles the
barrier release before returning.

When a REQ_ATOMIC write cannot be satisfied due to a badblock range,
complete the bio with BLK_STS_NOTSUPP rather than reporting an I/O
error, since the operation is unsupported rather than having failed
during I/O.

Rename max_write_sectors to max_sectors and remove the redundant local
copy.

Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
Changes in v2:
 - new patch, depends on patch 1.
---
 drivers/md/raid1.c | 59 +++++++++++++++++++++++-----------------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f0e1c7125972..dc0b7b8bc2f8 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1502,29 +1502,29 @@ static void raid1_start_write_behind(struct mddev *mddev, struct r1bio *r1_bio,
 }
 
 static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
-				int max_write_sectors)
+				int max_sectors)
 {
 	struct r1conf *conf = mddev->private;
 	struct r1bio *r1_bio;
 	int i, disks, k;
 	unsigned long flags;
 	int first_clone;
-	int max_sectors;
 	bool write_behind = false;
-	bool is_discard = (bio_op(bio) == REQ_OP_DISCARD);
+	bool nowait = bio->bi_opf & REQ_NOWAIT;
+	bool is_discard = op_is_discard(bio->bi_opf);
 	sector_t sector = bio->bi_iter.bi_sector;
 
 	if (mddev_is_clustered(mddev) &&
-	    mddev->cluster_ops->area_resyncing(mddev, WRITE,
-		     bio->bi_iter.bi_sector, bio_end_sector(bio))) {
+	    mddev->cluster_ops->area_resyncing(mddev, WRITE, sector,
+					       bio_end_sector(bio))) {
 
-		if (bio->bi_opf & REQ_NOWAIT) {
+		if (nowait) {
 			bio_wouldblock_error(bio);
 			return false;
 		}
 		wait_event_idle(conf->wait_barrier,
 				!mddev->cluster_ops->area_resyncing(mddev, WRITE,
-								    bio->bi_iter.bi_sector,
+								    sector,
 								    bio_end_sector(bio)));
 	}
 
@@ -1533,20 +1533,18 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 	 * thread has put up a bar for new requests.
 	 * Continue immediately if no resync is active currently.
 	 */
-	if (!wait_barrier(conf, bio->bi_iter.bi_sector,
-				bio->bi_opf & REQ_NOWAIT)) {
+	if (!wait_barrier(conf, sector, nowait)) {
 		bio_wouldblock_error(bio);
 		return false;
 	}
 
 	if (!wait_blocked_rdev(mddev, bio)) {
 		bio_wouldblock_error(bio);
-		allow_barrier(conf, sector);
-		return false;
+		goto err_allow_barrier;
 	}
 
 	r1_bio = alloc_r1bio(mddev, bio);
-	r1_bio->sectors = max_write_sectors;
+	r1_bio->sectors = max_sectors;
 
 	/* first select target devices under rcu_lock and
 	 * inc refcount on their rdev.  Record them by setting
@@ -1560,7 +1558,6 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 	 */
 
 	disks = conf->raid_disks * 2;
-	max_sectors = r1_bio->sectors;
 	for (i = 0;  i < disks; i++) {
 		struct md_rdev *rdev = conf->mirrors[i].rdev;
 
@@ -1576,23 +1573,21 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 		if (!rdev || test_bit(Faulty, &rdev->flags))
 			continue;
 
-		atomic_inc(&rdev->nr_pending);
 		if (test_bit(WriteErrorSeen, &rdev->flags)) {
 			sector_t first_bad;
 			sector_t bad_sectors;
 			int is_bad;
 
-			is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
+			is_bad = is_badblock(rdev, sector, max_sectors,
 					     &first_bad, &bad_sectors);
-			if (is_bad && first_bad <= r1_bio->sector) {
+			if (is_bad && first_bad <= sector) {
 				/* Cannot write here at all */
-				bad_sectors -= (r1_bio->sector - first_bad);
+				bad_sectors -= (sector - first_bad);
 				if (bad_sectors < max_sectors)
 					/* mustn't write more than bad_sectors
 					 * to other devices yet
 					 */
 					max_sectors = bad_sectors;
-				rdev_dec_pending(rdev, mddev);
 				continue;
 			}
 			if (is_bad) {
@@ -1606,15 +1601,18 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 				 * the benefit.
 				 */
 				if (bio->bi_opf & REQ_ATOMIC) {
-					rdev_dec_pending(rdev, mddev);
-					goto err_handle;
+					bio->bi_status = BLK_STS_NOTSUPP;
+					bio_endio(bio);
+					goto err_dec_pending;
 				}
 
-				good_sectors = first_bad - r1_bio->sector;
+				good_sectors = first_bad - sector;
 				if (good_sectors < max_sectors)
 					max_sectors = good_sectors;
 			}
 		}
+
+		atomic_inc(&rdev->nr_pending);
 		r1_bio->bios[i] = bio;
 	}
 
@@ -1630,10 +1628,8 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 	if (max_sectors < bio_sectors(bio)) {
 		bio = bio_submit_split_bioset(bio, max_sectors,
 					      &conf->bio_split);
-		if (!bio) {
-			set_bit(R1BIO_Returned, &r1_bio->state);
-			goto err_handle;
-		}
+		if (!bio)
+			goto err_dec_pending;
 
 		r1_bio->master_bio = bio;
 		r1_bio->sectors = max_sectors;
@@ -1677,7 +1673,7 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 		mbio->bi_opf &= ~REQ_NOWAIT;
 		r1_bio->bios[i] = mbio;
 
-		mbio->bi_iter.bi_sector	= (r1_bio->sector + rdev->data_offset);
+		mbio->bi_iter.bi_sector	= sector + rdev->data_offset;
 		mbio->bi_end_io	= raid1_end_write_request;
 		if (test_bit(FailFast, &rdev->flags) &&
 		    !test_bit(WriteMostly, &rdev->flags) &&
@@ -1686,7 +1682,7 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 		mbio->bi_private = r1_bio;
 
 		atomic_inc(&r1_bio->remaining);
-		mddev_trace_remap(mddev, mbio, r1_bio->sector);
+		mddev_trace_remap(mddev, mbio, sector);
 		/* flush_pending_writes() needs access to the rdev so...*/
 		mbio->bi_bdev = (void *)rdev;
 		if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
@@ -1701,9 +1697,10 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 
 	/* In case raid1d snuck in to freeze_array */
 	wake_up_barrier(conf);
+
 	return true;
 
-err_handle:
+err_dec_pending:
 	for (k = 0; k < i; k++) {
 		if (r1_bio->bios[k]) {
 			rdev_dec_pending(conf->mirrors[k].rdev, mddev);
@@ -1711,7 +1708,11 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
 		}
 	}
 
-	raid_end_bio_io(r1_bio);
+	free_r1bio(r1_bio);
+
+err_allow_barrier:
+	allow_barrier(conf, sector);
+
 	return false;
 }
 
-- 
2.43.0



* Re: [PATCH v2 2/4] md/raid10: fix writes_pending leak on write request failures
  2026-06-13 18:28 ` [PATCH v2 2/4] md/raid10: fix writes_pending leak on write request failures Abd-Alrhman Masalkhi
@ 2026-06-13 18:40   ` sashiko-bot
  0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-13 18:40 UTC (permalink / raw)
  To: Abd-Alrhman Masalkhi; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] The `r10_bio` structure allocated from the mempool is leaked when `raid10_write_request()` bails out early due to `REQ_NOWAIT`.
--

commit 3940ccdb4c7c7e19c20f9b5e2e1d12545c2558b5
Author: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>

md/raid10: fix writes_pending leak on write request failures

raid10_make_request() acquires a writes_pending reference with
md_write_start() before dispatching write requests. Several failure
paths in raid10_write_request() complete the bio and return without
reaching the normal write completion path, causing the corresponding
md_write_end() to be skipped.

> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index cee5a253a281d..c123a8c76ddcd 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
[ ... ]
> @@ -1365,7 +1365,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
>  		/* Bail out if REQ_NOWAIT is set for the bio */
>  		if (bio->bi_opf & REQ_NOWAIT) {
>  			bio_wouldblock_error(bio);
> -			return;
> +			return false;

[Severity: High]
This isn't a bug introduced by this patch, but does this early return leak
the r10_bio structure?

__make_request() allocates r10_bio using:
    r10_bio = mempool_alloc(&conf->r10bio_pool, GFP_NOIO);

It then passes it to raid10_write_request(). When the bio has REQ_NOWAIT set
during a clustered area resync, we complete the bio with an error and return
false without freeing the r10_bio. Could repeatedly triggering this exhaust
the r10bio_pool, causing deadlocks on subsequent operations?

Note that the regular_request_wait() error path right below correctly calls
free_r10bio(r10_bio) before returning.
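
The concern can be illustrated with a small user-space model. This is a
sketch under stated assumptions, not the kernel mempool API: struct
model_pool, MODEL_POOL_SLOTS and the model_* functions are hypothetical,
and the pool size of 4 is arbitrary. Where the model returns -1 on
exhaustion, the review above suggests the real allocation could stall
instead.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_POOL_SLOTS 4	/* arbitrary size chosen for the sketch */

/* Hypothetical fixed-size pool; not the kernel mempool API. */
struct model_pool {
	bool in_use[MODEL_POOL_SLOTS];
};

/* Returns a slot index, or -1 when every slot is taken. */
static int model_pool_alloc(struct model_pool *p)
{
	for (int i = 0; i < MODEL_POOL_SLOTS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

static void model_pool_free(struct model_pool *p, int slot)
{
	assert(slot >= 0 && slot < MODEL_POOL_SLOTS && p->in_use[slot]);
	p->in_use[slot] = false;
}

/*
 * Models one request that bails out early after its slot was allocated.
 * Returns false if no slot could be obtained.
 */
static bool model_nowait_bailout(struct model_pool *p, bool free_on_bailout)
{
	int slot = model_pool_alloc(p);

	if (slot < 0)
		return false;
	if (free_on_bailout)
		model_pool_free(p, slot);	/* models free_r10bio() */
	return true;
}
```

With free_on_bailout set to false, the model runs out of slots after
MODEL_POOL_SLOTS bail-outs. With it set to true, any number of bail-outs
succeeds, which matches the behaviour of the regular_request_wait()
error path mentioned above.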

>  		}
>  		for (;;) {
[ ... ]
> @@ -1398,7 +1398,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
>  		if (bio->bi_opf & REQ_NOWAIT) {
>  			allow_barrier(conf);
>  			bio_wouldblock_error(bio);
> -			return;
> +			return false;

[Severity: High]
This is a pre-existing issue, but does this path also leak the r10_bio
mempool allocation when waiting for a reshape metadata update?

Similar to the clustered area resync case above, the r10_bio allocated in
__make_request() does not appear to be freed before returning here.

>  		}
>  		mddev_add_trace_msg(conf->mddev,

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613182810.1317258-1-abd.masalkhi@gmail.com?part=2


* Re: [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling
  2026-06-13 18:28 ` [PATCH v2 4/4] md/raid1: simplify raid1_write_request() error handling Abd-Alrhman Masalkhi
@ 2026-06-13 18:47   ` sashiko-bot
  0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-13 18:47 UTC (permalink / raw)
  To: Abd-Alrhman Masalkhi; +Cc: linux-raid, yukuai

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [Critical] Moving `atomic_inc(&rdev->nr_pending)` down opens a race window with `hot_remove_disk`, leading to a NULL pointer dereference during error handling or a Use-After-Free in clustered configurations.

Pre-existing issues:
- [High] A `REQ_ATOMIC` write can be silently split if a bad block overlaps the start of the write, violating atomicity guarantees.
--

commit c334f57d91677f32131f21e50bb32b83136e5a60
Author: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>

md/raid1: simplify raid1_write_request() error handling

This commit delays the rdev->nr_pending increment until after badblocks
are checked to avoid incrementing and immediately decrementing on skipped
devices. It also consolidates error handling paths and modifies REQ_ATOMIC
failures to complete with BLK_STS_NOTSUPP instead of I/O errors.

> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 632d72607e11e..86d4f224ffb11 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
[ ... ]
> @@ -1576,23 +1573,21 @@ static bool raid1_write_request(struct mddev *mddev, struct bio *bio,
>  		if (!rdev || test_bit(Faulty, &rdev->flags))
>  			continue;
>  
> -		atomic_inc(&rdev->nr_pending);
>  		if (test_bit(WriteErrorSeen, &rdev->flags)) {
>  			sector_t first_bad;
>  			sector_t bad_sectors;
>  			int is_bad;
>  
> -			is_bad = is_badblock(rdev, r1_bio->sector, max_sectors,
> +			is_bad = is_badblock(rdev, sector, max_sectors,
>  					     &first_bad, &bad_sectors);
> -			if (is_bad && first_bad <= r1_bio->sector) {
> +			if (is_bad && first_bad <= sector) {
>  				/* Cannot write here at all */
> -				bad_sectors -= (r1_bio->sector - first_bad);
> +				bad_sectors -= (sector - first_bad);
>  				if (bad_sectors < max_sectors)
>  					/* mustn't write more than bad_sectors
>  					 * to other devices yet
>  					 */
>  					max_sectors = bad_sectors;
> -				rdev_dec_pending(rdev, mddev);
>  				continue;

[Severity: High]
This is a pre-existing issue, but does this logic skip the rejection of
REQ_ATOMIC bios if a bad block overlaps the start of the write?

When is_badblock() finds a bad block at or before the write sector, the
code shrinks max_sectors to force a split at the boundary and then calls
continue. Because of the continue, the REQ_ATOMIC bio skips the rejection
logic below.

Could this silently split an atomic write into smaller pieces and submit
them to the healthy mirrors via bio_submit_split_bioset(), violating
block-level atomicity guarantees?
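
A minimal user-space sketch of the branch order in the quoted hunk, for
one device whose bad range overlaps the write. The names
model_check_badblock(), model_sector_t and enum model_verdict are
hypothetical, and the is_badblock() result is supplied by the caller
rather than computed. It only illustrates which branch an atomic write
takes; it says nothing about what the block layer does afterwards.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long model_sector_t;	/* stand-in for sector_t */

enum model_verdict { MODEL_SKIP_DEVICE, MODEL_REJECT_ATOMIC, MODEL_USE_DEVICE };

/*
 * Models the branch structure for one device with a bad range
 * [first_bad, first_bad + bad_sectors) that overlaps the write.
 * *max_sectors may be shrunk, as in the quoted hunk.
 */
static enum model_verdict model_check_badblock(model_sector_t sector,
					       model_sector_t first_bad,
					       model_sector_t bad_sectors,
					       bool atomic,
					       model_sector_t *max_sectors)
{
	if (first_bad <= sector) {
		/* Bad range covers the start: shrink and skip, no atomic check. */
		bad_sectors -= sector - first_bad;
		if (bad_sectors < *max_sectors)
			*max_sectors = bad_sectors;
		return MODEL_SKIP_DEVICE;
	}
	if (atomic)
		return MODEL_REJECT_ATOMIC;

	/* Bad range starts later: write only the good prefix. */
	if (first_bad - sector < *max_sectors)
		*max_sectors = first_bad - sector;
	return MODEL_USE_DEVICE;
}
```

For an atomic write of 16 sectors at sector 100 and a bad range starting
at sector 96 with length 8, the model skips the device and shrinks
max_sectors to 4 without reaching the atomic check. With the bad range
starting at sector 104 instead, the atomic write is rejected and
max_sectors is left untouched.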

>  			}
>  			if (is_bad) {
[ ... ]
>  				if (bio->bi_opf & REQ_ATOMIC) {
> -					rdev_dec_pending(rdev, mddev);
> -					goto err_handle;
> +					bio->bi_status = BLK_STS_NOTSUPP;
> +					bio_endio(bio);
> +					goto err_dec_pending;
>  				}
>  
> -				good_sectors = first_bad - r1_bio->sector;
> +				good_sectors = first_bad - sector;
>  				if (good_sectors < max_sectors)
>  					max_sectors = good_sectors;
>  			}
>  		}
> +
> +		atomic_inc(&rdev->nr_pending);

[Severity: Critical]
Does moving atomic_inc(&rdev->nr_pending) down open a race window with
hot_remove_disk()?

Because nr_pending acts as an existence lock that prevents the device from
being removed from the array configuration by raid1_remove_conf(),
incrementing it after the is_badblock() and test_bit(WriteErrorSeen, ...)
checks means md_check_recovery() could evaluate rdev_removeable() while
nr_pending is 0 and call hot_remove_disk().

If raid1_remove_conf() unlinks the device by setting
conf->mirrors[i].rdev = NULL, when this thread resumes and the I/O
eventually fails, would the error handling path (handle_write_finished()
calling narrow_write_error()) read conf->mirrors[i].rdev and dereference a
NULL pointer?

Additionally, in clustered configurations, if the unlinked rdev is
immediately freed by md_kick_rdev_from_array(), could this lead to a
use-after-free directly in raid1_write_request() when evaluating
test_bit(WriteErrorSeen, &rdev->flags) or is_badblock()?
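
The ordering argument can be sketched with a single-threaded user-space
model. Everything here is hypothetical (struct model_rdev, model_pin(),
model_unpin(), model_try_remove()) and it is not how the md core
implements removal. It only shows the rule the review relies on: removal
is refused while a pin is held, so a pin taken before inspecting the
device keeps it linked, while inspecting first leaves a window in which
removal can succeed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a device slot; not struct md_rdev. */
struct model_rdev {
	atomic_int nr_pending;	/* pin count */
	bool linked;		/* still present in the array */
};

static void model_pin(struct model_rdev *r)
{
	atomic_fetch_add(&r->nr_pending, 1);
}

static void model_unpin(struct model_rdev *r)
{
	atomic_fetch_sub(&r->nr_pending, 1);
}

/*
 * Removal is refused while any pin is held. The check-then-unlink here
 * is not itself race-free; the model is only meant to be stepped
 * through in a fixed order.
 */
static bool model_try_remove(struct model_rdev *r)
{
	if (atomic_load(&r->nr_pending) != 0)
		return false;
	r->linked = false;
	return true;
}
```

Calling model_pin() before model_try_remove() makes the removal fail and
leaves the device linked. Calling model_try_remove() on a device with no
pin succeeds, which corresponds to the window described above when the
increment happens only after the badblock checks.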

>  		r1_bio->bios[i] = bio;
>  	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260613182810.1317258-1-abd.masalkhi@gmail.com?part=4
