* [PATCH] raid1: fix nr_pending leak in REQ_ATOMIC bad-block error path
@ 2026-05-30 15:14 Abd-Alrhman Masalkhi
2026-05-31 10:21 ` Yu Kuai
2026-06-01 8:43 ` John Garry
0 siblings, 2 replies; 6+ messages in thread
From: Abd-Alrhman Masalkhi @ 2026-05-30 15:14 UTC
To: song, yukuai, linan122, john.g.garry, martin.petersen, axboe
Cc: linux-raid, linux-kernel, Abd-Alrhman Masalkhi
In raid1_write_request(), each per-mirror loop iteration begins by
incrementing rdev->nr_pending. If a REQ_ATOMIC write encounters a
badblock within the requested range, the code jumps to err_handle
without dropping the reference taken for the current mirror.

err_handle's cleanup loop only decrements nr_pending for slots where
k < i and r1_bio->bios[k] is non-NULL. The current slot is therefore
skipped, leaving its nr_pending reference leaked permanently. The
reference prevents the rdev from ever being removed, since
raid1_remove_conf() refuses to remove an rdev with nr_pending > 0.

Fix this by calling rdev_dec_pending() before jumping to err_handle.

Fixes: f2a38abf5f1c ("md/raid1: Atomic write support")
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
---
drivers/md/raid1.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 181400e147c0..0084bbc24076 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1580,8 +1580,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 				 * complexity of supporting that is not worth
 				 * the benefit.
 				 */
-				if (bio->bi_opf & REQ_ATOMIC)
+				if (bio->bi_opf & REQ_ATOMIC) {
+					rdev_dec_pending(rdev, mddev);
 					goto err_handle;
+				}
 
 				good_sectors = first_bad - r1_bio->sector;
 				if (good_sectors < max_sectors)
--
2.43.0
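
For readers following the reference counting, the leak described in the commit message can be reproduced in a small userspace model. This is only a sketch: `struct toy_rdev`, `toy_write_request()`, `toy_leaked_refs()` and the `with_fix` switch are hypothetical names invented for illustration, the plain `int` counter stands in for the kernel's atomic `nr_pending`, and the normal completion path is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MIRRORS 3

/* Toy stand-in for struct md_rdev; not the real kernel type. */
struct toy_rdev {
	int nr_pending;		/* models rdev->nr_pending */
	bool has_badblock;	/* a bad block lies inside the requested range */
};

/*
 * Models only the per-mirror loop and the err_handle cleanup for an
 * atomic write. Returns 0 if every slot was set up, -1 on the error path.
 */
static int toy_write_request(struct toy_rdev *rdevs, bool with_fix)
{
	bool bios[TOY_MIRRORS] = { false };
	int i, k;

	for (i = 0; i < TOY_MIRRORS; i++) {
		struct toy_rdev *rdev = &rdevs[i];

		rdev->nr_pending++;	/* reference taken at loop start */
		if (rdev->has_badblock) {
			if (with_fix)
				rdev->nr_pending--;	/* drop the current slot's ref */
			goto err_handle;
		}
		bios[i] = true;		/* slot i is now set up */
	}
	return 0;

err_handle:
	/* Cleanup only visits earlier slots (k < i) that were set up. */
	for (k = 0; k < i; k++) {
		if (bios[k])
			rdevs[k].nr_pending--;
	}
	return -1;
}

/* Number of references still held after a failed atomic write. */
static int toy_leaked_refs(bool with_fix)
{
	struct toy_rdev rdevs[TOY_MIRRORS] = {
		{ 0, false }, { 0, true }, { 0, false },
	};
	int k, leaked = 0;

	toy_write_request(rdevs, with_fix);
	for (k = 0; k < TOY_MIRRORS; k++)
		leaked += rdevs[k].nr_pending;
	return leaked;
}
```

In this model, `toy_leaked_refs(false)` leaves one reference behind on the mirror with the bad block, while `toy_leaked_refs(true)` leaves none.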
* Re: [PATCH] raid1: fix nr_pending leak in REQ_ATOMIC bad-block error path
From: Yu Kuai @ 2026-05-31 10:21 UTC
To: Abd-Alrhman Masalkhi, song, yukuai, john.g.garry, martin.petersen,
axboe
Cc: linux-raid, linux-kernel
On 2026/5/30 23:14, Abd-Alrhman Masalkhi wrote:
> In raid1_write_request(), each per-mirror loop iteration begins by
> incrementing rdev->nr_pending. If a REQ_ATOMIC write encounters a
> badblock within the requested range, the code jumps to err_handle
> without dropping the reference taken for the current mirror.
>
> err_handle's cleanup loop only decrements nr_pending for slots where k < i
> and r1_bio->bios[k] is non-NULL. The current slot is therefore skipped,
> leaving its nr_pending reference leaked permanently. The reference
> prevents the rdev from ever being removed, since raid1_remove_conf()
> refuses to remove an rdev with nr_pending > 0.
>
> Fix this by calling rdev_dec_pending() before jumping to err_handle.
>
> Fixes: f2a38abf5f1c ("md/raid1: Atomic write support")
> Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
> ---
> drivers/md/raid1.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
Applied to md-7.2
--
Thanks,
Kuai
* Re: [PATCH] raid1: fix nr_pending leak in REQ_ATOMIC bad-block error path
From: John Garry @ 2026-06-01 8:43 UTC
To: Abd-Alrhman Masalkhi, song, yukuai, linan122, martin.petersen,
axboe
Cc: linux-raid, linux-kernel
On 30/05/2026 16:14, Abd-Alrhman Masalkhi wrote:
> In raid1_write_request(), each per-mirror loop iteration begins by
> incrementing rdev->nr_pending. If a REQ_ATOMIC write encounters a
> badblock within the requested range, the code jumps to err_handle
> without dropping the reference taken for the current mirror.
>
> err_handle's cleanup loop only decrements nr_pending for slots where k < i
> and r1_bio->bios[k] is non-NULL. The current slot is therefore skipped,
> leaving its nr_pending reference leaked permanently. The reference
> prevents the rdev from ever being removed, since raid1_remove_conf()
> refuses to remove an rdev with nr_pending > 0.
>
> Fix this by calling rdev_dec_pending() before jumping to err_handle.
>
> Fixes: f2a38abf5f1c ("md/raid1: Atomic write support")
> Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
FWIW,
Reviewed-by: John Garry <john.g.garry@oracle.com>
> ---
> drivers/md/raid1.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 181400e147c0..0084bbc24076 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1580,8 +1580,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> * complexity of supporting that is not worth
> * the benefit.
> */
> - if (bio->bi_opf & REQ_ATOMIC)
> + if (bio->bi_opf & REQ_ATOMIC) {
> + rdev_dec_pending(rdev, mddev);
It's not so nice that we have two locations that do the
rdev_dec_pending work.
> goto err_handle;
> + }
>
> good_sectors = first_bad - r1_bio->sector;
> if (good_sectors < max_sectors)
* Re: [PATCH] raid1: fix nr_pending leak in REQ_ATOMIC bad-block error path
From: Abd-Alrhman Masalkhi @ 2026-06-01 9:03 UTC
To: John Garry, song, yukuai, linan122, martin.petersen, axboe
Cc: linux-raid, linux-kernel
hi,
Thank you for the feedback.
On Mon, Jun 01, 2026 at 09:43 +0100, John Garry wrote:
> On 30/05/2026 16:14, Abd-Alrhman Masalkhi wrote:
>> In raid1_write_request(), each per-mirror loop iteration begins by
>> incrementing rdev->nr_pending. If a REQ_ATOMIC write encounters a
>> badblock within the requested range, the code jumps to err_handle
>> without dropping the reference taken for the current mirror.
>>
>> err_handle's cleanup loop only decrements nr_pending for slots where k < i
>> and r1_bio->bios[k] is non-NULL. The current slot is therefore skipped,
>> leaving its nr_pending reference leaked permanently. The reference
>> prevents the rdev from ever being removed, since raid1_remove_conf()
>> refuses to remove an rdev with nr_pending > 0.
>>
>> Fix this by calling rdev_dec_pending() before jumping to err_handle.
>>
>> Fixes: f2a38abf5f1c ("md/raid1: Atomic write support")
>> Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@gmail.com>
>
> FWIW,
>
> Reviewed-by: John Garry <john.g.garry@oracle.com>
>
>> ---
>> drivers/md/raid1.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 181400e147c0..0084bbc24076 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1580,8 +1580,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>> * complexity of supporting that is not worth
>> * the benefit.
>> */
>> - if (bio->bi_opf & REQ_ATOMIC)
>> + if (bio->bi_opf & REQ_ATOMIC) {
>> + rdev_dec_pending(rdev, mddev);
>
> It's not so nice that we have two locations that do the
> rdev_dec_pending work.
>
Are you suggesting deferring atomic_inc(&rdev->nr_pending) until after
the if (test_bit(WriteErrorSeen, &rdev->flags)) {..} block? The patch
is already in md-7.2; should I send a separate cleanup patch?
>> goto err_handle;
>> + }
>>
>> good_sectors = first_bad - r1_bio->sector;
>> if (good_sectors < max_sectors)
>
--
Best Regards,
Abd-Alrhman
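
The deferral asked about in this message can be illustrated with a toy userspace model in which the reference is taken only after the bad-block check, so the error path has nothing to undo for the current slot. This is a hedged sketch, not a proposed patch: `struct sketch_rdev`, `sketch_write_request_deferred()` and `sketch_deferred_leaked_refs()` are hypothetical names, and whether the real driver can safely run the bad-block check without already holding nr_pending is not established here.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MIRRORS 3

/* Toy stand-in for struct md_rdev; not the real kernel type. */
struct sketch_rdev {
	int nr_pending;		/* models rdev->nr_pending */
	bool has_badblock;	/* a bad block lies inside the requested range */
};

/*
 * Variant of the per-mirror loop that checks for bad blocks first and
 * only then takes the reference. Returns 0 on success (references stay
 * held until completion, which is not modelled), -1 on the error path.
 */
static int sketch_write_request_deferred(struct sketch_rdev *rdevs)
{
	bool bios[SKETCH_MIRRORS] = { false };
	int i, k;

	for (i = 0; i < SKETCH_MIRRORS; i++) {
		struct sketch_rdev *rdev = &rdevs[i];

		/* No reference is held yet, so bailing out needs no undo. */
		if (rdev->has_badblock)
			goto err_handle;

		rdev->nr_pending++;	/* deferred increment */
		bios[i] = true;
	}
	return 0;

err_handle:
	/* Single place that drops references: earlier, set-up slots. */
	for (k = 0; k < i; k++) {
		if (bios[k])
			rdevs[k].nr_pending--;
	}
	return -1;
}

/* References still held after an atomic write fails at bad_index. */
static int sketch_deferred_leaked_refs(int bad_index)
{
	struct sketch_rdev rdevs[SKETCH_MIRRORS] = { { 0, false } };
	int k, leaked = 0;

	rdevs[bad_index].has_badblock = true;
	sketch_write_request_deferred(rdevs);
	for (k = 0; k < SKETCH_MIRRORS; k++)
		leaked += rdevs[k].nr_pending;
	return leaked;
}
```

In this model the cleanup loop is the only place that decrements, and no reference is left behind regardless of which mirror reports the bad block.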
* Re: [PATCH] raid1: fix nr_pending leak in REQ_ATOMIC bad-block error path
From: John Garry @ 2026-06-01 9:05 UTC
To: Abd-Alrhman Masalkhi, song, yukuai, linan122, martin.petersen,
axboe
Cc: linux-raid, linux-kernel
On 01/06/2026 10:03, Abd-Alrhman Masalkhi wrote:
>>> +++ b/drivers/md/raid1.c
>>> @@ -1580,8 +1580,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>> * complexity of supporting that is not worth
>>> * the benefit.
>>> */
>>> - if (bio->bi_opf & REQ_ATOMIC)
>>> + if (bio->bi_opf & REQ_ATOMIC) {
>>> + rdev_dec_pending(rdev, mddev);
>> It's not so nice that we have two locations that do the
>> rdev_dec_pending work.
>>
> Are you suggesting deferring atomic_inc(&rdev->nr_pending) until after
> the if (test_bit(WriteErrorSeen, &rdev->flags)) {..} block? The patch
> is already in md-7.2; should I send a separate cleanup patch?
I'm not suggesting any further change. I am just mentioning that it is
unfortunate that we have two locations that do the decrement, which
makes error handling harder to follow.
* Re: [PATCH] raid1: fix nr_pending leak in REQ_ATOMIC bad-block error path
From: Abd-Alrhman Masalkhi @ 2026-06-01 9:13 UTC
To: John Garry, song, yukuai, linan122, martin.petersen, axboe
Cc: linux-raid, linux-kernel
hi,
On Mon, Jun 01, 2026 at 10:05 +0100, John Garry wrote:
> On 01/06/2026 10:03, Abd-Alrhman Masalkhi wrote:
>>>> +++ b/drivers/md/raid1.c
>>>> @@ -1580,8 +1580,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>>>> * complexity of supporting that is not worth
>>>> * the benefit.
>>>> */
>>>> - if (bio->bi_opf & REQ_ATOMIC)
>>>> + if (bio->bi_opf & REQ_ATOMIC) {
>>>> + rdev_dec_pending(rdev, mddev);
>>> It's not so nice that we have two locations that do the
>>> rdev_dec_pending work.
>>>
>> Are you suggesting deferring atomic_inc(&rdev->nr_pending) until after
>> the if (test_bit(WriteErrorSeen, &rdev->flags)) {..} block? The patch
>> is already in md-7.2; should I send a separate cleanup patch?
>
> I'm not suggesting any further change. I am just mentioning that it is
> unfortunate that we have two locations that do the decrement, which
> makes error handling harder to follow.
You are absolutely right. Having two decrement paths makes the error
handling harder to follow. Thanks for pointing that out.
--
Best Regards,
Abd-Alrhman