fix ->run failure handling

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* fix ->run failure handling
@ 2024-06-04 17:25 Christoph Hellwig
  2024-06-04 17:25 ` [PATCH 1/2] md/raid0: don't free conf on raid0_run failure Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-04 17:25 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: linux-raid

Hi Song, hi Yu,

this series fixes problems I found when testing how md handles
trying to creat an array with disks that support incompatible
protection information or a mix of protection information and
not protection information.

Note that this only fixes the oops due to the double frees.  After the
attempt to at the incompatible device the /dev/md0 node is still around
with zero blocks and linked to the the previous devices in sysfs which
is a bit confusing, but unrelated to the resource cleanup touched
here.

raid10 and raid5 also free conf on failure in their ->run handle,
but set mddev->private to NULL which prevents the double free as well
(but requires more code)

Diffstat:
 raid0.c |   21 +++++----------------
 raid1.c |   14 +++-----------
 2 files changed, 8 insertions(+), 27 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] md/raid0: don't free conf on raid0_run failure
  2024-06-04 17:25 fix ->run failure handling Christoph Hellwig
@ 2024-06-04 17:25 ` Christoph Hellwig
  2024-06-07  6:08   ` Yu Kuai
  2024-06-04 17:25 ` [PATCH 2/2] md/raid1: " Christoph Hellwig
  2024-06-10 20:58 ` fix ->run failure handling Song Liu
  2 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-04 17:25 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: linux-raid

The core md code calls the ->free method which already frees conf.

Fixes: 0c031fd37f69 ("md: Move alloc/free acct bioset in to personality")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid0.c | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index c5d4aeb68404c9..81c01347cd24e6 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -365,18 +365,13 @@ static sector_t raid0_size(struct mddev *mddev, sector_t sectors, int raid_disks
 	return array_sectors;
 }
 
-static void free_conf(struct mddev *mddev, struct r0conf *conf)
-{
-	kfree(conf->strip_zone);
-	kfree(conf->devlist);
-	kfree(conf);
-}
-
 static void raid0_free(struct mddev *mddev, void *priv)
 {
 	struct r0conf *conf = priv;
 
-	free_conf(mddev, conf);
+	kfree(conf->strip_zone);
+	kfree(conf->devlist);
+	kfree(conf);
 }
 
 static int raid0_set_limits(struct mddev *mddev)
@@ -415,7 +410,7 @@ static int raid0_run(struct mddev *mddev)
 	if (!mddev_is_dm(mddev)) {
 		ret = raid0_set_limits(mddev);
 		if (ret)
-			goto out_free_conf;
+			return ret;
 	}
 
 	/* calculate array device size */
@@ -427,13 +422,7 @@ static int raid0_run(struct mddev *mddev)
 
 	dump_zones(mddev);
 
-	ret = md_integrity_register(mddev);
-	if (ret)
-		goto out_free_conf;
-	return 0;
-out_free_conf:
-	free_conf(mddev, conf);
-	return ret;
+	return md_integrity_register(mddev);
 }
 
 /*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] md/raid1: don't free conf on raid0_run failure
  2024-06-04 17:25 fix ->run failure handling Christoph Hellwig
  2024-06-04 17:25 ` [PATCH 1/2] md/raid0: don't free conf on raid0_run failure Christoph Hellwig
@ 2024-06-04 17:25 ` Christoph Hellwig
  2024-06-10 20:58 ` fix ->run failure handling Song Liu
  2 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-04 17:25 UTC (permalink / raw)
  To: Song Liu, Yu Kuai; +Cc: linux-raid

The core md code calls the ->free method which already frees conf.

Fixes: 07f1a6850c5d ("md/raid1: fail run raid1 array when active disk less than one")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid1.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7b8a71ca66dde0..1f321826ef02ba 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3204,7 +3204,6 @@ static int raid1_set_limits(struct mddev *mddev)
 	return queue_limits_set(mddev->gendisk->queue, &lim);
 }
 
-static void raid1_free(struct mddev *mddev, void *priv);
 static int raid1_run(struct mddev *mddev)
 {
 	struct r1conf *conf;
@@ -3238,7 +3237,7 @@ static int raid1_run(struct mddev *mddev)
 	if (!mddev_is_dm(mddev)) {
 		ret = raid1_set_limits(mddev);
 		if (ret)
-			goto abort;
+			return ret;
 	}
 
 	mddev->degraded = 0;
@@ -3252,8 +3251,7 @@ static int raid1_run(struct mddev *mddev)
 	 */
 	if (conf->raid_disks - mddev->degraded < 1) {
 		md_unregister_thread(mddev, &conf->thread);
-		ret = -EINVAL;
-		goto abort;
+		return -EINVAL;
 	}
 
 	if (conf->raid_disks - mddev->degraded == 1)
@@ -3277,14 +3275,8 @@ static int raid1_run(struct mddev *mddev)
 	md_set_array_sectors(mddev, raid1_size(mddev, 0, 0));
 
 	ret = md_integrity_register(mddev);
-	if (ret) {
+	if (ret)
 		md_unregister_thread(mddev, &mddev->thread);
-		goto abort;
-	}
-	return 0;
-
-abort:
-	raid1_free(mddev, conf);
 	return ret;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] md/raid0: don't free conf on raid0_run failure
  2024-06-04 17:25 ` [PATCH 1/2] md/raid0: don't free conf on raid0_run failure Christoph Hellwig
@ 2024-06-07  6:08   ` Yu Kuai
  2024-06-07  6:17     ` Yu Kuai
  0 siblings, 1 reply; 9+ messages in thread
From: Yu Kuai @ 2024-06-07  6:08 UTC (permalink / raw)
  To: Christoph Hellwig, Song Liu; +Cc: linux-raid, yukuai (C)

在 2024/06/05 1:25, Christoph Hellwig 写道:
> The core md code calls the ->free method which already frees conf.
> 
> Fixes: 0c031fd37f69 ("md: Move alloc/free acct bioset in to personality")
> Signed-off-by: Christoph Hellwig <hch@lst.de>

LGTM
Reviewed-by: Yu Kuai <yukuai3@huawei.com>

> ---
>   drivers/md/raid0.c | 21 +++++----------------
>   1 file changed, 5 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index c5d4aeb68404c9..81c01347cd24e6 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -365,18 +365,13 @@ static sector_t raid0_size(struct mddev *mddev, sector_t sectors, int raid_disks
>   	return array_sectors;
>   }
>   
> -static void free_conf(struct mddev *mddev, struct r0conf *conf)
> -{
> -	kfree(conf->strip_zone);
> -	kfree(conf->devlist);
> -	kfree(conf);
> -}
> -
>   static void raid0_free(struct mddev *mddev, void *priv)
>   {
>   	struct r0conf *conf = priv;
>   
> -	free_conf(mddev, conf);
> +	kfree(conf->strip_zone);
> +	kfree(conf->devlist);
> +	kfree(conf);
>   }
>   
>   static int raid0_set_limits(struct mddev *mddev)
> @@ -415,7 +410,7 @@ static int raid0_run(struct mddev *mddev)
>   	if (!mddev_is_dm(mddev)) {
>   		ret = raid0_set_limits(mddev);
>   		if (ret)
> -			goto out_free_conf;
> +			return ret;
>   	}
>   
>   	/* calculate array device size */
> @@ -427,13 +422,7 @@ static int raid0_run(struct mddev *mddev)
>   
>   	dump_zones(mddev);
>   
> -	ret = md_integrity_register(mddev);
> -	if (ret)
> -		goto out_free_conf;
> -	return 0;
> -out_free_conf:
> -	free_conf(mddev, conf);
> -	return ret;
> +	return md_integrity_register(mddev);
>   }
>   
>   /*
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] md/raid0: don't free conf on raid0_run failure
  2024-06-07  6:08   ` Yu Kuai
@ 2024-06-07  6:17     ` Yu Kuai
  2024-06-07  6:19       ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Yu Kuai @ 2024-06-07  6:17 UTC (permalink / raw)
  To: Yu Kuai, Christoph Hellwig, Song Liu; +Cc: linux-raid, yukuai (C)

Hi,

在 2024/06/07 14:08, Yu Kuai 写道:
> 在 2024/06/05 1:25, Christoph Hellwig 写道:
>> The core md code calls the ->free method which already frees conf.
>>
>> Fixes: 0c031fd37f69 ("md: Move alloc/free acct bioset in to personality")
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> LGTM
> Reviewed-by: Yu Kuai <yukuai3@huawei.com>

Sorry that I just noticed that the sysfs api level_store() calls the
pers->run() as well, and it didn't handle failure by pers->free().
It's weried that the api will return success in this case, and after
this patch, it will require __md_stop() to free the conf.

Can you also fix the level_store()? By checking pers->run() and if it
failed, call pers->free() and return error.

Thanks,
Kuai

> 
>> ---
>>   drivers/md/raid0.c | 21 +++++----------------
>>   1 file changed, 5 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
>> index c5d4aeb68404c9..81c01347cd24e6 100644
>> --- a/drivers/md/raid0.c
>> +++ b/drivers/md/raid0.c
>> @@ -365,18 +365,13 @@ static sector_t raid0_size(struct mddev *mddev, 
>> sector_t sectors, int raid_disks
>>       return array_sectors;
>>   }
>> -static void free_conf(struct mddev *mddev, struct r0conf *conf)
>> -{
>> -    kfree(conf->strip_zone);
>> -    kfree(conf->devlist);
>> -    kfree(conf);
>> -}
>> -
>>   static void raid0_free(struct mddev *mddev, void *priv)
>>   {
>>       struct r0conf *conf = priv;
>> -    free_conf(mddev, conf);
>> +    kfree(conf->strip_zone);
>> +    kfree(conf->devlist);
>> +    kfree(conf);
>>   }
>>   static int raid0_set_limits(struct mddev *mddev)
>> @@ -415,7 +410,7 @@ static int raid0_run(struct mddev *mddev)
>>       if (!mddev_is_dm(mddev)) {
>>           ret = raid0_set_limits(mddev);
>>           if (ret)
>> -            goto out_free_conf;
>> +            return ret;
>>       }
>>       /* calculate array device size */
>> @@ -427,13 +422,7 @@ static int raid0_run(struct mddev *mddev)
>>       dump_zones(mddev);
>> -    ret = md_integrity_register(mddev);
>> -    if (ret)
>> -        goto out_free_conf;
>> -    return 0;
>> -out_free_conf:
>> -    free_conf(mddev, conf);
>> -    return ret;
>> +    return md_integrity_register(mddev);
>>   }
>>   /*
>>
> 
> .
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] md/raid0: don't free conf on raid0_run failure
  2024-06-07  6:17     ` Yu Kuai
@ 2024-06-07  6:19       ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-07  6:19 UTC (permalink / raw)
  To: Yu Kuai; +Cc: Christoph Hellwig, Song Liu, linux-raid, yukuai (C)

On Fri, Jun 07, 2024 at 02:17:14PM +0800, Yu Kuai wrote:
> Sorry that I just noticed that the sysfs api level_store() calls the
> pers->run() as well, and it didn't handle failure by pers->free().
> It's weried that the api will return success in this case, and after
> this patch, it will require __md_stop() to free the conf.
>
> Can you also fix the level_store()? By checking pers->run() and if it
> failed, call pers->free() and return error.

I can look into it.  But if we don't have consistent callers anyway
I'd be tempted to not call ->free on ->run failure, as that is a
rather unusual convention.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: fix ->run failure handling
  2024-06-04 17:25 fix ->run failure handling Christoph Hellwig
  2024-06-04 17:25 ` [PATCH 1/2] md/raid0: don't free conf on raid0_run failure Christoph Hellwig
  2024-06-04 17:25 ` [PATCH 2/2] md/raid1: " Christoph Hellwig
@ 2024-06-10 20:58 ` Song Liu
  2024-06-11  6:13   ` Christoph Hellwig
  2 siblings, 1 reply; 9+ messages in thread
From: Song Liu @ 2024-06-10 20:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Yu Kuai, linux-raid

Hi Christoph,

On Tue, Jun 4, 2024 at 10:26 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Hi Song, hi Yu,
>
> this series fixes problems I found when testing how md handles
> trying to creat an array with disks that support incompatible
> protection information or a mix of protection information and
> not protection information.
>
> Note that this only fixes the oops due to the double frees.  After the
> attempt to at the incompatible device the /dev/md0 node is still around
> with zero blocks and linked to the the previous devices in sysfs which
> is a bit confusing, but unrelated to the resource cleanup touched
> here.
>
> raid10 and raid5 also free conf on failure in their ->run handle,
> but set mddev->private to NULL which prevents the double free as well
> (but requires more code)
>
> Diffstat:
>  raid0.c |   21 +++++----------------
>  raid1.c |   14 +++-----------
>  2 files changed, 8 insertions(+), 27 deletions(-)

Thanks for the patchset. I applied it to the md-6.11 branch.

Song

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: fix ->run failure handling
  2024-06-10 20:58 ` fix ->run failure handling Song Liu
@ 2024-06-11  6:13   ` Christoph Hellwig
  2024-06-12 16:36     ` Song Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-11  6:13 UTC (permalink / raw)
  To: Song Liu; +Cc: Christoph Hellwig, Yu Kuai, linux-raid

On Mon, Jun 10, 2024 at 01:58:17PM -0700, Song Liu wrote:
> > raid10 and raid5 also free conf on failure in their ->run handle,
> > but set mddev->private to NULL which prevents the double free as well
> > (but requires more code)
> >
> > Diffstat:
> >  raid0.c |   21 +++++----------------
> >  raid1.c |   14 +++-----------
> >  2 files changed, 8 insertions(+), 27 deletions(-)
> 
> Thanks for the patchset. I applied it to the md-6.11 branch.

I was planning to respin this to also handle failures in the takeover
path.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: fix ->run failure handling
  2024-06-11  6:13   ` Christoph Hellwig
@ 2024-06-12 16:36     ` Song Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Song Liu @ 2024-06-12 16:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Yu Kuai, linux-raid

Hi Christoph,

On Mon, Jun 10, 2024 at 11:13 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Jun 10, 2024 at 01:58:17PM -0700, Song Liu wrote:
> > > raid10 and raid5 also free conf on failure in their ->run handle,
> > > but set mddev->private to NULL which prevents the double free as well
> > > (but requires more code)
> > >
> > > Diffstat:
> > >  raid0.c |   21 +++++----------------
> > >  raid1.c |   14 +++-----------
> > >  2 files changed, 8 insertions(+), 27 deletions(-)
> >
> > Thanks for the patchset. I applied it to the md-6.11 branch.
>
> I was planning to respin this to also handle failures in the takeover
> path.

How about we do that in a follow up patch?

Thanks for fixing these issues!

Song

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-06-12 16:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-04 17:25 fix ->run failure handling Christoph Hellwig
2024-06-04 17:25 ` [PATCH 1/2] md/raid0: don't free conf on raid0_run failure Christoph Hellwig
2024-06-07  6:08   ` Yu Kuai
2024-06-07  6:17     ` Yu Kuai
2024-06-07  6:19       ` Christoph Hellwig
2024-06-04 17:25 ` [PATCH 2/2] md/raid1: " Christoph Hellwig
2024-06-10 20:58 ` fix ->run failure handling Song Liu
2024-06-11  6:13   ` Christoph Hellwig
2024-06-12 16:36     ` Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).