linux-raid.vger.kernel.org archive mirror
* [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental"
@ 2023-09-07 11:37 Li Xiao Keng
  2023-09-11 16:00 ` Mariusz Tkaczyk
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Li Xiao Keng @ 2023-09-07 11:37 UTC (permalink / raw)
  To: Jes Sorensen, Martin Wilck, Coly Li, linux-raid; +Cc: louhongxiang, miaoguanqin

Consider a raid1 built from sda and sdb. Adding sdc to this array with
"mdadm --add" may fail with -EBUSY.

The main steps of --add:
1. dev_open(sdc) in Manage_add
2. store_super1(st, di->fd) in write_init_super1
3. fsync(fd) in store_super1
4. close(di->fd) in write_init_super1
5. ioctl(ADD_NEW_DISK)

Steps 2 and 3 write sdc into the raid1 metadata, and the close in
step 4 triggers a udev change event for sdc. udev then runs
"/usr/sbin/mdadm --incremental --export $devnode --offroot
$env{DEVLINKS}", which adds sdc to the raid1 on its own. Step 5
therefore returns -EBUSY, because the kernel finds the device already
claimed in md_import_device()->lock_rdev()->blkdev_get_by_dev()
->blkdev_get().

This is confusing for users, because sdc is being added for the first
time. "--incremental" takes map_lock before it adds sdc to the raid1,
so take map_lock before write_init_super in "mdadm --add" as well to
close the race between "add" and "incremental".
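
In outline, the resulting ordering is (a simplified sketch only; error
handling and the external-metadata branch are omitted, see the diff
below for the actual change):

	struct map_ent *map = NULL;

	if (map_lock(&map))	/* the same lock "--incremental" takes first */
		pr_err("failed to get exclusive lock on mapfile when adding disk\n");

	tst->ss->add_to_super(tst, &disc, dfd, dv->devname, INVALID_SECTORS);
	tst->ss->write_init_super(tst);	/* superblock write + close raises the udev change event */

	ioctl(fd, ADD_NEW_DISK, &disc);	/* now completes before --incremental can claim sdc */

	map_unlock(&map);		/* only here may udev's --incremental proceed */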

Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
---
 Manage.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/Manage.c b/Manage.c
index f997b163..075dd720 100644
--- a/Manage.c
+++ b/Manage.c
@@ -704,6 +704,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 	struct supertype *dev_st;
 	int j;
 	mdu_disk_info_t disc;
+	struct map_ent *map = NULL;

 	if (!get_dev_size(tfd, dv->devname, &ldsize)) {
 		if (dv->disposition == 'M')
@@ -907,6 +908,9 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 		disc.raid_disk = 0;
 	}

+	if (map_lock(&map))
 +	pr_err("failed to get exclusive lock on mapfile when adding disk\n");
+
 	if (array->not_persistent==0) {
 		int dfd;
 		if (dv->disposition == 'j')
@@ -918,9 +922,9 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 		dfd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT);
 		if (tst->ss->add_to_super(tst, &disc, dfd,
 					  dv->devname, INVALID_SECTORS))
-			return -1;
+			goto unlock;
 		if (tst->ss->write_init_super(tst))
-			return -1;
+			goto unlock;
 	} else if (dv->disposition == 'A') {
 		/*  this had better be raid1.
 		 * As we are "--re-add"ing we must find a spare slot
@@ -978,14 +982,14 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			pr_err("add failed for %s: could not get exclusive access to container\n",
 			       dv->devname);
 			tst->ss->free_super(tst);
-			return -1;
+			goto unlock;
 		}

 		/* Check if metadata handler is able to accept the drive */
 		if (!tst->ss->validate_geometry(tst, LEVEL_CONTAINER, 0, 1, NULL,
 		    0, 0, dv->devname, NULL, 0, 1)) {
 			close(container_fd);
-			return -1;
+			goto unlock;
 		}

 		Kill(dv->devname, NULL, 0, -1, 0);
@@ -994,7 +998,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 					  dv->devname, INVALID_SECTORS)) {
 			close(dfd);
 			close(container_fd);
-			return -1;
+			goto unlock;
 		}
 		if (!mdmon_running(tst->container_devnm))
 			tst->ss->sync_metadata(tst);
@@ -1005,7 +1009,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			       dv->devname);
 			close(container_fd);
 			tst->ss->free_super(tst);
-			return -1;
+			goto unlock;
 		}
 		sra->array.level = LEVEL_CONTAINER;
 		/* Need to set data_offset and component_size */
@@ -1020,7 +1024,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			pr_err("add new device to external metadata failed for %s\n", dv->devname);
 			close(container_fd);
 			sysfs_free(sra);
-			return -1;
+			goto unlock;
 		}
 		ping_monitor(devnm);
 		sysfs_free(sra);
@@ -1034,7 +1038,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			else
 				pr_err("add new device failed for %s as %d: %s\n",
 				       dv->devname, j, strerror(errno));
-			return -1;
+			goto unlock;
 		}
 		if (dv->disposition == 'j') {
 			pr_err("Journal added successfully, making %s read-write\n", devname);
@@ -1045,7 +1049,11 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 	}
 	if (verbose >= 0)
 		pr_err("added %s\n", dv->devname);
+	map_unlock(&map);
 	return 1;
+unlock:
+	map_unlock(&map);
+	return -1;
 }

 int Manage_remove(struct supertype *tst, int fd, struct mddev_dev *dv,
-- 


* Re: [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-07 11:37 [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental" Li Xiao Keng
@ 2023-09-11 16:00 ` Mariusz Tkaczyk
  2023-09-12  7:18   ` Li Xiao Keng
  2023-10-07  9:26 ` Li Xiao Keng
  2023-10-26 21:42 ` Jes Sorensen
  2 siblings, 1 reply; 6+ messages in thread
From: Mariusz Tkaczyk @ 2023-09-11 16:00 UTC (permalink / raw)
  To: Li Xiao Keng
  Cc: Jes Sorensen, Martin Wilck, Coly Li, linux-raid, louhongxiang,
	miaoguanqin

On Thu, 7 Sep 2023 19:37:44 +0800
Li Xiao Keng <lixiaokeng@huawei.com> wrote:

> There is a raid1 with sda and sdb. And we add sdc to this raid,
> it may return -EBUSY.
> 
> The main process of --add:
> 1. dev_open(sdc) in Manage_add
> 2. store_super1(st, di->fd) in write_init_super1
> 3. fsync(fd) in store_super1
> 4. close(di->fd) in write_init_super1
> 5. ioctl(ADD_NEW_DISK)
> 
> Step 2 and 3 will add sdc to metadata of raid1. There will be
> udev(change of sdc) event after step4. Then "/usr/sbin/mdadm
> --incremental --export $devnode --offroot $env{DEVLINKS}"
> will be run, and the sdc will be added to the raid1. Then
> step 5 will return -EBUSY because it checks if device isn't
> claimed in md_import_device()->lock_rdev()->blkdev_get_by_dev()
> ->blkdev_get().  
> 
> It will be confusing for users because sdc is added first time.
> The "incremental" will get map_lock before add sdc to raid1.
> So we add map_lock before write_init_super in "mdadm --add"
> to fix the race of "add" and "incremental".
> 
> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>

Hello Li Xiao Keng,
Thanks for the patch, I'm trying to get my head around this.
As I understand it, you added mapfile locking and you fail when the lock
is already claimed, but you prefer to return -1 and print an error instead
of returning EBUSY, right?

If yes, it has nothing to do with fixing the issue. It is just an obscure
hack to make it, let's say, more reliable. We could just as well get EBUSY
and know that it is expected. I see no difference; your solution is just as
confusing as the current implementation.

Also, you used map_lock(). I remember that initially I proposed it - my bad.
We also rejected udev locking as buggy. In fact, I don't see a ready
synchronization method to avoid this race in mdadm right now.

The best way I see is (rough sketch after the list):
1. Write metadata.
2. Give udev a chance to process the device.
3. Add it manually if udev failed (or D_NO_UDEV).
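
A rough sketch of that flow (wait_for_udev_claim() is a made-up
placeholder here, not an existing mdadm helper):

	/* 1. write the metadata as we do today */
	tst->ss->add_to_super(tst, &disc, dfd, dv->devname, INVALID_SECTORS);
	tst->ss->write_init_super(tst);

	/* 2. give udev's --incremental a bounded amount of time to pick
	 *    the disk up; wait_for_udev_claim() is hypothetical */
	if (!wait_for_udev_claim(dv->devname, 5 /* seconds */)) {
		/* 3. udev did not add it (or udev is disabled): add it ourselves */
		if (ioctl(fd, ADD_NEW_DISK, &disc) < 0)
			pr_err("add new device failed for %s: %s\n",
			       dv->devname, strerror(errno));
	}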

If that does not work for you, we need to add synchronization. I know that
there is a pidfile used for mdmon, perhaps we can reuse it? We don't need
anything fancy, just something to keep everything synchronised and block
incremental to let --add finish its action.

We cannot use map_lock() and map_unlock() in this context. It claims a lock
on the mapfile, which describes every array in your system (basic details
only, there is no info about members), see /var/run/mdadm. Add is just a
tiny action to add a disk, and we are not going to update the map. Taking
this lock will make --add more vulnerable: for example, it will fail because
another array is being assembled or created at the same time (I know that
this will be hard to hit, but it is another race - we don't want it).
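
For reference, the general shape of such a lock (a generic sketch only; the
path and details are illustrative, not mdadm's actual map_lock() code):

	#include <fcntl.h>
	#include <sys/file.h>
	#include <unistd.h>

	/* Take one system-wide exclusive lock guarding the array map.
	 * Every mdadm process that needs the map goes through this one
	 * lock file, which is why holding it couples otherwise
	 * unrelated operations on different arrays. */
	static int lock_map(void)
	{
		int fd = open("/run/mdadm/mapfile.lock", O_RDWR | O_CREAT, 0600);

		if (fd < 0)
			return -1;
		if (flock(fd, LOCK_EX) < 0) {	/* exclusive lock on the shared file */
			close(fd);
			return -1;
		}
		return fd;	/* release with flock(fd, LOCK_UN) or close(fd) */
	}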

And we cannot allow a partially done action. You added this lock after
writing the metadata (according to the description); we are not taking this
lock to complete the action, just to hide the race. The lock should be taken
before writing the metadata, but we know that even then it will not hide the
issue.

In this form it cannot be merged. I don't see a gain: we are returning -1
instead of -22, what is the difference? I don't see the error message as any
more meaningful now.

Thanks,
Mariusz


* Re: [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-11 16:00 ` Mariusz Tkaczyk
@ 2023-09-12  7:18   ` Li Xiao Keng
  2023-09-12  7:58     ` Mariusz Tkaczyk
  0 siblings, 1 reply; 6+ messages in thread
From: Li Xiao Keng @ 2023-09-12  7:18 UTC (permalink / raw)
  To: Mariusz Tkaczyk
  Cc: Jes Sorensen, Martin Wilck, Coly Li, linux-raid, louhongxiang,
	miaoguanqin



On 2023/9/12 0:00, Mariusz Tkaczyk wrote:
> On Thu, 7 Sep 2023 19:37:44 +0800
> Li Xiao Keng <lixiaokeng@huawei.com> wrote:
> 
>> There is a raid1 with sda and sdb. And we add sdc to this raid,
>> it may return -EBUSY.
>>
>> The main process of --add:
>> 1. dev_open(sdc) in Manage_add
>> 2. store_super1(st, di->fd) in write_init_super1
>> 3. fsync(fd) in store_super1
>> 4. close(di->fd) in write_init_super1
>> 5. ioctl(ADD_NEW_DISK)
>>
>> Step 2 and 3 will add sdc to metadata of raid1. There will be
>> udev(change of sdc) event after step4. Then "/usr/sbin/mdadm
>> --incremental --export $devnode --offroot $env{DEVLINKS}"
>> will be run, and the sdc will be added to the raid1. Then
>> step 5 will return -EBUSY because it checks if device isn't
>> claimed in md_import_device()->lock_rdev()->blkdev_get_by_dev()
>> ->blkdev_get().  
>>
>> It will be confusing for users because sdc is added first time.
>> The "incremental" will get map_lock before add sdc to raid1.
>> So we add map_lock before write_init_super in "mdadm --add"
>> to fix the race of "add" and "incremental".
>>
>> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
>> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
> 
> Hello Li Xiao Keng,
> Thanks for the patch, I'm trying to get my head around this.
> As I understand, you added a mapfile locking and you are failing when the log
> is already claimed, but you prefer to return -1 and print error instead of
> returning EBUSY, right?
No, I do not prefer to return -1 and print an error instead of returning
EBUSY. At first I had the same idea as Martin, thinking that map_lock()
would fail when it is locked by someone else. Actually it does not fail but
blocks. So now I just print an error and go ahead if map_lock() fails, like
the other map_lock() callers do.
> 
> If yes, it has nothing to do with fixing. It is just a obscure hack to make it,
> let say more reliable. We can either get EBUSY and know that is expected. I
> see no difference, your solution is same confusing as current implementation.
> 
> Also, you used map_lock(). I remember that initially I proposed - my bad.
> We also rejected udev locking, as a buggy. In fact- I don't see ready
> synchronization method to avoid this race in mdadm right now.
> 
> The best way, I see is:
> 1. Write metadata.
> 2. Give udev chance to process the device.
> 3. Add it manually if udev failed (or D_NO_UDEV).
Is there any command that only writes the metadata? Or should we change the
implementation of --add?
> 
> If it does not work for you, we need to add synchronization, I know that there
> is pidfile used for mdmon, perhaps we can reuse it? We don't need something
> fancy, just something to keep all synchronised and block incremental to led
> --add finish the action.
> 
> We cannot use map_lock() and map_unlock() in this context. It claims lock on
> mapfile which describes every array in your system (base details, there is no
> info about members), see /var/run/mdadm. Add is just a tiny action to add disk
> and we are not going to update the map. Taking this lock will make --add
> more vulnerable, for example it will fail because another array is assembled or
> created in the same time (I know that it will be hard to achieve but it is
> another race - we don't want it).
When I tested this patch, I found that map_lock() in --incremental blocks
rather than fails. So I think it will not fail when another array is being
assembled.
> 
> And we cannot allow for partially done action, you added this lock after
> writing metadata (according to description), we are not taking this lock to
> complete the action, just to hide race. Lock should be taken before
> writing metadata, but we know that it will not hide an issue.
Here I take map_lock() before writing the metadata and do not unlock it until
the end. Do you mean it should be taken before attempt_re_add()?
> 
> In this form it cannot be merged. I don't see a gain: we are returning -1
> instead of -22, what is the difference? I don't see the error message as
> any more meaningful now.
This patch just wants to make "--add" succeed.



* Re: [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-12  7:18   ` Li Xiao Keng
@ 2023-09-12  7:58     ` Mariusz Tkaczyk
  0 siblings, 0 replies; 6+ messages in thread
From: Mariusz Tkaczyk @ 2023-09-12  7:58 UTC (permalink / raw)
  To: Li Xiao Keng
  Cc: Jes Sorensen, Martin Wilck, Coly Li, linux-raid, louhongxiang,
	miaoguanqin

On Tue, 12 Sep 2023 15:18:33 +0800
Li Xiao Keng <lixiaokeng@huawei.com> wrote:

> On 2023/9/12 0:00, Mariusz Tkaczyk wrote:
> > On Thu, 7 Sep 2023 19:37:44 +0800
> > Li Xiao Keng <lixiaokeng@huawei.com> wrote:
> >   
> >> There is a raid1 with sda and sdb. And we add sdc to this raid,
> >> it may return -EBUSY.
> >>
> >> The main process of --add:
> >> 1. dev_open(sdc) in Manage_add
> >> 2. store_super1(st, di->fd) in write_init_super1
> >> 3. fsync(fd) in store_super1
> >> 4. close(di->fd) in write_init_super1
> >> 5. ioctl(ADD_NEW_DISK)
> >>
> >> Step 2 and 3 will add sdc to metadata of raid1. There will be
> >> udev(change of sdc) event after step4. Then "/usr/sbin/mdadm
> >> --incremental --export $devnode --offroot $env{DEVLINKS}"
> >> will be run, and the sdc will be added to the raid1. Then
> >> step 5 will return -EBUSY because it checks if device isn't
> >> claimed in md_import_device()->lock_rdev()->blkdev_get_by_dev()  
> >> ->blkdev_get().    
> >>
> >> It will be confusing for users because sdc is added first time.
> >> The "incremental" will get map_lock before add sdc to raid1.
> >> So we add map_lock before write_init_super in "mdadm --add"
> >> to fix the race of "add" and "incremental".
> >>
> >> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> >> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>  
> > 
> > Hello Li Xiao Keng,
> > Thanks for the patch, I'm trying to get my head around this.
> > As I understand, you added a mapfile locking and you are failing when the
> > log is already claimed, but you prefer to return -1 and print error instead
> > of returning EBUSY, right?  
> No, I do not prefer to return -1 and print error instead of returning EBUSY.
> At first, I had the same idea as Martin, thinking that map_lock() would fail
> when it is locked by someone else.Actually it is not fail but block. So now I
> just print error and go ahead if map_lock() fails, like map_lock() elsewhere.

OK, thanks, now it makes more sense!

> > 
> > If yes, it has nothing to do with fixing. It is just a obscure hack to make
> > it, let say more reliable. We can either get EBUSY and know that is
> > expected. I see no difference, your solution is same confusing as current
> > implementation.
> > 
> > Also, you used map_lock(). I remember that initially I proposed - my bad.
> > We also rejected udev locking, as a buggy. In fact- I don't see ready
> > synchronization method to avoid this race in mdadm right now.
> > 
> > The best way, I see is:
> > 1. Write metadata.
> > 2. Give udev chance to process the device.
> > 3. Add it manually if udev failed (or D_NO_UDEV).  
> Is there any command to only write metadata?  Or change implementation of
> --add ?

I think that we should change the implementation of --add.

> > 
> > If it does not work for you, we need to add synchronization, I know that
> > there is pidfile used for mdmon, perhaps we can reuse it? We don't need
> > something fancy, just something to keep all synchronised and block
> > incremental to led --add finish the action.
> > 
> > We cannot use map_lock() and map_unlock() in this context. It claims lock on
> > mapfile which describes every array in your system (base details, there is
> > no info about members), see /var/run/mdadm. Add is just a tiny action to
> > add disk and we are not going to update the map. Taking this lock will make
> > --add more vulnerable, for example it will fail because another array is
> > assembled or created in the same time (I know that it will be hard to
> > achieve but it is another race - we don't want it).  
> When I test this patch, I find map_lock()in --incremental is block but not
> fail. So I think it will not fail when another array is assembled.

Oh, it seems that flock() waits for the lock to be acquired; it is a blocking
function. I have now read the documentation and confirmed it.
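
A minimal, self-contained illustration of the two modes (standard flock(2)
behaviour, not mdadm code):

	#include <errno.h>
	#include <stdio.h>
	#include <sys/file.h>

	/* With LOCK_EX alone flock() blocks until the lock is free;
	 * LOCK_EX | LOCK_NB instead fails immediately with EWOULDBLOCK. */
	static void take_exclusive(int fd)
	{
		if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
			puts("got the lock without waiting");
		} else if (errno == EWOULDBLOCK) {
			puts("lock busy, waiting for the holder...");
			flock(fd, LOCK_EX);	/* blocking call, as map_lock() turns out to behave */
			puts("lock acquired");
		}
	}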

> > 
> > And we cannot allow for partially done action, you added this lock after
> > writing metadata (according to description), we are not taking this lock to
> > complete the action, just to hide race. Lock should be taken before
> > writing metadata, but we know that it will not hide an issue.  
> Here I add map_lock() before writing metadata and ulock it until finish. Do
> you mean it should be before attempt_re_add()?

Indeed, you wrote "before write_init_super in 'mdadm --add'". I was against
taking map_lock(), but now that I understand that other processes will wait,
I see this as the simplest way to keep everything synchronized.

After your clarification I have no objections to taking this. Taking
map_lock is not the best approach, because we lock all md devices, but I
don't see a strong need for something better. --add is a user action, so
there is a low risk of racing with another --add or --assemble/--incremental.

I was against it because I thought that it is not blocking, i.e. that we
fail if the lock cannot be obtained.

Reviewed-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>

Thanks,
Mariusz


* Re: [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-07 11:37 [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental" Li Xiao Keng
  2023-09-11 16:00 ` Mariusz Tkaczyk
@ 2023-10-07  9:26 ` Li Xiao Keng
  2023-10-26 21:42 ` Jes Sorensen
  2 siblings, 0 replies; 6+ messages in thread
From: Li Xiao Keng @ 2023-10-07  9:26 UTC (permalink / raw)
  To: Jes Sorensen, Martin Wilck, Coly Li, linux-raid; +Cc: louhongxiang, miaoguanqin

ping

On 2023/9/7 19:37, Li Xiao Keng wrote:
> There is a raid1 with sda and sdb. And we add sdc to this raid,
> it may return -EBUSY.
> [...]


* Re: [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-07 11:37 [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental" Li Xiao Keng
  2023-09-11 16:00 ` Mariusz Tkaczyk
  2023-10-07  9:26 ` Li Xiao Keng
@ 2023-10-26 21:42 ` Jes Sorensen
  2 siblings, 0 replies; 6+ messages in thread
From: Jes Sorensen @ 2023-10-26 21:42 UTC (permalink / raw)
  To: Li Xiao Keng, Martin Wilck, Coly Li, linux-raid; +Cc: louhongxiang, miaoguanqin

On 9/7/23 07:37, Li Xiao Keng wrote:
> There is a raid1 with sda and sdb. And we add sdc to this raid,
> it may return -EBUSY.
> 
> The main process of --add:
> 1. dev_open(sdc) in Manage_add
> 2. store_super1(st, di->fd) in write_init_super1
> 3. fsync(fd) in store_super1
> 4. close(di->fd) in write_init_super1
> 5. ioctl(ADD_NEW_DISK)
> 
> Step 2 and 3 will add sdc to metadata of raid1. There will be
> udev(change of sdc) event after step4. Then "/usr/sbin/mdadm
> --incremental --export $devnode --offroot $env{DEVLINKS}"
> will be run, and the sdc will be added to the raid1. Then
> step 5 will return -EBUSY because it checks if device isn't
> claimed in md_import_device()->lock_rdev()->blkdev_get_by_dev()
> ->blkdev_get().
> 
> It will be confusing for users because sdc is added first time.
> The "incremental" will get map_lock before add sdc to raid1.
> So we add map_lock before write_init_super in "mdadm --add"
> to fix the race of "add" and "incremental".
> 
> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>

Applied!

Thanks,
Jes



