[PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"
@ 2023-09-05 12:02 Li Xiao Keng
  2023-09-05 15:07 ` Martin Wilck
  2023-09-05 16:17 ` Coly Li
  0 siblings, 2 replies; 8+ messages in thread
From: Li Xiao Keng @ 2023-09-05 12:02 UTC (permalink / raw)
  To: Jes Sorensen, Martin Wilck, Coly Li, linux-raid; +Cc: louhongxiang, miaoguanqin

When we add a new disk to a raid, it may return -EBUSY.

The main process of --add:
1. dev_open
2. store_super1(st, di->fd) in write_init_super1
3. fsync(di->fd) in write_init_super1
4. close(di->fd)
5. ioctl(ADD_NEW_DISK)

However, there will be some udev(change) event after step4. Then
"/usr/sbin/mdadm --incremental ..." will be run, and the new disk
will be add to md device. After that, ioctl will return -EBUSY.

Here we add map_lock before write_init_super in "mdadm --add"
to fix this race.

Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
---
 Assemble.c |  5 ++++-
 Manage.c   | 25 +++++++++++++++++--------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 49804941..086890ed 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1479,8 +1479,11 @@ try_again:
 	 * to our list.  We flag them so that we don't try to re-add,
 	 * but can remove if they turn out to not be wanted.
 	 */
-	if (map_lock(&map))
+	if (map_lock(&map)) {
 		pr_err("failed to get exclusive lock on mapfile - continue anyway...\n");
+		return 1;
+	}
+
 	if (c->update == UOPT_UUID)
 		mp = NULL;
 	else
diff --git a/Manage.c b/Manage.c
index f54de7c6..6a101bae 100644
--- a/Manage.c
+++ b/Manage.c
@@ -703,6 +703,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 	struct supertype *dev_st;
 	int j;
 	mdu_disk_info_t disc;
+	struct map_ent *map = NULL;

 	if (!get_dev_size(tfd, dv->devname, &ldsize)) {
 		if (dv->disposition == 'M')
@@ -900,6 +901,10 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 		disc.raid_disk = 0;
 	}

+	if (map_lock(&map)) {
+		pr_err("failed to get exclusive lock on mapfile when add disk\n");
+		return -1;
+	}
 	if (array->not_persistent==0) {
 		int dfd;
 		if (dv->disposition == 'j')
@@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 		dfd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT);
 		if (tst->ss->add_to_super(tst, &disc, dfd,
 					  dv->devname, INVALID_SECTORS))
-			return -1;
+			goto unlock;
 		if (tst->ss->write_init_super(tst))
-			return -1;
+			goto unlock;
 	} else if (dv->disposition == 'A') {
 		/*  this had better be raid1.
 		 * As we are "--re-add"ing we must find a spare slot
@@ -971,14 +976,14 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			pr_err("add failed for %s: could not get exclusive access to container\n",
 			       dv->devname);
 			tst->ss->free_super(tst);
-			return -1;
+			goto unlock;
 		}

 		/* Check if metadata handler is able to accept the drive */
 		if (!tst->ss->validate_geometry(tst, LEVEL_CONTAINER, 0, 1, NULL,
 		    0, 0, dv->devname, NULL, 0, 1)) {
 			close(container_fd);
-			return -1;
+			goto unlock;
 		}

 		Kill(dv->devname, NULL, 0, -1, 0);
@@ -987,7 +992,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 					  dv->devname, INVALID_SECTORS)) {
 			close(dfd);
 			close(container_fd);
-			return -1;
+			goto unlock;
 		}
 		if (!mdmon_running(tst->container_devnm))
 			tst->ss->sync_metadata(tst);
@@ -998,7 +1003,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			       dv->devname);
 			close(container_fd);
 			tst->ss->free_super(tst);
-			return -1;
+			goto unlock;
 		}
 		sra->array.level = LEVEL_CONTAINER;
 		/* Need to set data_offset and component_size */
@@ -1013,7 +1018,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			pr_err("add new device to external metadata failed for %s\n", dv->devname);
 			close(container_fd);
 			sysfs_free(sra);
-			return -1;
+			goto unlock;
 		}
 		ping_monitor(devnm);
 		sysfs_free(sra);
@@ -1027,7 +1032,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 			else
 				pr_err("add new device failed for %s as %d: %s\n",
 				       dv->devname, j, strerror(errno));
-			return -1;
+			goto unlock;
 		}
 		if (dv->disposition == 'j') {
 			pr_err("Journal added successfully, making %s read-write\n", devname);
@@ -1038,7 +1043,11 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
 	}
 	if (verbose >= 0)
 		pr_err("added %s\n", dv->devname);
+	map_unlock(&map);
 	return 1;
+unlock:
+	map_unlock(&map);
+	return -1;
 }

 int Manage_remove(struct supertype *tst, int fd, struct mddev_dev *dv,

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-05 12:02 [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental" Li Xiao Keng
@ 2023-09-05 15:07 ` Martin Wilck
  2023-09-05 16:17 ` Coly Li
  1 sibling, 0 replies; 8+ messages in thread
From: Martin Wilck @ 2023-09-05 15:07 UTC (permalink / raw)
  To: Li Xiao Keng, Jes Sorensen, Coly Li, linux-raid; +Cc: louhongxiang, miaoguanqin

On Tue, 2023-09-05 at 20:02 +0800, Li Xiao Keng wrote:
> When we add a new disk to a raid, it may return -EBUSY.
> 
> The main process of --add:
> 1. dev_open
> 2. store_super1(st, di->fd) in write_init_super1
> 3. fsync(di->fd) in write_init_super1
> 4. close(di->fd)
> 5. ioctl(ADD_NEW_DISK)
> 
> However, there will be some udev(change) event after step4. Then
> "/usr/sbin/mdadm --incremental ..." will be run, and the new disk
> will be add to md device. After that, ioctl will return -EBUSY.
> 
> Here we add map_lock before write_init_super in "mdadm --add"
> to fix this race.
> 
> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  Assemble.c |  5 ++++-
>  Manage.c   | 25 +++++++++++++++++--------
>  2 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/Assemble.c b/Assemble.c
> index 49804941..086890ed 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -1479,8 +1479,11 @@ try_again:
>          * to our list.  We flag them so that we don't try to re-add,
>          * but can remove if they turn out to not be wanted.
>          */
> -       if (map_lock(&map))
> +       if (map_lock(&map)) {
>                 pr_err("failed to get exclusive lock on mapfile -
> continue anyway...\n");
> +               return 1;
> +       }
> +
>         if (c->update == UOPT_UUID)
>                 mp = NULL;
>         else
> diff --git a/Manage.c b/Manage.c
> index f54de7c6..6a101bae 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -703,6 +703,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev
> *dv,
>         struct supertype *dev_st;
>         int j;
>         mdu_disk_info_t disc;
> +       struct map_ent *map = NULL;
> 
>         if (!get_dev_size(tfd, dv->devname, &ldsize)) {
>                 if (dv->disposition == 'M')
> @@ -900,6 +901,10 @@ int Manage_add(int fd, int tfd, struct mddev_dev
> *dv,
>                 disc.raid_disk = 0;
>         }
> 
> +       if (map_lock(&map)) {
> +               pr_err("failed to get exclusive lock on mapfile when
> add disk\n");
> +               return -1;
> +       }
>         if (array->not_persistent==0) {
>                 int dfd;
>                 if (dv->disposition == 'j')
> @@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct mddev_dev
> *dv,
>                 dfd = dev_open(dv->devname, O_RDWR |
> O_EXCL|O_DIRECT);
>                 if (tst->ss->add_to_super(tst, &disc, dfd,
>                                           dv->devname,
> INVALID_SECTORS))
> -                       return -1;
> +                       goto unlock;
>                 if (tst->ss->write_init_super(tst))
> -                       return -1;
> +                       goto unlock;
>         } else if (dv->disposition == 'A') {
>                 /*  this had better be raid1.
>                  * As we are "--re-add"ing we must find a spare slot
> @@ -971,14 +976,14 @@ int Manage_add(int fd, int tfd, struct
> mddev_dev *dv,
>                         pr_err("add failed for %s: could not get
> exclusive access to container\n",
>                                dv->devname);
>                         tst->ss->free_super(tst);
> -                       return -1;
> +                       goto unlock;
>                 }
> 
>                 /* Check if metadata handler is able to accept the
> drive */
>                 if (!tst->ss->validate_geometry(tst, LEVEL_CONTAINER,
> 0, 1, NULL,
>                     0, 0, dv->devname, NULL, 0, 1)) {
>                         close(container_fd);
> -                       return -1;
> +                       goto unlock;
>                 }
> 
>                 Kill(dv->devname, NULL, 0, -1, 0);
> @@ -987,7 +992,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev
> *dv,
>                                           dv->devname,
> INVALID_SECTORS)) {
>                         close(dfd);
>                         close(container_fd);
> -                       return -1;
> +                       goto unlock;
>                 }
>                 if (!mdmon_running(tst->container_devnm))
>                         tst->ss->sync_metadata(tst);
> @@ -998,7 +1003,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev
> *dv,
>                                dv->devname);
>                         close(container_fd);
>                         tst->ss->free_super(tst);
> -                       return -1;
> +                       goto unlock;
>                 }
>                 sra->array.level = LEVEL_CONTAINER;
>                 /* Need to set data_offset and component_size */
> @@ -1013,7 +1018,7 @@ int Manage_add(int fd, int tfd, struct
> mddev_dev *dv,
>                         pr_err("add new device to external metadata
> failed for %s\n", dv->devname);
>                         close(container_fd);
>                         sysfs_free(sra);
> -                       return -1;
> +                       goto unlock;
>                 }
>                 ping_monitor(devnm);
>                 sysfs_free(sra);
> @@ -1027,7 +1032,7 @@ int Manage_add(int fd, int tfd, struct
> mddev_dev *dv,
>                         else
>                                 pr_err("add new device failed for %s
> as %d: %s\n",
>                                        dv->devname, j,
> strerror(errno));
> -                       return -1;
> +                       goto unlock;
>                 }
>                 if (dv->disposition == 'j') {
>                         pr_err("Journal added successfully, making %s
> read-write\n", devname);
> @@ -1038,7 +1043,11 @@ int Manage_add(int fd, int tfd, struct
> mddev_dev *dv,
>         }
>         if (verbose >= 0)
>                 pr_err("added %s\n", dv->devname);
> +       map_unlock(&map);
>         return 1;
> +unlock:
> +       map_unlock(&map);
> +       return -1;
>  }
> 
>  int Manage_remove(struct supertype *tst, int fd, struct mddev_dev
> *dv,


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-05 12:02 [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental" Li Xiao Keng
  2023-09-05 15:07 ` Martin Wilck
@ 2023-09-05 16:17 ` Coly Li
  2023-09-05 19:08   ` Martin Wilck
  1 sibling, 1 reply; 8+ messages in thread
From: Coly Li @ 2023-09-05 16:17 UTC (permalink / raw)
  To: Li Xiao Keng
  Cc: Jes Sorensen, Martin Wilck, linux-raid, louhongxiang, miaoguanqin

Hi Xiao Keng,

Thanks for the updated version, I add my comments inline.

On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote:
> When we add a new disk to a raid, it may return -EBUSY.

Where is above -EBUSY from? do you mean mdadm command returns
-EBUSY or it is returned by some specific function in mdadm
source code.

> 
> The main process of --add:
> 1. dev_open
> 2. store_super1(st, di->fd) in write_init_super1
> 3. fsync(di->fd) in write_init_super1
> 4. close(di->fd)
> 5. ioctl(ADD_NEW_DISK)
> 
> However, there will be some udev(change) event after step4. Then
> "/usr/sbin/mdadm --incremental ..." will be run, and the new disk
> will be add to md device. After that, ioctl will return -EBUSY.
>

Dose returning -EBUSY hurt anything? Or only returns -EBUSY and
other stuffs all work as expected?

 
> Here we add map_lock before write_init_super in "mdadm --add"
> to fix this race.
> 

I am not familiar this part of code, but I see ignoring the failure
from map_lock() in Assemble() is on purpose by Neil. Therefore I
just guess simply return from Assemble when map_lock() fails might
not be what you wanted.


> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
> ---
>  Assemble.c |  5 ++++-
>  Manage.c   | 25 +++++++++++++++++--------
>  2 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/Assemble.c b/Assemble.c
> index 49804941..086890ed 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -1479,8 +1479,11 @@ try_again:
>  	 * to our list.  We flag them so that we don't try to re-add,
>  	 * but can remove if they turn out to not be wanted.
>  	 */
> -	if (map_lock(&map))
> +	if (map_lock(&map)) {
>  		pr_err("failed to get exclusive lock on mapfile - continue anyway...\n");
> +		return 1;

Especially when the error message noticed "continue anyway" but a return 1
followed, the behavior might be still confusing.

> +	}
> +
>  	if (c->update == UOPT_UUID)
>  		mp = NULL;
>  	else
> diff --git a/Manage.c b/Manage.c
> index f54de7c6..6a101bae 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -703,6 +703,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  	struct supertype *dev_st;
>  	int j;
>  	mdu_disk_info_t disc;
> +	struct map_ent *map = NULL;
> 
>  	if (!get_dev_size(tfd, dv->devname, &ldsize)) {
>  		if (dv->disposition == 'M')
> @@ -900,6 +901,10 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  		disc.raid_disk = 0;
>  	}
> 
> +	if (map_lock(&map)) {
> +		pr_err("failed to get exclusive lock on mapfile when add disk\n");
> +		return -1;
> +	}
>  	if (array->not_persistent==0) {
>  		int dfd;
>  		if (dv->disposition == 'j')
> @@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  		dfd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT);
>  		if (tst->ss->add_to_super(tst, &disc, dfd,
>  					  dv->devname, INVALID_SECTORS))
> -			return -1;
> +			goto unlock;
>  		if (tst->ss->write_init_super(tst))
> -			return -1;
> +			goto unlock;
>  	} else if (dv->disposition == 'A') {
>  		/*  this had better be raid1.
>  		 * As we are "--re-add"ing we must find a spare slot
> @@ -971,14 +976,14 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  			pr_err("add failed for %s: could not get exclusive access to container\n",
>  			       dv->devname);
>  			tst->ss->free_super(tst);
> -			return -1;
> +			goto unlock;
>  		}
> 
>  		/* Check if metadata handler is able to accept the drive */
>  		if (!tst->ss->validate_geometry(tst, LEVEL_CONTAINER, 0, 1, NULL,
>  		    0, 0, dv->devname, NULL, 0, 1)) {
>  			close(container_fd);
> -			return -1;
> +			goto unlock;
>  		}
> 
>  		Kill(dv->devname, NULL, 0, -1, 0);
> @@ -987,7 +992,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  					  dv->devname, INVALID_SECTORS)) {
>  			close(dfd);
>  			close(container_fd);
> -			return -1;
> +			goto unlock;
>  		}
>  		if (!mdmon_running(tst->container_devnm))
>  			tst->ss->sync_metadata(tst);
> @@ -998,7 +1003,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  			       dv->devname);
>  			close(container_fd);
>  			tst->ss->free_super(tst);
> -			return -1;
> +			goto unlock;
>  		}
>  		sra->array.level = LEVEL_CONTAINER;
>  		/* Need to set data_offset and component_size */
> @@ -1013,7 +1018,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  			pr_err("add new device to external metadata failed for %s\n", dv->devname);
>  			close(container_fd);
>  			sysfs_free(sra);
> -			return -1;
> +			goto unlock;
>  		}
>  		ping_monitor(devnm);
>  		sysfs_free(sra);
> @@ -1027,7 +1032,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  			else
>  				pr_err("add new device failed for %s as %d: %s\n",
>  				       dv->devname, j, strerror(errno));
> -			return -1;
> +			goto unlock;
>  		}
>  		if (dv->disposition == 'j') {
>  			pr_err("Journal added successfully, making %s read-write\n", devname);
> @@ -1038,7 +1043,11 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  	}
>  	if (verbose >= 0)
>  		pr_err("added %s\n", dv->devname);
> +	map_unlock(&map);
>  	return 1;
> +unlock:
> +	map_unlock(&map);
> +	return -1;
>  }
> 
>  int Manage_remove(struct supertype *tst, int fd, struct mddev_dev *dv,

I am not challenging, just before I understand what you are trying to fix,
it is quite hard for me to join the discussion with you for this change.

And, this version is much more informative, and thank you for the effort.

-- 
Coly Li

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-05 16:17 ` Coly Li
@ 2023-09-05 19:08   ` Martin Wilck
  2023-09-06  8:51     ` Li Xiao Keng
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2023-09-05 19:08 UTC (permalink / raw)
  To: Coly Li, Li Xiao Keng; +Cc: Jes Sorensen, linux-raid, louhongxiang, miaoguanqin

On Wed, 2023-09-06 at 00:17 +0800, Coly Li wrote:
> Hi Xiao Keng,
> 
> Thanks for the updated version, I add my comments inline.
> 
> On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote:
> > When we add a new disk to a raid, it may return -EBUSY.
> 
> Where is above -EBUSY from? do you mean mdadm command returns
> -EBUSY or it is returned by some specific function in mdadm
> source code.
> 
> > 
> > The main process of --add:
> > 1. dev_open
> > 2. store_super1(st, di->fd) in write_init_super1
> > 3. fsync(di->fd) in write_init_super1
> > 4. close(di->fd)
> > 5. ioctl(ADD_NEW_DISK)
> > 
> > However, there will be some udev(change) event after step4. Then
> > "/usr/sbin/mdadm --incremental ..." will be run, and the new disk
> > will be add to md device. After that, ioctl will return -EBUSY.
> > 
> 
> Dose returning -EBUSY hurt anything? Or only returns -EBUSY and
> other stuffs all work as expected?

IIUC, it does not. The manual --add command will fail. Li Xiao Keng has
described the problem in earlier emails.
 
> > Here we add map_lock before write_init_super in "mdadm --add"
> > to fix this race.
> > 
> 
> I am not familiar this part of code, but I see ignoring the failure
> from map_lock() in Assemble() is on purpose by Neil. Therefore I
> just guess simply return from Assemble when map_lock() fails might
> not be what you wanted.
> 
> 
> > Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> > Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
> > ---
> >  Assemble.c |  5 ++++-
> >  Manage.c   | 25 +++++++++++++++++--------
> >  2 files changed, 21 insertions(+), 9 deletions(-)
> > 
> > diff --git a/Assemble.c b/Assemble.c
> > index 49804941..086890ed 100644
> > --- a/Assemble.c
> > +++ b/Assemble.c
> > @@ -1479,8 +1479,11 @@ try_again:
> >          * to our list.  We flag them so that we don't try to re-
> > add,
> >          * but can remove if they turn out to not be wanted.
> >          */
> > -       if (map_lock(&map))
> > +       if (map_lock(&map)) {
> >                 pr_err("failed to get exclusive lock on mapfile -
> > continue anyway...\n");
> > +               return 1;
> 
> Especially when the error message noticed "continue anyway" but a
> return 1
> followed, the behavior might be still confusing.

Now as you're saying it, I recall I had the same comment last time ;-)

I might add that "return 1" is dangerous, as it pretends that
Manage_add() was successful and actually added a device, which is not
the case. In the special case that Li Xiao Keng wants to fix, it's true
(sort of) because the asynchronous "mdadm -I" will have added the
device already. But there could be other races where Assemble_map()
can't obtain the lock and still the device will not be added later.

Regards
Martin

> 
> > +       }
> > +
> >         if (c->update == UOPT_UUID)
> >                 mp = NULL;
> >         else
> > diff --git a/Manage.c b/Manage.c
> > index f54de7c6..6a101bae 100644
> > --- a/Manage.c
> > +++ b/Manage.c
> > @@ -703,6 +703,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >         struct supertype *dev_st;
> >         int j;
> >         mdu_disk_info_t disc;
> > +       struct map_ent *map = NULL;
> > 
> >         if (!get_dev_size(tfd, dv->devname, &ldsize)) {
> >                 if (dv->disposition == 'M')
> > @@ -900,6 +901,10 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                 disc.raid_disk = 0;
> >         }
> > 
> > +       if (map_lock(&map)) {
> > +               pr_err("failed to get exclusive lock on mapfile
> > when add disk\n");
> > +               return -1;
> > +       }
> >         if (array->not_persistent==0) {
> >                 int dfd;
> >                 if (dv->disposition == 'j')
> > @@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                 dfd = dev_open(dv->devname, O_RDWR |
> > O_EXCL|O_DIRECT);
> >                 if (tst->ss->add_to_super(tst, &disc, dfd,
> >                                           dv->devname,
> > INVALID_SECTORS))
> > -                       return -1;
> > +                       goto unlock;
> >                 if (tst->ss->write_init_super(tst))
> > -                       return -1;
> > +                       goto unlock;
> >         } else if (dv->disposition == 'A') {
> >                 /*  this had better be raid1.
> >                  * As we are "--re-add"ing we must find a spare
> > slot
> > @@ -971,14 +976,14 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                         pr_err("add failed for %s: could not get
> > exclusive access to container\n",
> >                                dv->devname);
> >                         tst->ss->free_super(tst);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> > 
> >                 /* Check if metadata handler is able to accept the
> > drive */
> >                 if (!tst->ss->validate_geometry(tst,
> > LEVEL_CONTAINER, 0, 1, NULL,
> >                     0, 0, dv->devname, NULL, 0, 1)) {
> >                         close(container_fd);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> > 
> >                 Kill(dv->devname, NULL, 0, -1, 0);
> > @@ -987,7 +992,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                                           dv->devname,
> > INVALID_SECTORS)) {
> >                         close(dfd);
> >                         close(container_fd);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> >                 if (!mdmon_running(tst->container_devnm))
> >                         tst->ss->sync_metadata(tst);
> > @@ -998,7 +1003,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                                dv->devname);
> >                         close(container_fd);
> >                         tst->ss->free_super(tst);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> >                 sra->array.level = LEVEL_CONTAINER;
> >                 /* Need to set data_offset and component_size */
> > @@ -1013,7 +1018,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                         pr_err("add new device to external metadata
> > failed for %s\n", dv->devname);
> >                         close(container_fd);
> >                         sysfs_free(sra);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> >                 ping_monitor(devnm);
> >                 sysfs_free(sra);
> > @@ -1027,7 +1032,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                         else
> >                                 pr_err("add new device failed for
> > %s as %d: %s\n",
> >                                        dv->devname, j,
> > strerror(errno));
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> >                 if (dv->disposition == 'j') {
> >                         pr_err("Journal added successfully, making
> > %s read-write\n", devname);
> > @@ -1038,7 +1043,11 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >         }
> >         if (verbose >= 0)
> >                 pr_err("added %s\n", dv->devname);
> > +       map_unlock(&map);
> >         return 1;
> > +unlock:
> > +       map_unlock(&map);
> > +       return -1;
> >  }
> > 
> >  int Manage_remove(struct supertype *tst, int fd, struct mddev_dev
> > *dv,
> 
> I am not challenging, just before I understand what you are trying to
> fix,
> it is quite hard for me to join the discussion with you for this
> change.
> 
> And, this version is much more informative, and thank you for the
> effort.
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-05 19:08   ` Martin Wilck
@ 2023-09-06  8:51     ` Li Xiao Keng
  2023-09-06 13:31       ` Martin Wilck
  0 siblings, 1 reply; 8+ messages in thread
From: Li Xiao Keng @ 2023-09-06  8:51 UTC (permalink / raw)
  To: Martin Wilck, Coly Li; +Cc: Jes Sorensen, linux-raid, louhongxiang, miaoguanqin



On 2023/9/6 3:08, Martin Wilck wrote:
> On Wed, 2023-09-06 at 00:17 +0800, Coly Li wrote:
>> Hi Xiao Keng,
>>
>> Thanks for the updated version, I add my comments inline.
>>
>> On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote:
>>> When we add a new disk to a raid, it may return -EBUSY.
>>
>> Where is above -EBUSY from? do you mean mdadm command returns
>> -EBUSY or it is returned by some specific function in mdadm
>> source code.
>>

Because the new disk is added to the raid by "mdadm --incremental",
the "mdadm --add" will return the err.

>>>
>>> The main process of --add:
>>> 1. dev_open
>>> 2. store_super1(st, di->fd) in write_init_super1
>>> 3. fsync(di->fd) in write_init_super1
>>> 4. close(di->fd)
>>> 5. ioctl(ADD_NEW_DISK)
>>>
>>> However, there will be some udev(change) event after step4. Then
>>> "/usr/sbin/mdadm --incremental ..." will be run, and the new disk
>>> will be add to md device. After that, ioctl will return -EBUSY.
>>>
>>
>> Dose returning -EBUSY hurt anything? Or only returns -EBUSY and
>> other stuffs all work as expected?
> 
> IIUC, it does not. The manual --add command will fail. Li Xiao Keng has
> described the problem in earlier emails.
Yes！ The disk is add to the raid, but the manual --add command will fail.
We will decide the next action based on the return value.

>  
>>> Here we add map_lock before write_init_super in "mdadm --add"
>>> to fix this race.
>>>
>>
>> I am not familiar this part of code, but I see ignoring the failure
>> from map_lock() in Assemble() is on purpose by Neil. Therefore I
>> just guess simply return from Assemble when map_lock() fails might
>> not be what you wanted.
>>
>>
>>> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
>>> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
>>> ---
>>>  Assemble.c |  5 ++++-
>>>  Manage.c   | 25 +++++++++++++++++--------
>>>  2 files changed, 21 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Assemble.c b/Assemble.c
>>> index 49804941..086890ed 100644
>>> --- a/Assemble.c
>>> +++ b/Assemble.c
>>> @@ -1479,8 +1479,11 @@ try_again:
>>>          * to our list.  We flag them so that we don't try to re-
>>> add,
>>>          * but can remove if they turn out to not be wanted.
>>>          */
>>> -       if (map_lock(&map))
>>> +       if (map_lock(&map)) {
>>>                 pr_err("failed to get exclusive lock on mapfile -
>>> continue anyway...\n");
>>> +               return 1;
>>
>> Especially when the error message noticed "continue anyway" but a
>> return 1
>> followed, the behavior might be still confusing.
> 
> Now as you're saying it, I recall I had the same comment last time ;-)
> 
I'm very sorry for this stupid mistake. I I find I send v1 patch but not
v2. I will send patch v2 to instead of it.

-	if (map_lock(&map))
-		pr_err("failed to get exclusive lock on mapfile - continue anyway...\n");
+	if (map_lock(&map)) {
+		pr_err("failed to get exclusive lock on mapfile when assemble raid.\n");
+		return 1;
+	}

> I might add that "return 1" is dangerous, as it pretends that
> Manage_add() was successful and actually added a device, which is not
> the case. In the special case that Li Xiao Keng wants to fix, it's true
> (sort of) because the asynchronous "mdadm -I" will have added the
> device already. But there could be other races where Assemble_map()
> can't obtain the lock and still the device will not be added later.
> 

Do I missunstandings
"AFAICS it would only help if the code snipped above did not only
pr_err() but exit if it can't get an exclusive lock." ?

Anyway, map_lock is a blocking function. If it can't get the lock, it blocks.
If map_lock() return error, Assemble() return 1. When -add unlock it,
Assemble() will go ahead but not return at map_lock().

Do you think it should remain as it is? Like:

	if (map_lock(&map))
		pr_err("failed to get exclusive lock on mapfile - continue anyway...\n");

>>>
>>> +       if (map_lock(&map)) {
>>> +               pr_err("failed to get exclusive lock on mapfile
>>> when add disk\n");
>>> +               return -1;
>>> +       }
>>>         if (array->not_persistent==0) {
>>>                 int dfd;
>>>                 if (dv->disposition == 'j')
>>> @@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct
>>> mddev_dev *dv,
>>>                 dfd = dev_open(dv->devname, O_RDWR |


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-06  8:51     ` Li Xiao Keng
@ 2023-09-06 13:31       ` Martin Wilck
  2023-09-07  3:02         ` Li Xiao Keng
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2023-09-06 13:31 UTC (permalink / raw)
  To: Li Xiao Keng, Coly Li; +Cc: Jes Sorensen, linux-raid, louhongxiang, miaoguanqin

On Wed, 2023-09-06 at 16:51 +0800, Li Xiao Keng wrote:
> 
> 
> On 2023/9/6 3:08, Martin Wilck wrote:
> > On Wed, 2023-09-06 at 00:17 +0800, Coly Li wrote:
> > > Hi Xiao Keng,
> > > 
> > > Thanks for the updated version, I add my comments inline.
> > > 
> > > On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote:
> > > > When we add a new disk to a raid, it may return -EBUSY.
> > > 
> > > Where is above -EBUSY from? do you mean mdadm command returns
> > > -EBUSY or it is returned by some specific function in mdadm
> > > source code.
> > > 
> 
> Because the new disk is added to the raid by "mdadm --incremental",
> the "mdadm --add" will return the err.
> 
> > > > 
> > > > The main process of --add:
> > > > 1. dev_open
> > > > 2. store_super1(st, di->fd) in write_init_super1
> > > > 3. fsync(di->fd) in write_init_super1
> > > > 4. close(di->fd)
> > > > 5. ioctl(ADD_NEW_DISK)
> > > > 
> > > > However, there will be some udev(change) event after step4.
> > > > Then
> > > > "/usr/sbin/mdadm --incremental ..." will be run, and the new
> > > > disk
> > > > will be add to md device. After that, ioctl will return -EBUSY.
> > > > 
> > > 
> > > Dose returning -EBUSY hurt anything? Or only returns -EBUSY and
> > > other stuffs all work as expected?
> > 
> > IIUC, it does not. The manual --add command will fail. Li Xiao Keng
> > has
> > described the problem in earlier emails.
> Yes！ The disk is add to the raid, but the manual --add command will
> fail.
> We will decide the next action based on the return value.
> 
> >  
> > > > Here we add map_lock before write_init_super in "mdadm --add"
> > > > to fix this race.
> > > > 
> > > 
> > > I am not familiar this part of code, but I see ignoring the
> > > failure
> > > from map_lock() in Assemble() is on purpose by Neil. Therefore I
> > > just guess simply return from Assemble when map_lock() fails
> > > might
> > > not be what you wanted.
> > > 
> > > 
> > > > Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> > > > Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
> > > > ---
> > > >  Assemble.c |  5 ++++-
> > > >  Manage.c   | 25 +++++++++++++++++--------
> > > >  2 files changed, 21 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/Assemble.c b/Assemble.c
> > > > index 49804941..086890ed 100644
> > > > --- a/Assemble.c
> > > > +++ b/Assemble.c
> > > > @@ -1479,8 +1479,11 @@ try_again:
> > > >          * to our list.  We flag them so that we don't try to
> > > > re-
> > > > add,
> > > >          * but can remove if they turn out to not be wanted.
> > > >          */
> > > > -       if (map_lock(&map))
> > > > +       if (map_lock(&map)) {
> > > >                 pr_err("failed to get exclusive lock on mapfile
> > > > -
> > > > continue anyway...\n");
> > > > +               return 1;
> > > 
> > > Especially when the error message noticed "continue anyway" but a
> > > return 1
> > > followed, the behavior might be still confusing.
> > 
> > Now as you're saying it, I recall I had the same comment last time
> > ;-)
> > 
> I'm very sorry for this stupid mistake. I I find I send v1 patch but
> not
> v2. I will send patch v2 to instead of it.
> 
> -       if (map_lock(&map))
> -               pr_err("failed to get exclusive lock on mapfile -
> continue anyway...\n");
> +       if (map_lock(&map)) {
> +               pr_err("failed to get exclusive lock on mapfile when
> assemble raid.\n");
> +               return 1;
> +       }
> 
> > I might add that "return 1" is dangerous, as it pretends that
> > Manage_add() was successful and actually added a device, which is
> > not
> > the case. In the special case that Li Xiao Keng wants to fix, it's
> > true
> > (sort of) because the asynchronous "mdadm -I" will have added the
> > device already. But there could be other races where Assemble_map()
> > can't obtain the lock and still the device will not be added later.
> > 
> 
> Do I missunstandings
> "AFAICS it would only help if the code snipped above did not only
> pr_err() but exit if it can't get an exclusive lock." ?
> 

> Anyway, map_lock is a blocking function. If it can't get the lock, it
blocks.
> If map_lock() return error, Assemble() return 1. When -add unlock it,
> Assemble() will go ahead but not return at map_lock().

Maybe *I* was misunderstanding. I thought map_lock() returned error if
the lock was held by the other process. What exactly does an error
return from map_lock() mean? If it does not mean "lock held by another
process", why does your patch solve the race issue?

Martin



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-06 13:31       ` Martin Wilck
@ 2023-09-07  3:02         ` Li Xiao Keng
  2023-09-07  8:33           ` Martin Wilck
  0 siblings, 1 reply; 8+ messages in thread
From: Li Xiao Keng @ 2023-09-07  3:02 UTC (permalink / raw)
  To: Martin Wilck, Coly Li; +Cc: Jes Sorensen, linux-raid, louhongxiang, miaoguanqin



On 2023/9/6 21:31, Martin Wilck wrote:
> On Wed, 2023-09-06 at 16:51 +0800, Li Xiao Keng wrote:
>>
>>
>> On 2023/9/6 3:08, Martin Wilck wrote:
>>> On Wed, 2023-09-06 at 00:17 +0800, Coly Li wrote:
>>>> Hi Xiao Keng,
>>>>
>>>> Thanks for the updated version, I add my comments inline.
>>>>
>>>> On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote:
>>>>> When we add a new disk to a raid, it may return -EBUSY.
>>>>
>>>> Where is above -EBUSY from? do you mean mdadm command returns
>>>> -EBUSY or it is returned by some specific function in mdadm
>>>> source code.
>>>>
>>
>> Because the new disk is added to the raid by "mdadm --incremental",
>> the "mdadm --add" will return the err.
>>
>>>>>
>>>>> The main process of --add:
>>>>> 1. dev_open
>>>>> 2. store_super1(st, di->fd) in write_init_super1
>>>>> 3. fsync(di->fd) in write_init_super1
>>>>> 4. close(di->fd)
>>>>> 5. ioctl(ADD_NEW_DISK)
>>>>>
>>>>> However, there will be some udev(change) event after step4.
>>>>> Then
>>>>> "/usr/sbin/mdadm --incremental ..." will be run, and the new
>>>>> disk
>>>>> will be add to md device. After that, ioctl will return -EBUSY.
>>>>>
>>>>
>>>> Dose returning -EBUSY hurt anything? Or only returns -EBUSY and
>>>> other stuffs all work as expected?
>>>
>>> IIUC, it does not. The manual --add command will fail. Li Xiao Keng
>>> has
>>> described the problem in earlier emails.
>> Yes！ The disk is add to the raid, but the manual --add command will
>> fail.
>> We will decide the next action based on the return value.
>>
>>>  
>>>>> Here we add map_lock before write_init_super in "mdadm --add"
>>>>> to fix this race.
>>>>>
>>>>
>>>> I am not familiar this part of code, but I see ignoring the
>>>> failure
>>>> from map_lock() in Assemble() is on purpose by Neil. Therefore I
>>>> just guess simply return from Assemble when map_lock() fails
>>>> might
>>>> not be what you wanted.
>>>>
>>>>
>>>>> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
>>>>> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
>>>>> ---
>>>>>  Assemble.c |  5 ++++-
>>>>>  Manage.c   | 25 +++++++++++++++++--------
>>>>>  2 files changed, 21 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/Assemble.c b/Assemble.c
>>>>> index 49804941..086890ed 100644
>>>>> --- a/Assemble.c
>>>>> +++ b/Assemble.c
>>>>> @@ -1479,8 +1479,11 @@ try_again:
>>>>>          * to our list.  We flag them so that we don't try to
>>>>> re-
>>>>> add,
>>>>>          * but can remove if they turn out to not be wanted.
>>>>>          */
>>>>> -       if (map_lock(&map))
>>>>> +       if (map_lock(&map)) {
>>>>>                 pr_err("failed to get exclusive lock on mapfile
>>>>> -
>>>>> continue anyway...\n");
>>>>> +               return 1;
>>>>
>>>> Especially when the error message noticed "continue anyway" but a
>>>> return 1
>>>> followed, the behavior might be still confusing.
>>>
>>> Now as you're saying it, I recall I had the same comment last time
>>> ;-)
>>>
>> I'm very sorry for this stupid mistake. I I find I send v1 patch but
>> not
>> v2. I will send patch v2 to instead of it.
>>
>> -       if (map_lock(&map))
>> -               pr_err("failed to get exclusive lock on mapfile -
>> continue anyway...\n");
>> +       if (map_lock(&map)) {
>> +               pr_err("failed to get exclusive lock on mapfile when
>> assemble raid.\n");
>> +               return 1;
>> +       }
>>
>>> I might add that "return 1" is dangerous, as it pretends that
>>> Manage_add() was successful and actually added a device, which is
>>> not
>>> the case. In the special case that Li Xiao Keng wants to fix, it's
>>> true
>>> (sort of) because the asynchronous "mdadm -I" will have added the
>>> device already. But there could be other races where Assemble_map()
>>> can't obtain the lock and still the device will not be added later.
>>>
>>
>> Do I missunstandings
>> "AFAICS it would only help if the code snipped above did not only
>> pr_err() but exit if it can't get an exclusive lock." ?
>>
> 
>> Anyway, map_lock is a blocking function. If it can't get the lock, it
> blocks.
>> If map_lock() return error, Assemble() return 1. When -add unlock it,
>> Assemble() will go ahead but not return at map_lock().
> 
> Maybe *I* was misunderstanding. I thought map_lock() returned error if
> the lock was held by the other process. What exactly does an error
> return from map_lock() mean? If it does not mean "lock held by another
> process", why does your patch solve the race issue?
> 

The -add locks map_lock() before udev(change) event happen, and unlocks it
until ioctl(ADD_NEW_DISK) finishing to solve the race issue. This makes
manual -add return success and -incremental (triggered by uevent) will
fail, which is same as the previous successful execution of the -add command.

> Martin
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"
  2023-09-07  3:02         ` Li Xiao Keng
@ 2023-09-07  8:33           ` Martin Wilck
  0 siblings, 0 replies; 8+ messages in thread
From: Martin Wilck @ 2023-09-07  8:33 UTC (permalink / raw)
  To: Li Xiao Keng, Coly Li; +Cc: Jes Sorensen, linux-raid, louhongxiang, miaoguanqin

On Thu, 2023-09-07 at 11:02 +0800, Li Xiao Keng wrote:
> 
> 
> On 2023/9/6 21:31, Martin Wilck wrote:
> > On Wed, 2023-09-06 at 16:51 +0800, Li Xiao Keng wrote:
> > > 
> > > 
> > > On 2023/9/6 3:08, Martin Wilck wrote:
> > > > On Wed, 2023-09-06 at 00:17 +0800, Coly Li wrote:
> > > > > Hi Xiao Keng,
> > > > > 
> > > > > Thanks for the updated version, I add my comments inline.
> > > > > 
> > > > > On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote:
> > > > > > When we add a new disk to a raid, it may return -EBUSY.
> > > > > 
> > > > > Where is above -EBUSY from? do you mean mdadm command returns
> > > > > -EBUSY or it is returned by some specific function in mdadm
> > > > > source code.
> > > > > 
> > > 
> > > Because the new disk is added to the raid by "mdadm --
> > > incremental",
> > > the "mdadm --add" will return the err.
> > > 
> > > > > > 
> > > > > > The main process of --add:
> > > > > > 1. dev_open
> > > > > > 2. store_super1(st, di->fd) in write_init_super1
> > > > > > 3. fsync(di->fd) in write_init_super1
> > > > > > 4. close(di->fd)
> > > > > > 5. ioctl(ADD_NEW_DISK)
> > > > > > 
> > > > > > However, there will be some udev(change) event after step4.
> > > > > > Then
> > > > > > "/usr/sbin/mdadm --incremental ..." will be run, and the
> > > > > > new
> > > > > > disk
> > > > > > will be add to md device. After that, ioctl will return -
> > > > > > EBUSY.
> > > > > > 
> > > > > 
> > > > > Dose returning -EBUSY hurt anything? Or only returns -EBUSY
> > > > > and
> > > > > other stuffs all work as expected?
> > > > 
> > > > IIUC, it does not. The manual --add command will fail. Li Xiao
> > > > Keng
> > > > has
> > > > described the problem in earlier emails.
> > > Yes！ The disk is add to the raid, but the manual --add command
> > > will
> > > fail.
> > > We will decide the next action based on the return value.
> > > 
> > > >  
> > > > > > Here we add map_lock before write_init_super in "mdadm --
> > > > > > add"
> > > > > > to fix this race.
> > > > > > 
> > > > > 
> > > > > I am not familiar this part of code, but I see ignoring the
> > > > > failure
> > > > > from map_lock() in Assemble() is on purpose by Neil.
> > > > > Therefore I
> > > > > just guess simply return from Assemble when map_lock() fails
> > > > > might
> > > > > not be what you wanted.
> > > > > 
> > > > > 
> > > > > > Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> > > > > > Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
> > > > > > ---
> > > > > >  Assemble.c |  5 ++++-
> > > > > >  Manage.c   | 25 +++++++++++++++++--------
> > > > > >  2 files changed, 21 insertions(+), 9 deletions(-)
> > > > > > 
> > > > > > diff --git a/Assemble.c b/Assemble.c
> > > > > > index 49804941..086890ed 100644
> > > > > > --- a/Assemble.c
> > > > > > +++ b/Assemble.c
> > > > > > @@ -1479,8 +1479,11 @@ try_again:
> > > > > >          * to our list.  We flag them so that we don't try
> > > > > > to
> > > > > > re-
> > > > > > add,
> > > > > >          * but can remove if they turn out to not be
> > > > > > wanted.
> > > > > >          */
> > > > > > -       if (map_lock(&map))
> > > > > > +       if (map_lock(&map)) {
> > > > > >                 pr_err("failed to get exclusive lock on
> > > > > > mapfile
> > > > > > -
> > > > > > continue anyway...\n");
> > > > > > +               return 1;
> > > > > 
> > > > > Especially when the error message noticed "continue anyway"
> > > > > but a
> > > > > return 1
> > > > > followed, the behavior might be still confusing.
> > > > 
> > > > Now as you're saying it, I recall I had the same comment last
> > > > time
> > > > ;-)
> > > > 
> > > I'm very sorry for this stupid mistake. I I find I send v1 patch
> > > but
> > > not
> > > v2. I will send patch v2 to instead of it.
> > > 
> > > -       if (map_lock(&map))
> > > -               pr_err("failed to get exclusive lock on mapfile -
> > > continue anyway...\n");
> > > +       if (map_lock(&map)) {
> > > +               pr_err("failed to get exclusive lock on mapfile
> > > when
> > > assemble raid.\n");
> > > +               return 1;
> > > +       }
> > > 
> > > > I might add that "return 1" is dangerous, as it pretends that
> > > > Manage_add() was successful and actually added a device, which
> > > > is
> > > > not
> > > > the case. In the special case that Li Xiao Keng wants to fix,
> > > > it's
> > > > true
> > > > (sort of) because the asynchronous "mdadm -I" will have added
> > > > the
> > > > device already. But there could be other races where
> > > > Assemble_map()
> > > > can't obtain the lock and still the device will not be added
> > > > later.
> > > > 
> > > 
> > > Do I missunstandings
> > > "AFAICS it would only help if the code snipped above did not only
> > > pr_err() but exit if it can't get an exclusive lock." ?
> > > 
> > 
> > > Anyway, map_lock is a blocking function. If it can't get the
> > > lock, it
> > blocks.
> > > If map_lock() return error, Assemble() return 1. When -add unlock
> > > it,
> > > Assemble() will go ahead but not return at map_lock().
> > 
> > Maybe *I* was misunderstanding. I thought map_lock() returned error
> > if
> > the lock was held by the other process. What exactly does an error
> > return from map_lock() mean? If it does not mean "lock held by
> > another
> > process", why does your patch solve the race issue?
> > 
> 
> The -add locks map_lock() before udev(change) event happen, and
> unlocks it
> until ioctl(ADD_NEW_DISK) finishing to solve the race issue. This
> makes
> manual -add return success and -incremental (triggered by uevent)
> will
> fail, which is same as the previous successful execution of the -add
> command.

OK. Then maybe you should just leave the code for handling the
map_lock() error condition as-is. It tends to confuse readers.

Martin


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-09-07 15:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-05 12:02 [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental" Li Xiao Keng
2023-09-05 15:07 ` Martin Wilck
2023-09-05 16:17 ` Coly Li
2023-09-05 19:08   ` Martin Wilck
2023-09-06  8:51     ` Li Xiao Keng
2023-09-06 13:31       ` Martin Wilck
2023-09-07  3:02         ` Li Xiao Keng
2023-09-07  8:33           ` Martin Wilck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).