Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Martin Wilck <mwilck@suse.com>
To: Coly Li <colyli@suse.de>, Li Xiao Keng <lixiaokeng@huawei.com>
Cc: Jes Sorensen <jes@trained-monkey.org>,
	linux-raid@vger.kernel.org, louhongxiang@huawei.com,
	miaoguanqin <miaoguanqin@huawei.com>
Subject: Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"
Date: Tue, 05 Sep 2023 21:08:45 +0200	[thread overview]
Message-ID: <12823520a0fe774908bd0830f59d05d2e7c06126.camel@suse.com> (raw)
In-Reply-To: <kjdwwbkqj6fuaijow2nldh5ofbxymto2mzqcullb57jtx6q6h2@46kropdd4lql>

On Wed, 2023-09-06 at 00:17 +0800, Coly Li wrote:
> Hi Xiao Keng,
> 
> Thanks for the updated version, I add my comments inline.
> 
> On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote:
> > When we add a new disk to a raid, it may return -EBUSY.
> 
> Where is above -EBUSY from? do you mean mdadm command returns
> -EBUSY or it is returned by some specific function in mdadm
> source code.
> 
> > 
> > The main process of --add:
> > 1. dev_open
> > 2. store_super1(st, di->fd) in write_init_super1
> > 3. fsync(di->fd) in write_init_super1
> > 4. close(di->fd)
> > 5. ioctl(ADD_NEW_DISK)
> > 
> > However, there will be some udev(change) event after step4. Then
> > "/usr/sbin/mdadm --incremental ..." will be run, and the new disk
> > will be add to md device. After that, ioctl will return -EBUSY.
> > 
> 
> Dose returning -EBUSY hurt anything? Or only returns -EBUSY and
> other stuffs all work as expected?

IIUC, it does not. The manual --add command will fail. Li Xiao Keng has
described the problem in earlier emails.
 
> > Here we add map_lock before write_init_super in "mdadm --add"
> > to fix this race.
> > 
> 
> I am not familiar this part of code, but I see ignoring the failure
> from map_lock() in Assemble() is on purpose by Neil. Therefore I
> just guess simply return from Assemble when map_lock() fails might
> not be what you wanted.
> 
> 
> > Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> > Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
> > ---
> >  Assemble.c |  5 ++++-
> >  Manage.c   | 25 +++++++++++++++++--------
> >  2 files changed, 21 insertions(+), 9 deletions(-)
> > 
> > diff --git a/Assemble.c b/Assemble.c
> > index 49804941..086890ed 100644
> > --- a/Assemble.c
> > +++ b/Assemble.c
> > @@ -1479,8 +1479,11 @@ try_again:
> >          * to our list.  We flag them so that we don't try to re-
> > add,
> >          * but can remove if they turn out to not be wanted.
> >          */
> > -       if (map_lock(&map))
> > +       if (map_lock(&map)) {
> >                 pr_err("failed to get exclusive lock on mapfile -
> > continue anyway...\n");
> > +               return 1;
> 
> Especially when the error message noticed "continue anyway" but a
> return 1
> followed, the behavior might be still confusing.

Now as you're saying it, I recall I had the same comment last time ;-)

I might add that "return 1" is dangerous, as it pretends that
Manage_add() was successful and actually added a device, which is not
the case. In the special case that Li Xiao Keng wants to fix, it's true
(sort of) because the asynchronous "mdadm -I" will have added the
device already. But there could be other races where Assemble_map()
can't obtain the lock and still the device will not be added later.

Regards
Martin

> 
> > +       }
> > +
> >         if (c->update == UOPT_UUID)
> >                 mp = NULL;
> >         else
> > diff --git a/Manage.c b/Manage.c
> > index f54de7c6..6a101bae 100644
> > --- a/Manage.c
> > +++ b/Manage.c
> > @@ -703,6 +703,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >         struct supertype *dev_st;
> >         int j;
> >         mdu_disk_info_t disc;
> > +       struct map_ent *map = NULL;
> > 
> >         if (!get_dev_size(tfd, dv->devname, &ldsize)) {
> >                 if (dv->disposition == 'M')
> > @@ -900,6 +901,10 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                 disc.raid_disk = 0;
> >         }
> > 
> > +       if (map_lock(&map)) {
> > +               pr_err("failed to get exclusive lock on mapfile
> > when add disk\n");
> > +               return -1;
> > +       }
> >         if (array->not_persistent==0) {
> >                 int dfd;
> >                 if (dv->disposition == 'j')
> > @@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                 dfd = dev_open(dv->devname, O_RDWR |
> > O_EXCL|O_DIRECT);
> >                 if (tst->ss->add_to_super(tst, &disc, dfd,
> >                                           dv->devname,
> > INVALID_SECTORS))
> > -                       return -1;
> > +                       goto unlock;
> >                 if (tst->ss->write_init_super(tst))
> > -                       return -1;
> > +                       goto unlock;
> >         } else if (dv->disposition == 'A') {
> >                 /*  this had better be raid1.
> >                  * As we are "--re-add"ing we must find a spare
> > slot
> > @@ -971,14 +976,14 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                         pr_err("add failed for %s: could not get
> > exclusive access to container\n",
> >                                dv->devname);
> >                         tst->ss->free_super(tst);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> > 
> >                 /* Check if metadata handler is able to accept the
> > drive */
> >                 if (!tst->ss->validate_geometry(tst,
> > LEVEL_CONTAINER, 0, 1, NULL,
> >                     0, 0, dv->devname, NULL, 0, 1)) {
> >                         close(container_fd);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> > 
> >                 Kill(dv->devname, NULL, 0, -1, 0);
> > @@ -987,7 +992,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                                           dv->devname,
> > INVALID_SECTORS)) {
> >                         close(dfd);
> >                         close(container_fd);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> >                 if (!mdmon_running(tst->container_devnm))
> >                         tst->ss->sync_metadata(tst);
> > @@ -998,7 +1003,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                                dv->devname);
> >                         close(container_fd);
> >                         tst->ss->free_super(tst);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> >                 sra->array.level = LEVEL_CONTAINER;
> >                 /* Need to set data_offset and component_size */
> > @@ -1013,7 +1018,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                         pr_err("add new device to external metadata
> > failed for %s\n", dv->devname);
> >                         close(container_fd);
> >                         sysfs_free(sra);
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> >                 ping_monitor(devnm);
> >                 sysfs_free(sra);
> > @@ -1027,7 +1032,7 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >                         else
> >                                 pr_err("add new device failed for
> > %s as %d: %s\n",
> >                                        dv->devname, j,
> > strerror(errno));
> > -                       return -1;
> > +                       goto unlock;
> >                 }
> >                 if (dv->disposition == 'j') {
> >                         pr_err("Journal added successfully, making
> > %s read-write\n", devname);
> > @@ -1038,7 +1043,11 @@ int Manage_add(int fd, int tfd, struct
> > mddev_dev *dv,
> >         }
> >         if (verbose >= 0)
> >                 pr_err("added %s\n", dv->devname);
> > +       map_unlock(&map);
> >         return 1;
> > +unlock:
> > +       map_unlock(&map);
> > +       return -1;
> >  }
> > 
> >  int Manage_remove(struct supertype *tst, int fd, struct mddev_dev
> > *dv,
> 
> I am not challenging, just before I understand what you are trying to
> fix,
> it is quite hard for me to join the discussion with you for this
> change.
> 
> And, this version is much more informative, and thank you for the
> effort.
>

next prev parent reply	other threads:[~2023-09-05 19:15 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-05 12:02 [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental" Li Xiao Keng
2023-09-05 15:07 ` Martin Wilck
2023-09-05 16:17 ` Coly Li
2023-09-05 19:08   ` Martin Wilck [this message]
2023-09-06  8:51     ` Li Xiao Keng
2023-09-06 13:31       ` Martin Wilck
2023-09-07  3:02         ` Li Xiao Keng
2023-09-07  8:33           ` Martin Wilck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=12823520a0fe774908bd0830f59d05d2e7c06126.camel@suse.com \
    --to=mwilck@suse.com \
    --cc=colyli@suse.de \
    --cc=jes@trained-monkey.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=lixiaokeng@huawei.com \
    --cc=louhongxiang@huawei.com \
    --cc=miaoguanqin@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).