Re: [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental"

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: Li Xiao Keng <lixiaokeng@huawei.com>
Cc: Jes Sorensen <jes@trained-monkey.org>,
	Martin Wilck <mwilck@suse.com>, Coly Li <colyli@suse.de>,
	<linux-raid@vger.kernel.org>, <louhongxiang@huawei.com>,
	miaoguanqin <miaoguanqin@huawei.com>
Subject: Re: [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental"
Date: Tue, 12 Sep 2023 09:58:28 +0200	[thread overview]
Message-ID: <20230912095828.000002a5@linux.intel.com> (raw)
In-Reply-To: <d1744e2f-0476-153e-11b6-662f16d54b73@huawei.com>

On Tue, 12 Sep 2023 15:18:33 +0800
Li Xiao Keng <lixiaokeng@huawei.com> wrote:

> On 2023/9/12 0:00, Mariusz Tkaczyk wrote:
> > On Thu, 7 Sep 2023 19:37:44 +0800
> > Li Xiao Keng <lixiaokeng@huawei.com> wrote:
> >   
> >> There is a raid1 with sda and sdb. And we add sdc to this raid,
> >> it may return -EBUSY.
> >>
> >> The main process of --add:
> >> 1. dev_open（sdc) in Manage_add
> >> 2. store_super1(st, di->fd) in write_init_super1
> >> 3. fsync(fd) in store_super1
> >> 4. close(di->fd) in write_init_super1
> >> 5. ioctl(ADD_NEW_DISK)
> >>
> >> Step 2 and 3 will add sdc to metadata of raid1. There will be
> >> udev(change of sdc) event after step4. Then "/usr/sbin/mdadm
> >> --incremental --export $devnode --offroot $env{DEVLINKS}"
> >> will be run, and the sdc will be added to the raid1. Then
> >> step 5 will return -EBUSY because it checks if device isn't
> >> claimed in md_import_device()->lock_rdev()->blkdev_get_by_dev()  
> >> ->blkdev_get().    
> >>
> >> It will be confusing for users because sdc is added first time.
> >> The "incremental" will get map_lock before add sdc to raid1.
> >> So we add map_lock before write_init_super in "mdadm --add"
> >> to fix the race of "add" and "incremental".
> >>
> >> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> >> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>  
> > 
> > Hello Li Xiao Keng,
> > Thanks for the patch, I'm trying to get my head around this.
> > As I understand, you added a mapfile locking and you are failing when the
> > log is already claimed, but you prefer to return -1 and print error instead
> > of returning EBUSY, right?  
> No, I do not prefer to return -1 and print error instead of returning EBUSY.
> At first, I had the same idea as Martin, thinking that map_lock() would fail
> when it is locked by someone else.Actually it is not fail but block. So now I
> just print error and go ahead if map_lock() fails, like map_lock() elsewhere.

Ok, thanks now it have more sense!

> > 
> > If yes, it has nothing to do with fixing. It is just a obscure hack to make
> > it, let say more reliable. We can either get EBUSY and know that is
> > expected. I see no difference, your solution is same confusing as current
> > implementation.
> > 
> > Also, you used map_lock(). I remember that initially I proposed - my bad.
> > We also rejected udev locking, as a buggy. In fact- I don't see ready
> > synchronization method to avoid this race in mdadm right now.
> > 
> > The best way, I see is:
> > 1. Write metadata.
> > 2. Give udev chance to process the device.
> > 3. Add it manually if udev failed (or D_NO_UDEV).  
> Is there any command to only write metadata?  Or change implementation of
> --add ?

I think that we should change implementation for --add.

> > 
> > If it does not work for you, we need to add synchronization, I know that
> > there is pidfile used for mdmon, perhaps we can reuse it? We don't need
> > something fancy, just something to keep all synchronised and block
> > incremental to led --add finish the action.
> > 
> > We cannot use map_lock() and map_unlock() in this context. It claims lock on
> > mapfile which describes every array in your system (base details, there is
> > no info about members), see /var/run/mdadm. Add is just a tiny action to
> > add disk and we are not going to update the map. Taking this lock will make
> > --add more vulnerable, for example it will fail because another array is
> > assembled or created in the same time (I know that it will be hard to
> > achieve but it is another race - we don't want it).  
> When I test this patch, I find map_lock(）in --incremental is block but not
> fail. So I think it will not fail when another array is assembled.

Oh, it seems that flock() is waiting for the lock to be acquired, it is
blocking function. Now I read the documentation and I confirmed it.

> > 
> > And we cannot allow for partially done action, you added this lock after
> > writing metadata (according to description), we are not taking this lock to
> > complete the action, just to hide race. Lock should be taken before
> > writing metadata, but we know that it will not hide an issue.  
> Here I add map_lock() before writing metadata and ulock it until finish. Do
> you mean it should be before attempt_re_add()?

Indeed you wrote "before write_init_super in "mdadm --add". I was
against taking a map_lock() but now, when I understand that other processes will
wait I see this as a simplest way to keep all synchronized. 

After your clarification- I don't have an objections to take this. Taking
map_lock is not the best approach because we lock all mddevices but I don't see
a strong need to have something better. --add is a user action, low risk of
race with other --add or --assemble/--incremental.

I was against it because I thought that it is not blocking0, i.e. we are failing
if lock cannot be obtained. 

Reviewed-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>

Thanks,
Mariusz

next prev parent reply	other threads:[~2023-09-12  8:01 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-07 11:37 [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental" Li Xiao Keng
2023-09-11 16:00 ` Mariusz Tkaczyk
2023-09-12  7:18   ` Li Xiao Keng
2023-09-12  7:58     ` Mariusz Tkaczyk [this message]
2023-10-07  9:26 ` Li Xiao Keng
2023-10-26 21:42 ` Jes Sorensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230912095828.000002a5@linux.intel.com \
    --to=mariusz.tkaczyk@linux.intel.com \
    --cc=colyli@suse.de \
    --cc=jes@trained-monkey.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=lixiaokeng@huawei.com \
    --cc=louhongxiang@huawei.com \
    --cc=miaoguanqin@huawei.com \
    --cc=mwilck@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).