From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: Li Xiao Keng <lixiaokeng@huawei.com>
Cc: Jes Sorensen <jes@trained-monkey.org>,
Martin Wilck <mwilck@suse.com>, Coly Li <colyli@suse.de>,
<linux-raid@vger.kernel.org>, <louhongxiang@huawei.com>,
miaoguanqin <miaoguanqin@huawei.com>
Subject: Re: [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental"
Date: Tue, 12 Sep 2023 09:58:28 +0200 [thread overview]
Message-ID: <20230912095828.000002a5@linux.intel.com> (raw)
In-Reply-To: <d1744e2f-0476-153e-11b6-662f16d54b73@huawei.com>
On Tue, 12 Sep 2023 15:18:33 +0800
Li Xiao Keng <lixiaokeng@huawei.com> wrote:
> On 2023/9/12 0:00, Mariusz Tkaczyk wrote:
> > On Thu, 7 Sep 2023 19:37:44 +0800
> > Li Xiao Keng <lixiaokeng@huawei.com> wrote:
> >
> >> There is a raid1 with sda and sdb. And we add sdc to this raid,
> >> it may return -EBUSY.
> >>
> >> The main process of --add:
> >> 1. dev_open(sdc) in Manage_add
> >> 2. store_super1(st, di->fd) in write_init_super1
> >> 3. fsync(fd) in store_super1
> >> 4. close(di->fd) in write_init_super1
> >> 5. ioctl(ADD_NEW_DISK)
> >>
> >> Step 2 and 3 will add sdc to metadata of raid1. There will be
> >> udev(change of sdc) event after step4. Then "/usr/sbin/mdadm
> >> --incremental --export $devnode --offroot $env{DEVLINKS}"
> >> will be run, and the sdc will be added to the raid1. Then
> >> step 5 will return -EBUSY because it checks if device isn't
> >> claimed in md_import_device()->lock_rdev()->blkdev_get_by_dev()
> >> ->blkdev_get().
> >>
> >> It will be confusing for users because sdc is added first time.
> >> The "incremental" will get map_lock before add sdc to raid1.
> >> So we add map_lock before write_init_super in "mdadm --add"
> >> to fix the race of "add" and "incremental".
> >>
> >> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> >> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
> >
> > Hello Li Xiao Keng,
> > Thanks for the patch, I'm trying to get my head around this.
> > As I understand, you added a mapfile locking and you are failing when the
> > log is already claimed, but you prefer to return -1 and print error instead
> > of returning EBUSY, right?
> No, I do not prefer to return -1 and print error instead of returning EBUSY.
> At first, I had the same idea as Martin, thinking that map_lock() would fail
> when it is locked by someone else.Actually it is not fail but block. So now I
> just print error and go ahead if map_lock() fails, like map_lock() elsewhere.
Ok, thanks now it have more sense!
> >
> > If yes, it has nothing to do with fixing. It is just a obscure hack to make
> > it, let say more reliable. We can either get EBUSY and know that is
> > expected. I see no difference, your solution is same confusing as current
> > implementation.
> >
> > Also, you used map_lock(). I remember that initially I proposed - my bad.
> > We also rejected udev locking, as a buggy. In fact- I don't see ready
> > synchronization method to avoid this race in mdadm right now.
> >
> > The best way, I see is:
> > 1. Write metadata.
> > 2. Give udev chance to process the device.
> > 3. Add it manually if udev failed (or D_NO_UDEV).
> Is there any command to only write metadata? Or change implementation of
> --add ?
I think that we should change implementation for --add.
> >
> > If it does not work for you, we need to add synchronization, I know that
> > there is pidfile used for mdmon, perhaps we can reuse it? We don't need
> > something fancy, just something to keep all synchronised and block
> > incremental to led --add finish the action.
> >
> > We cannot use map_lock() and map_unlock() in this context. It claims lock on
> > mapfile which describes every array in your system (base details, there is
> > no info about members), see /var/run/mdadm. Add is just a tiny action to
> > add disk and we are not going to update the map. Taking this lock will make
> > --add more vulnerable, for example it will fail because another array is
> > assembled or created in the same time (I know that it will be hard to
> > achieve but it is another race - we don't want it).
> When I test this patch, I find map_lock()in --incremental is block but not
> fail. So I think it will not fail when another array is assembled.
Oh, it seems that flock() is waiting for the lock to be acquired, it is
blocking function. Now I read the documentation and I confirmed it.
> >
> > And we cannot allow for partially done action, you added this lock after
> > writing metadata (according to description), we are not taking this lock to
> > complete the action, just to hide race. Lock should be taken before
> > writing metadata, but we know that it will not hide an issue.
> Here I add map_lock() before writing metadata and ulock it until finish. Do
> you mean it should be before attempt_re_add()?
Indeed you wrote "before write_init_super in "mdadm --add". I was
against taking a map_lock() but now, when I understand that other processes will
wait I see this as a simplest way to keep all synchronized.
After your clarification- I don't have an objections to take this. Taking
map_lock is not the best approach because we lock all mddevices but I don't see
a strong need to have something better. --add is a user action, low risk of
race with other --add or --assemble/--incremental.
I was against it because I thought that it is not blocking0, i.e. we are failing
if lock cannot be obtained.
Reviewed-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Thanks,
Mariusz
next prev parent reply other threads:[~2023-09-12 8:01 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-07 11:37 [PATCH v3] Fix race of "mdadm --add" and "mdadm --incremental" Li Xiao Keng
2023-09-11 16:00 ` Mariusz Tkaczyk
2023-09-12 7:18 ` Li Xiao Keng
2023-09-12 7:58 ` Mariusz Tkaczyk [this message]
2023-10-07 9:26 ` Li Xiao Keng
2023-10-26 21:42 ` Jes Sorensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230912095828.000002a5@linux.intel.com \
--to=mariusz.tkaczyk@linux.intel.com \
--cc=colyli@suse.de \
--cc=jes@trained-monkey.org \
--cc=linux-raid@vger.kernel.org \
--cc=lixiaokeng@huawei.com \
--cc=louhongxiang@huawei.com \
--cc=miaoguanqin@huawei.com \
--cc=mwilck@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.