From: NeilBrown <neilb@suse.com>
To: Goldwyn Rodrigues <rgoldwyn@suse.de>, linux-raid@vger.kernel.org
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Subject: Re: [PATCH 1/6] md-cluster: Protect communication with mutexes
Date: Fri, 13 Nov 2015 08:59:08 +1100 [thread overview]
Message-ID: <87fv0ariab.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <5641631F.2080000@suse.de>
[-- Attachment #1: Type: text/plain, Size: 9182 bytes --]
On Tue, Nov 10 2015, Goldwyn Rodrigues wrote:
> On 11/09/2015 05:31 PM, NeilBrown wrote:
>> On Fri, Nov 06 2015, rgoldwyn@suse.de wrote:
>>
>>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>>
>>> Communication can happen through multiple threads. It is possible that
>>> one thread steps over another threads sequence. So, we use mutexes to
>>> protect both the send and receive sequences.
>>>
>>> We use a new flag CLUSTER_MD_COMM_LOCKED which is used to detect
>>> if communication is already locked. This is useful for cases where we need to
>>> take and hold the token DLM lock for a while. This bit is set only
>>> after locking communication.
>>
>> I don't understand what this flag is trying to achieve, but I'm fairly
>> sure it doesn't achieve it.
>
> Lets consider three specific cases of locking communication channels to
> show the conflict:
>
> 1. resync_info_update(): communication is locked and release for sending
> a message.
> 2. A regular md_update_sb(): communication is locked in
> metadata_update_start() and unlocked in metadata_update_finish() after
> writing to disk. In metadata_update_finish(), the sendmsg is called to
> send METADATA_UPDATED.
> 3. An add operation which culminates in a md_update_sb(): Here the
> communication is locked before initiating add. If the add is successful,
> it results in md_update_sb(). In md_update_sb(), metadata_update_start()
> has to check if the communication is already locked. If locked, it
> should not lock again.
Oh, I get it now - thanks.
Going back and looking at the original commit I can now see that it does
say that, but I didn't understand the implication at the time.
I might be wrong again, but I think this approach is broken.
The 'add disk' sequence does:
1/ ->add_new_disk which takes the lock
2/ other stuff, protected by the lock
3/ schedule a metadata update
4/ metadata update is initiated, which doesn't take the lock
because the flag is set
5/ metadata update completes, lock is dropped.
What if some other event causes the metadata update to happen during
'2'?
Maybe if you delayed setting MD_CLUSTER_COMM_LOCKED until after '2'.
i.e. set it when scheduling the update. So the flag means:
"lock has been taken for metadata update"
One tricky bit there would be if metadata_update_start() found the bit
wasn't set, and then entered mutex_lock() in lock_comm().
When MD_CLUSTER_COMM_LOCKED gets set we want that code to stop waiting
and start doing useful things. But it won't.
It might be easiest to make our own 'mutex' with a flag bit and a wait
queue.
Then calls to mutex_lock become
wait_event(mddev->wait, !test_and_set_bit(bit, &cinfo->state));
mutex_unlock becoems
clear_bit(bit, &cinfo->state);
and the mutex_lock used from metadata_update_start is
wait_event(mddev->wait, !test_and_set_bit(bit, &cinfo->state) ||
test_and_clear_bit(MD_CLUSTER_COMM_LOCKED,
....));
That might work.... providing it is well documented.
(and you are right - mutex_trylock wouldn't have helped, I was
completely misunderstanding).
Thanks,
NeilBrown
>
> The flag is used only for case 3. If the communication is already
> locked, it should not lock again. This flag is set only after
> lock_comm() has executed, but is checked for in metadata_update_start().
> This should insure that any of the operations 1, 2 or 3 do not interfere
> with each other.
>
> I am not sure if I have made the best effort to explain this. I had a
> tough time getting it right (which may or may not be complete).
>
>>
>> Maybe if it was test_and_set_bit in metadata_update_start, it might make
>> sense. But then I would suggest that clearing the bit be moved to
>> unlock_comm()
>>
>> Do you just want to use mutex_try_lock() to detect if communication is
>> already locked?
>
> Consider a race between md_update_sb() and sendmsg() [Case 1. and 2]
>
> mutex_try_lock() may not work in this situation because lock_comm()
> could have been called by sendmsg(), which will release it as soon as
> the message is sent. In the meantime (while the lock is locked), if a
> metadata_update_sb() operation executes. It will not relock the
> communication. This will result in the WARN_ON in unlock_comm() since
> sendmsg() sequence had already unlocked it.
>
>
>>
>>>
>>> Also, Remove stray/ununsed sb_mutex.
>>
>> I already removed that it mainline - I should have mentioned.
>>
>> Thanks,
>> NeilBrown
>>
>>
>>>
>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>> ---
>>> drivers/md/md-cluster.c | 26 +++++++++++++++++++++-----
>>> 1 file changed, 21 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
>>> index b73729b..a93734e 100644
>>> --- a/drivers/md/md-cluster.c
>>> +++ b/drivers/md/md-cluster.c
>>> @@ -47,6 +47,7 @@ struct resync_info {
>>> #define MD_CLUSTER_WAITING_FOR_NEWDISK 1
>>> #define MD_CLUSTER_SUSPEND_READ_BALANCING 2
>>> #define MD_CLUSTER_BEGIN_JOIN_CLUSTER 3
>>> +#define MD_CLUSTER_COMM_LOCKED 4
>>>
>>>
>>> struct md_cluster_info {
>>> @@ -54,7 +55,8 @@ struct md_cluster_info {
>>> dlm_lockspace_t *lockspace;
>>> int slot_number;
>>> struct completion completion;
>>> - struct mutex sb_mutex;
>>> + struct mutex recv_mutex;
>>> + struct mutex send_mutex;
>>> struct dlm_lock_resource *bitmap_lockres;
>>> struct dlm_lock_resource *resync_lockres;
>>> struct list_head suspend_list;
>>> @@ -503,6 +505,7 @@ static void recv_daemon(struct md_thread *thread)
>>> struct cluster_msg msg;
>>> int ret;
>>>
>>> + mutex_lock(&cinfo->recv_mutex);
>>> /*get CR on Message*/
>>> if (dlm_lock_sync(message_lockres, DLM_LOCK_CR)) {
>>> pr_err("md/raid1:failed to get CR on MESSAGE\n");
>>> @@ -529,6 +532,7 @@ static void recv_daemon(struct md_thread *thread)
>>> ret = dlm_unlock_sync(message_lockres);
>>> if (unlikely(ret != 0))
>>> pr_info("unlock msg failed return %d\n", ret);
>>> + mutex_unlock(&cinfo->recv_mutex);
>>> }
>>>
>>> /* lock_comm()
>>> @@ -542,20 +546,22 @@ static int lock_comm(struct md_cluster_info *cinfo)
>>> {
>>> int error;
>>>
>>> - if (cinfo->token_lockres->mode == DLM_LOCK_EX)
>>> - return 0;
>>> + mutex_lock(&cinfo->send_mutex);
>>>
>>> error = dlm_lock_sync(cinfo->token_lockres, DLM_LOCK_EX);
>>> if (error)
>>> pr_err("md-cluster(%s:%d): failed to get EX on TOKEN (%d)\n",
>>> __func__, __LINE__, error);
>>> + mutex_lock(&cinfo->recv_mutex);
>>> return error;
>>> }
>>>
>>> static void unlock_comm(struct md_cluster_info *cinfo)
>>> {
>>> WARN_ON(cinfo->token_lockres->mode != DLM_LOCK_EX);
>>> + mutex_unlock(&cinfo->recv_mutex);
>>> dlm_unlock_sync(cinfo->token_lockres);
>>> + mutex_unlock(&cinfo->send_mutex);
>>> }
>>>
>>> /* __sendmsg()
>>> @@ -709,7 +715,8 @@ static int join(struct mddev *mddev, int nodes)
>>> init_completion(&cinfo->completion);
>>> set_bit(MD_CLUSTER_BEGIN_JOIN_CLUSTER, &cinfo->state);
>>>
>>> - mutex_init(&cinfo->sb_mutex);
>>> + mutex_init(&cinfo->send_mutex);
>>> + mutex_init(&cinfo->recv_mutex);
>>> mddev->cluster_info = cinfo;
>>>
>>> memset(str, 0, 64);
>>> @@ -839,7 +846,12 @@ static int slot_number(struct mddev *mddev)
>>>
>>> static int metadata_update_start(struct mddev *mddev)
>>> {
>>> - return lock_comm(mddev->cluster_info);
>>> + struct md_cluster_info *cinfo = mddev->cluster_info;
>>> + int ret;
>>> + if (test_and_clear_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state))
>>> + return 0;
>>> + ret = lock_comm(cinfo);
>>> + return ret;
>>> }
>>>
>>> static int metadata_update_finish(struct mddev *mddev)
>>> @@ -864,6 +876,7 @@ static int metadata_update_finish(struct mddev *mddev)
>>> ret = __sendmsg(cinfo, &cmsg);
>>> } else
>>> pr_warn("md-cluster: No good device id found to send\n");
>>> + clear_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state);
>>> unlock_comm(cinfo);
>>> return ret;
>>> }
>>> @@ -871,6 +884,7 @@ static int metadata_update_finish(struct mddev *mddev)
>>> static void metadata_update_cancel(struct mddev *mddev)
>>> {
>>> struct md_cluster_info *cinfo = mddev->cluster_info;
>>> + clear_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state);
>>> unlock_comm(cinfo);
>>> }
>>>
>>> @@ -945,6 +959,7 @@ static int add_new_disk(struct mddev *mddev, struct md_rdev *rdev)
>>> memcpy(cmsg.uuid, uuid, 16);
>>> cmsg.raid_slot = cpu_to_le32(rdev->desc_nr);
>>> lock_comm(cinfo);
>>> + set_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state);
>>> ret = __sendmsg(cinfo, &cmsg);
>>> if (ret)
>>> return ret;
>>> @@ -964,6 +979,7 @@ static int add_new_disk(struct mddev *mddev, struct md_rdev *rdev)
>>> static void add_new_disk_cancel(struct mddev *mddev)
>>> {
>>> struct md_cluster_info *cinfo = mddev->cluster_info;
>>> + clear_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state);
>>> unlock_comm(cinfo);
>>> }
>>>
>>> --
>>> 1.8.5.6
>
> --
> Goldwyn
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]
next prev parent reply other threads:[~2015-11-12 21:59 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-06 3:50 [PATCH 1/6] md-cluster: Protect communication with mutexes rgoldwyn
2015-11-06 3:50 ` [PATCH 2/6] md-cluster: Avoid the resync ping-pong rgoldwyn
2015-11-09 23:39 ` NeilBrown
2015-11-06 3:50 ` [PATCH 3/6] md-cluster: remove a disk asynchronously from cluster environment rgoldwyn
2015-11-09 23:43 ` NeilBrown
2015-11-06 3:50 ` [PATCH 4/6] md-cluster: Defer MD reloading to mddev->thread rgoldwyn
2015-11-09 23:48 ` NeilBrown
2015-11-10 3:26 ` Goldwyn Rodrigues
2015-11-20 8:25 ` [v2 PATCH] " Guoqing Jiang
2015-11-06 3:50 ` [PATCH 5/6] md-cluster: Fix the remove sequence with the new MD reload code rgoldwyn
2015-11-09 23:49 ` NeilBrown
2015-11-06 3:50 ` [PATCH 6/6] md-cluster: Allow spare devices to be marked as faulty rgoldwyn
2015-11-09 23:51 ` NeilBrown
2015-11-09 23:31 ` [PATCH 1/6] md-cluster: Protect communication with mutexes NeilBrown
2015-11-10 3:23 ` Goldwyn Rodrigues
2015-11-12 21:59 ` NeilBrown [this message]
2015-11-20 8:27 ` [v2 PATCH 2/5] " Guoqing Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fv0ariab.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=linux-raid@vger.kernel.org \
--cc=rgoldwyn@suse.com \
--cc=rgoldwyn@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).