From: Abhijit Bhopatkar <abhopatk@cisco.com>
To: linux-raid@vger.kernel.org, Lidong Zhong <lzhong@suse.com>,
Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: "Reese Faucette (rfaucett)" <rfaucett@cisco.com>
Subject: Re: [PATCH] md-cluster: avoid deadlock on MESSAGE lock resource
Date: Fri, 08 May 2015 18:44:25 +0530
Message-ID: <554CB6B1.3030206@cisco.com>
In-Reply-To: <554CB5DB.4020305@cisco.com>
On 08/05/15 6:40 pm, Abhijit Bhopatkar wrote:
>
> Every receiver holds a CR lock on MESSAGE while processing the message. If
> every receiver releases the ACK lock but for some reason fails to grab EX on
> the MESSAGE resource in time, a waiting sender can queue an EX request on
> MESSAGE instead. When a receiver then queues its upconvert request on
> MESSAGE, the result is a deadlock.
>
> Setting the NOQUEUE flag on the MESSAGE lock resource while the sender grabs
> EX on MESSAGE avoids this deadlock. If the sender cannot grab the MESSAGE
> lock immediately, it should retry until the lock is granted.
>
> Signed-off-by: Abhijit Bhopatkar <abhopatk@cisco.com>
> ---
> This has been minimally tested on a three-node cluster.
>
I have tested standard mdadm operations (create, assemble, etc.).
What further testing would you like me to do before it's considered
ready?
Regards,
Abhijit
> drivers/md/md-cluster.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
> index fcfc4b9..04ac309 100644
> --- a/drivers/md/md-cluster.c
> +++ b/drivers/md/md-cluster.c
> @@ -512,7 +512,10 @@ static void unlock_comm(struct md_cluster_info *cinfo)
> * This function performs the actual sending of the message. This function is
> * usually called after performing the encompassing operation
> * The function:
> - * 1. Grabs the message lockresource in EX mode
> + * 1. Grabs the message lockresource in EX. Do not queue the request if not granted
> + immediately. This avoids deadlock with receivers when receivers try to
> + upconvert CR to EX of message lockresource. The thread will retry until the
> + request is granted.
> * 2. Copies the message to the message LVB
> * 3. Downconverts message lockresource to CR
> * 4. Upconverts ack lock resource from CR to EX. This forces the BAST on other nodes
> @@ -526,12 +529,19 @@ static int __sendmsg(struct md_cluster_info *cinfo, struct cluster_msg *cmsg)
>  	int slot = cinfo->slot_number - 1;
>  
>  	cmsg->slot = cpu_to_le32(slot);
> -	/*get EX on Message*/
> +
> +	/* get EX on Message with noqueue flag */
> +	cinfo->message_lockres->flags |= DLM_LKF_NOQUEUE;
> +
> +retry:
>  	error = dlm_lock_sync(cinfo->message_lockres, DLM_LOCK_EX);
>  	if (error) {
> +		if (error == -EAGAIN)
> +			goto retry;
>  		pr_err("md-cluster: failed to get EX on MESSAGE (%d)\n", error);
>  		goto failed_message;
>  	}
> +	cinfo->message_lockres->flags &= ~DLM_LKF_NOQUEUE;
>  
>  	memcpy(cinfo->message_lockres->lksb.sb_lvbptr, (void *)cmsg,
>  			sizeof(struct cluster_msg));
> -- 2.1.0
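
For readers who want to try the "do not queue, retry on -EAGAIN" pattern
from the patch outside the kernel, here is a minimal userspace sketch. It is
not the md-cluster or DLM code: struct toy_lockres, TOY_NOQUEUE,
toy_lock_ex() and toy_grab_ex_noqueue() are hypothetical stand-ins, and the
"busy" counter merely simulates other nodes holding the resource. Unlike the
patch, the sketch bounds the number of retries and clears the flag on every
exit path; both are assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a no-queue request flag; not the DLM value. */
#define TOY_NOQUEUE 0x1u

/* Hypothetical toy lock resource; not the kernel's lock resource type. */
struct toy_lockres {
	unsigned int flags; /* request flags, e.g. TOY_NOQUEUE */
	int busy_for;       /* attempts during which others still hold it */
	bool held_ex;       /* true once the caller owns it exclusively */
	int attempts;       /* number of lock attempts made so far */
};

/*
 * Toy lock call. With TOY_NOQUEUE set it never waits: it either grants
 * the lock or fails with -EAGAIN. The toy model cannot block, so a
 * request without TOY_NOQUEUE is rejected with -EINVAL.
 */
static int toy_lock_ex(struct toy_lockres *res)
{
	res->attempts++;
	if (!(res->flags & TOY_NOQUEUE))
		return -EINVAL;
	if (res->busy_for > 0) {
		res->busy_for--;
		return -EAGAIN;
	}
	res->held_ex = true;
	return 0;
}

/*
 * Try to take the lock without queueing, retrying while the attempt
 * fails with -EAGAIN, up to max_tries attempts. The flag is cleared
 * again on every exit path, including errors.
 */
static int toy_grab_ex_noqueue(struct toy_lockres *res, int max_tries)
{
	int error = -EAGAIN;
	int i;

	assert(res);
	res->flags |= TOY_NOQUEUE;
	for (i = 0; i < max_tries; i++) {
		error = toy_lock_ex(res);
		if (error != -EAGAIN)
			break; /* granted, or a hard error: stop retrying */
	}
	res->flags &= ~TOY_NOQUEUE;
	return error;
}
```

A caller would initialise a struct toy_lockres, call
toy_grab_ex_noqueue(&res, 10), and treat a return of -EAGAIN as "still
contended after all attempts". Whether a bounded retry or a back-off delay
is appropriate for the real sender path is a question for the thread, not
something this sketch settles.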