linux-raid.vger.kernel.org archive mirror
From: Goldwyn Rodrigues <rgoldwyn@suse.de>
To: Guoqing Jiang <gqjiang@suse.com>, neilb@suse.de
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH 06/12] md-cluster: add the error check if failed to get dlm lock
Date: Mon, 27 Jul 2015 11:48:13 -0500
Message-ID: <55B660CD.2030600@suse.de>
In-Reply-To: <1436518883-12783-3-git-send-email-gqjiang@suse.com>

Hi Guoqing,

On 07/10/2015 04:01 AM, Guoqing Jiang wrote:
> In a complicated cluster environment, it is possible that a dlm
> lock cannot be acquired or converted as intended, so add the
> related error messages to make potential issues easier to debug.
>
> For lockres_free, if the lock is blocked by a lock request or a
> conversion request, dlm_unlock just puts it back on the grant
> queue, so we need to ensure the lock is finally freed.


I cannot think of a scenario where -DLM_ECANCEL would be returned here.
Could you explain the situation a bit more?
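
For reference, the retry behaviour the commit message describes can be
sketched in userspace as below. This is only a toy model under stated
assumptions: sim_lock, sim_unlock, sim_lockres_free and SIM_ECANCEL are
hypothetical names invented for illustration, and the model does not
claim to reflect how dlm_unlock() actually reports a cancelled
conversion in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's DLM_ECANCEL; value is illustrative. */
#define SIM_ECANCEL 0x10001

/* Hypothetical model of a lock resource; not the kernel's struct. */
struct sim_lock {
	bool conversion_blocked;	/* a convert request is still waiting */
	bool granted;			/* lock currently sits on the grant queue */
};

/*
 * Model of a single unlock attempt, as the commit message describes it:
 * if a conversion is blocked, it is cancelled and the lock stays on the
 * grant queue; otherwise the lock is released.
 */
static int sim_unlock(struct sim_lock *lk)
{
	assert(lk != NULL);

	if (lk->conversion_blocked) {
		lk->conversion_blocked = false;
		return -SIM_ECANCEL;
	}

	lk->granted = false;
	return 0;
}

/* Retry the unlock until it is no longer cancelled; returns attempts made. */
static int sim_lockres_free(struct sim_lock *lk)
{
	int attempts = 0;
	int ret;

	do {
		ret = sim_unlock(lk);
		attempts++;
	} while (ret == -SIM_ECANCEL);

	return attempts;
}
```

In this model a lock with a blocked conversion needs two attempts (one
to cancel, one to release), while an idle granted lock needs one. The
open question is whether the first case can arise in md-cluster at all.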

>
> Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
> ---
>   drivers/md/md-cluster.c | 41 +++++++++++++++++++++++++++++++++++------
>   1 file changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
> index 2a57f19..b80a689 100644
> --- a/drivers/md/md-cluster.c
> +++ b/drivers/md/md-cluster.c
> @@ -166,10 +166,24 @@ out_err:
>
>   static void lockres_free(struct dlm_lock_resource *res)
>   {
> +	int ret;
> +
>   	if (!res)
>   		return;
>
> -	dlm_unlock(res->ls, res->lksb.sb_lkid, 0, &res->lksb, res);
> +	/* cancel a lock request or a conversion request that is blocked */
> +	res->flags |= DLM_LKF_CANCEL;
> +retry:
> +	ret = dlm_unlock(res->ls, res->lksb.sb_lkid, 0, &res->lksb, res);
> +	if (unlikely(ret != 0)) {
> +		pr_info("%s: failed to unlock %s return %d\n", __func__, res->name, ret);
> +
> +		/* if a lock conversion is cancelled, then the lock is put
> +		 * back to grant queue, need to ensure it is unlocked */
> +		if (ret == -DLM_ECANCEL)
> +			goto retry;
> +	}
> +	res->flags &= ~DLM_LKF_CANCEL;
>   	wait_for_completion(&res->completion);
>
>   	kfree(res->name);
> @@ -474,6 +488,7 @@ static void recv_daemon(struct md_thread *thread)
>   	struct dlm_lock_resource *ack_lockres = cinfo->ack_lockres;
>   	struct dlm_lock_resource *message_lockres = cinfo->message_lockres;
>   	struct cluster_msg msg;
> +	int ret;
>
>   	/*get CR on Message*/
>   	if (dlm_lock_sync(message_lockres, DLM_LOCK_CR)) {
> @@ -486,13 +501,21 @@ static void recv_daemon(struct md_thread *thread)
>   	process_recvd_msg(thread->mddev, &msg);
>
>   	/*release CR on ack_lockres*/
> -	dlm_unlock_sync(ack_lockres);
> +	ret = dlm_unlock_sync(ack_lockres);
> +	if (unlikely(ret != 0))
> +		pr_info("unlock ack failed return %d\n", ret);
>   	/*up-convert to PR on message_lockres*/
> -	dlm_lock_sync(message_lockres, DLM_LOCK_PR);
> +	ret = dlm_lock_sync(message_lockres, DLM_LOCK_PR);
> +	if (unlikely(ret != 0))
> +		pr_info("lock PR on msg failed return %d\n", ret);
>   	/*get CR on ack_lockres again*/
> -	dlm_lock_sync(ack_lockres, DLM_LOCK_CR);
> +	ret = dlm_lock_sync(ack_lockres, DLM_LOCK_CR);
> +	if (unlikely(ret != 0))
> +		pr_info("lock CR on ack failed return %d\n", ret);
>   	/*release CR on message_lockres*/
> -	dlm_unlock_sync(message_lockres);
> +	ret = dlm_unlock_sync(message_lockres);
> +	if (unlikely(ret != 0))
> +		pr_info("unlock msg failed return %d\n", ret);
>   }
>
>   /* lock_comm()
> @@ -567,7 +590,13 @@ static int __sendmsg(struct md_cluster_info *cinfo, struct cluster_msg *cmsg)
>   	}
>
>   failed_ack:
> -	dlm_unlock_sync(cinfo->message_lockres);
> +	error = dlm_unlock_sync(cinfo->message_lockres);
> +	if (unlikely(error != 0)) {
> +		pr_err("md-cluster: failed convert to NL on MESSAGE(%d)\n",
> +			error);
> +		/* in case the message can't be released due to some reason */
> +		goto failed_ack;
> +	}
>   failed_message:
>   	return error;
>   }
>

-- 
Goldwyn
