From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2/dlm: avoid incorrect bit set in refmap on recovery master
Date: Thu, 29 Jul 2010 11:27:14 -0700
Message-ID: <4C51C802.2050905@oracle.com>
In-Reply-To: <201007291239.o6TCd9Vd028074@rcsinet15.oracle.com>

comments inlined

On 07/29/2010 05:37 AM, Wengang Wang wrote:
> In the following situation, an incorrect bit remains in the refmap on the
> recovery master. Eventually the recovery master fails to purge the lockres
> because of that incorrect bit.
>
> 1) Node A no longer has any interest in lockres A, so it is purging it.
> 2) The owner of lockres A is node B, so node A sends a deref message
> to node B.
> 3) At this time, node B crashes. Node C becomes the recovery master and
> recovers lockres A (because its master is the dead node B).
> 4) Node A migrates lockres A to node C, setting a refbit there.
> 5) Node A fails to send the deref message to node B because node B crashed.
> The failure is ignored, and no further action is taken for lockres A.
>
> Normally, re-sending the deref message to the recovery master would fix this.
> However, ignoring the failed deref to the original master and not recovering
> the lockres to the recovery master has the same effect, and the latter is simpler.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>   fs/ocfs2/dlm/dlmrecovery.c |   17 +++++++++++++----
>   fs/ocfs2/dlm/dlmthread.c   |   28 +++++++++++++++++-----------
>   2 files changed, 30 insertions(+), 15 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 9dfaac7..2b57cc4 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1997,6 +1997,8 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
>   	struct list_head *queue;
>   	struct dlm_lock *lock, *next;
>
> +	assert_spin_locked(&dlm->spinlock);
> +	assert_spin_locked(&res->spinlock);
>   	res->state |= DLM_LOCK_RES_RECOVERING;
>   	if (!list_empty(&res->recovering)) {
>   		mlog(0,
> @@ -2334,11 +2336,18 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
>   					     dlm->name, res->lockname.len,
>   					     res->lockname.name, dead_node);
>
> -				/* the wake_up for this will happen when the
> -				 * RECOVERING flag is dropped later */
> -				res->state &= ~DLM_LOCK_RES_DROPPING_REF;
> +				/*
> +				 * don't migrate a lockres which is in progress
> +				 * of dropping ref
> +				 */
> +				if (res->state & DLM_LOCK_RES_DROPPING_REF) {
> +					mlog(ML_NOTICE, "%.*s ignored for "
> +					     "migration\n", res->lockname.len,
> +					     res->lockname.name);
> +				} else
> +					dlm_move_lockres_to_recovery_list(dlm,
> +									  res);
>
> -				dlm_move_lockres_to_recovery_list(dlm, res);
>   			} else if (res->owner == dlm->node_num) {
>   				dlm_free_dead_locks(dlm, res, dead_node);
>   				__dlm_lockres_calc_usage(dlm, res);
>    

So the code reads like this.

	if (res->state & DLM_LOCK_RES_DROPPING_REF)
		mlog(0, "%s:%.*s: owned by "
		     "dead node %u, this node was "
		     "dropping its ref when it died. "
		     "continue, dropping the flag.\n",
		     dlm->name, res->lockname.len,
		     res->lockname.name, dead_node);

	/*
	 * don't migrate a lockres which is in progress
	 * of dropping ref
	 */
	if (res->state & DLM_LOCK_RES_DROPPING_REF) {
		mlog(ML_NOTICE, "%.*s ignored for "
		     "migration\n", res->lockname.len,
		     res->lockname.name);
	} else
		dlm_move_lockres_to_recovery_list(dlm,
						  res);

The first mlog should be removed. It is incorrect. The second mlog
is more appropriate. Could be reworded ("Ignore %.*s for recovery as it is
being freed").

The comment can just be removed. The mlog says it all.
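
As a rough illustration of the resulting control flow (one log message, no
recovery for a lockres that is dropping its ref), here is a user-space sketch.
The struct, the flag values and the helper names are simplified stand-ins
chosen for this example, not the real dlm definitions, and printf() takes the
place of mlog(). Only the branch structure mirrors the patch.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in flag values; the real ones are defined in the ocfs2 dlm headers. */
#define DLM_LOCK_RES_RECOVERING   0x1
#define DLM_LOCK_RES_DROPPING_REF 0x2

/* Hypothetical, stripped-down lock resource used only for this sketch. */
struct lockres_sketch {
	const char *name;
	unsigned int state;
	int on_recovery_list;
};

/* Stand-in for dlm_move_lockres_to_recovery_list(). */
static void sketch_move_to_recovery_list(struct lockres_sketch *res)
{
	res->state |= DLM_LOCK_RES_RECOVERING;
	res->on_recovery_list = 1;
}

/*
 * Handle a lockres whose owner has died.
 * Returns 1 if it was queued for recovery, 0 if it was ignored
 * because this node is in the middle of dropping its ref.
 */
static int sketch_handle_dead_owner(struct lockres_sketch *res)
{
	if (res->state & DLM_LOCK_RES_DROPPING_REF) {
		printf("Ignore %.*s for recovery as it is being freed\n",
		       (int)strlen(res->name), res->name);
		return 0;
	}

	sketch_move_to_recovery_list(res);
	return 1;
}
```

Calling sketch_handle_dead_owner() on a resource with
DLM_LOCK_RES_DROPPING_REF set leaves it off the recovery list; any other
resource gets the RECOVERING flag and is queued.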

> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> index dd78ca3..47420ce 100644
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -92,17 +92,23 @@ int __dlm_lockres_has_locks(struct dlm_lock_resource *res)
>    * truly ready to be freed. */
>   int __dlm_lockres_unused(struct dlm_lock_resource *res)
>   {
> -	if (!__dlm_lockres_has_locks(res) &&
> -	    (list_empty(&res->dirty) && !(res->state & DLM_LOCK_RES_DIRTY))) {
> -		/* try not to scan the bitmap unless the first two
> -		 * conditions are already true */
> -		int bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> -		if (bit >= O2NM_MAX_NODES) {
> -			/* since the bit for dlm->node_num is not
> -			 * set, inflight_locks better be zero */
> -			BUG_ON(res->inflight_locks != 0);
> -			return 1;
> -		}
> +	int bit;
> +
> +	if (__dlm_lockres_has_locks(res))
> +		return 0;
> +
> +	if (!list_empty(&res->dirty) || res->state & DLM_LOCK_RES_DIRTY)
> +		return 0;
> +
> +	if (res->state & DLM_LOCK_RES_RECOVERING)
> +		return 0;
> +
> +	bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
> +	if (bit >= O2NM_MAX_NODES) {
> +		/* since the bit for dlm->node_num is not
> +		 * set, inflight_locks better be zero */
> +		BUG_ON(res->inflight_locks != 0);
> +		return 1;
>   	}
>   	return 0;
>   }
>    


I like it. But you reversed the flow at the end. How about...

	bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0);
	if (bit < O2NM_MAX_NODES)
		return 0;

	/*
	 * Since the bit for dlm->node_num is not set, inflight_locks
	 * better be zero
	 */
	BUG_ON(res->inflight_locks != 0);

	return 1;
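
Putting the early returns together, the whole function could be modelled in
user space roughly as below. This is only a sketch under stated assumptions:
the struct fields, the MODEL_ and model_ names, the flag values and the naive
bit scan are stand-ins invented for the example, and assert() takes the place
of BUG_ON().

```c
#include <assert.h>
#include <limits.h>

#define MODEL_MAX_NODES 255 /* stand-in for O2NM_MAX_NODES */
#define MODEL_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_MAP_WORDS \
	((MODEL_MAX_NODES + MODEL_BITS_PER_WORD - 1) / MODEL_BITS_PER_WORD)

/* Stand-in flag values; the real ones are defined in the dlm headers. */
#define DLM_LOCK_RES_RECOVERING 0x1
#define DLM_LOCK_RES_DIRTY      0x4

/* Hypothetical, stripped-down lock resource used only for this sketch. */
struct lockres_model {
	unsigned int state;
	int has_locks;      /* stands in for __dlm_lockres_has_locks() */
	int on_dirty_list;  /* stands in for !list_empty(&res->dirty) */
	int inflight_locks;
	unsigned long refmap[MODEL_MAP_WORDS];
};

/* Naive replacement for find_next_bit(): returns size if no bit is set. */
static int model_find_next_bit(const unsigned long *map, int size, int start)
{
	int i;

	for (i = start; i < size; i++)
		if (map[i / MODEL_BITS_PER_WORD] &
		    (1UL << (i % MODEL_BITS_PER_WORD)))
			return i;
	return size;
}

/* Set the ref bit for one node in the model refmap. */
static void model_set_bit(unsigned long *map, int bit)
{
	map[bit / MODEL_BITS_PER_WORD] |= 1UL << (bit % MODEL_BITS_PER_WORD);
}

/* Returns 1 only when the resource is truly ready to be freed. */
static int model_lockres_unused(const struct lockres_model *res)
{
	int bit;

	if (res->has_locks)
		return 0;

	if (res->on_dirty_list || (res->state & DLM_LOCK_RES_DIRTY))
		return 0;

	if (res->state & DLM_LOCK_RES_RECOVERING)
		return 0;

	bit = model_find_next_bit(res->refmap, MODEL_MAX_NODES, 0);
	if (bit < MODEL_MAX_NODES)
		return 0;

	/* No ref bit is set, so nothing should be in flight. */
	assert(res->inflight_locks == 0);

	return 1;
}
```

With every exit on the "still in use" side returning 0 early, the final
return 1 is the only path that reports the resource as unused.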
