ocfs2-devel.oss.oracle.com archive mirror
* [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs
@ 2010-05-24 14:35 Wengang Wang
  2010-05-24 14:41 ` Wengang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Wengang Wang @ 2010-05-24 14:35 UTC (permalink / raw)
  To: ocfs2-devel

When purging a lockres, we unhash the lockres ignoring the result of the
deref request and ignoring the lockres state.
There is a problem that rarely happens; it can happen when recovery takes
place. Say node A is the master of a lockres that node B wants to deref,
and there is a third node C. If things happen in the following order, the
bug is triggered.

1) Node B sends DEREF to node A for the lockres and waits for the result.
2) Node A crashes; node C becomes the recovery master.
3) Node C takes over mastery of the lockres, with node B holding a ref on it.
4) Node B goes on to unhash the lockres, leaving a ref recorded on node C.

After step 4), if a umount comes on node C, it will hang migrating the
lockres since node B appears to hold a ref on it.

The fix is to check whether recovery happened on the lockres after sending
the DEREF request. If it did, we keep the lockres in the hash and on the
purge list for another try at sending DEREF to the new master (node C), so
that node C can clear the incorrect refbit.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 fs/ocfs2/dlm/dlmthread.c |   44 +++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index d4f73ca..946ff1a 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -158,6 +158,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
 	int master;
 	int ret = 0;
 
+	assert_spin_locked(&dlm->spinlock);
+
 	spin_lock(&res->spinlock);
 	if (!__dlm_lockres_unused(res)) {
 		mlog(0, "%s:%.*s: tried to purge but not unused\n",
@@ -211,18 +213,30 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
 	}
 
 	spin_lock(&res->spinlock);
+	/*
+	 * We dropped dlm->spinlock and res->spinlock when sending the DEREF
+	 * request, so there is a chance that a recovery happened on this
+	 * lockres.  In that case, we have to DEREF to the new master when
+	 * recovery finishes; otherwise, there can be an incorrect ref on the
+	 * lockres on the new master on behalf of this node.
+	 */
+	if (unlikely(res->state & DLM_LOCK_RES_RECOVERING)) {
+		spin_unlock(&res->spinlock);
+		return -EAGAIN;
+	}
+
 	if (!list_empty(&res->purge)) {
 		mlog(0, "removing lockres %.*s:%p from purgelist, "
 		     "master = %d\n", res->lockname.len, res->lockname.name,
 		     res, master);
 		list_del_init(&res->purge);
-		spin_unlock(&res->spinlock);
+		/* not the last ref */
 		dlm_lockres_put(res);
 		dlm->purge_count--;
-	} else
-		spin_unlock(&res->spinlock);
+	}
 
 	__dlm_unhash_lockres(res);
+	spin_unlock(&res->spinlock);
 
 	/* lockres is not in the hash now.  drop the flag and wake up
 	 * any processes waiting in dlm_get_lock_resource. */
@@ -241,6 +255,9 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 	unsigned int run_max, unused;
 	unsigned long purge_jiffies;
 	struct dlm_lock_resource *lockres;
+	int ret;
+
+#define DLM_WAIT_RECOVERY_FINISH_MS 500
 
 	spin_lock(&dlm->spinlock);
 	run_max = dlm->purge_count;
@@ -280,8 +297,25 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 
 		/* This may drop and reacquire the dlm spinlock if it
 		 * has to do migration. */
-		if (dlm_purge_lockres(dlm, lockres))
-			BUG();
+		ret = dlm_purge_lockres(dlm, lockres);
+		if (ret) {
+			if (ret == -EAGAIN) {
+				/*
+				 * Recovery occurred on this lockres.  Try to
+				 * DEREF to the new master.
+				 */
+				dlm_lockres_put(lockres);
+				spin_unlock(&dlm->spinlock);
+				mlog(ML_ERROR, "retry to purge %.*s after %d"
+				     "ms\n", lockres->lockname.len,
+				     lockres->lockname.name,
+				     DLM_WAIT_RECOVERY_FINISH_MS);
+				msleep(DLM_WAIT_RECOVERY_FINISH_MS);
+				spin_lock(&dlm->spinlock);
+				continue;
+			} else
+				BUG();
+		}
 
 		dlm_lockres_put(lockres);
 
-- 
1.6.6.1


* [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs
  2010-05-24 14:35 Wengang Wang
@ 2010-05-24 14:41 ` Wengang Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Wengang Wang @ 2010-05-24 14:41 UTC (permalink / raw)
  To: ocfs2-devel

s/ignore/ignoring/g  for the following 2 lines. 
> When purge a lockres, we unhash the lockres ignore the result of deref request
> and ignore the lockres state.

regards,
wengang.

> There is a problem that rarely happen. It can happen when recovery take places.
> Say node A is the master of the lockres with node B wants to deref and there is
> a node C. If things happen in the following order, the bug is triggered.
> 
> 1) node B send DEREF to node A for lockres A and waiting for result.
> 2) node A crashed, node C become the recovery master.
> 3) node C mastered lockres A with node B has a ref on it.
> 4) node B goes to unhashes the lockres A with a ref on node C.
>  
> After step 4), if a umount comes on node C, it will hang at migrate lockres A 
> since node B has a ref on it.
> 
> The fix is that we check if recovery happened on lockres A after sending DEREF
> request. If that happened, we keep lockres A in hash and in purge list for
> another try to send DEREF to the new master(node C). So that node C can clear
> the incorrect refbit.
> 
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  fs/ocfs2/dlm/dlmthread.c |   44 +++++++++++++++++++++++++++++++++++++++-----
>  1 files changed, 39 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> index d4f73ca..946ff1a 100644
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -158,6 +158,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>  	int master;
>  	int ret = 0;
>  
> +	assert_spin_locked(&dlm->spinlock);
> +
>  	spin_lock(&res->spinlock);
>  	if (!__dlm_lockres_unused(res)) {
>  		mlog(0, "%s:%.*s: tried to purge but not unused\n",
> @@ -211,18 +213,30 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>  	}
>  
>  	spin_lock(&res->spinlock);
> +	/*
> +	 * we dropped dlm->spinlock and res->spinlock when sending the DEREF
> +	 * request, there is a chance that a recovery happend on this lockres.
> +	 * in that case, we have to DEREF to the new master when recovery
> +	 * finished. otherwise, there can be an incorrect ref on the lockres
> +	 * on the new master  on behalf of this node.
> +	 */
> +	if (unlikely(res->state & DLM_LOCK_RES_RECOVERING)) {
> +		spin_unlock(&res->spinlock);
> +		return -EAGAIN;
> +	}
> +
>  	if (!list_empty(&res->purge)) {
>  		mlog(0, "removing lockres %.*s:%p from purgelist, "
>  		     "master = %d\n", res->lockname.len, res->lockname.name,
>  		     res, master);
>  		list_del_init(&res->purge);
> -		spin_unlock(&res->spinlock);
> +		/* not the last ref */
>  		dlm_lockres_put(res);
>  		dlm->purge_count--;
> -	} else
> -		spin_unlock(&res->spinlock);
> +	}
>  
>  	__dlm_unhash_lockres(res);
> +	spin_unlock(&res->spinlock);
>  
>  	/* lockres is not in the hash now.  drop the flag and wake up
>  	 * any processes waiting in dlm_get_lock_resource. */
> @@ -241,6 +255,9 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
>  	unsigned int run_max, unused;
>  	unsigned long purge_jiffies;
>  	struct dlm_lock_resource *lockres;
> +	int ret;
> +
> +#define DLM_WAIT_RECOVERY_FINISH_MS 500
>  
>  	spin_lock(&dlm->spinlock);
>  	run_max = dlm->purge_count;
> @@ -280,8 +297,25 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
>  
>  		/* This may drop and reacquire the dlm spinlock if it
>  		 * has to do migration. */
> -		if (dlm_purge_lockres(dlm, lockres))
> -			BUG();
> +		ret = dlm_purge_lockres(dlm, lockres);
> +		if (ret) {
> +			if (ret == -EAGAIN) {
> +				/*
> +				 * recovery occured on this lockres. try to
> +				 * DEREF to the new master.
> +				 */
> +				dlm_lockres_put(lockres);
> +				spin_unlock(&dlm->spinlock);
> +				mlog(ML_ERROR, "retry to purge %.*s after %d"
> +				     "ms\n", lockres->lockname.len,
> +				     lockres->lockname.name,
> +				     DLM_WAIT_RECOVERY_FINISH_MS);
> +				msleep(DLM_WAIT_RECOVERY_FINISH_MS);
> +				spin_lock(&dlm->spinlock);
> +				continue;
> +			} else
> +				BUG();
> +		}
>  
>  		dlm_lockres_put(lockres);
>  
> -- 
> 1.6.6.1
> 
> 


* [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs
@ 2010-05-24 19:48 Srinivas Eeda
  2010-05-25  2:01 ` Wengang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Srinivas Eeda @ 2010-05-24 19:48 UTC (permalink / raw)
  To: ocfs2-devel

thanks for doing this patch. I have a little comment, wondering if there 
could be a window where node B sent the lock info to node C as part of 
recovery and removed flag DLM_LOCK_RES_RECOVERING while dlm_thread was 
still purging it. In that case dlm_thread will still continue to remove 
it from hash list.

Also, this patch puts dlm_thread to sleep ... maybe it's OK, but I'm
wondering if we can avoid that.

delay deref message if DLM_LOCK_RES_RECOVERING is set (which means 
recovery got to the lockres before dlm_thread could), move the lockres 
to the end of the purgelist and retry later.
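
Something like this in dlm_run_purge_list(), maybe (just a rough sketch of
the idea, not a tested patch; it assumes we still hold dlm->spinlock and
have the candidate lockres in hand):

	spin_lock(&lockres->spinlock);
	if (lockres->state & DLM_LOCK_RES_RECOVERING) {
		/* recovery got to this lockres first; push it to the tail
		 * of the purge list and let a later pass send the deref to
		 * the new master */
		list_move_tail(&lockres->purge, &dlm->purge_list);
		spin_unlock(&lockres->spinlock);
		continue;
	}
	spin_unlock(&lockres->spinlock);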

do not inform recovery master if DLM_LOCK_RES_DROPPING_REF is set (which 
means dlm_thread got to the lockres before recovery). So in the case you 
described, node C will not know about node B dropping the dereference 
and node B will just go ahead and remove it from hash list and free it.

Wengang Wang wrote:
> When purge a lockres, we unhash the lockres ignore the result of deref request
> and ignore the lockres state.
> There is a problem that rarely happen. It can happen when recovery take places.
> Say node A is the master of the lockres with node B wants to deref and there is
> a node C. If things happen in the following order, the bug is triggered.
>
> 1) node B send DEREF to node A for lockres A and waiting for result.
> 2) node A crashed, node C become the recovery master.
> 3) node C mastered lockres A with node B has a ref on it.
> 4) node B goes to unhashes the lockres A with a ref on node C.
>  
> After step 4), if a umount comes on node C, it will hang at migrate lockres A 
> since node B has a ref on it.
>
> The fix is that we check if recovery happened on lockres A after sending DEREF
> request. If that happened, we keep lockres A in hash and in purge list for
> another try to send DEREF to the new master(node C). So that node C can clear
> the incorrect refbit.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  fs/ocfs2/dlm/dlmthread.c |   44 +++++++++++++++++++++++++++++++++++++++-----
>  1 files changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> index d4f73ca..946ff1a 100644
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -158,6 +158,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>  	int master;
>  	int ret = 0;
>  
> +	assert_spin_locked(&dlm->spinlock);
> +
>  	spin_lock(&res->spinlock);
>  	if (!__dlm_lockres_unused(res)) {
>  		mlog(0, "%s:%.*s: tried to purge but not unused\n",
> @@ -211,18 +213,30 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
>  	}
>  
>  	spin_lock(&res->spinlock);
> +	/*
> +	 * we dropped dlm->spinlock and res->spinlock when sending the DEREF
> +	 * request, there is a chance that a recovery happend on this lockres.
> +	 * in that case, we have to DEREF to the new master when recovery
> +	 * finished. otherwise, there can be an incorrect ref on the lockres
> +	 * on the new master  on behalf of this node.
> +	 */
> +	if (unlikely(res->state & DLM_LOCK_RES_RECOVERING)) {
> +		spin_unlock(&res->spinlock);
> +		return -EAGAIN;
> +	}
> +
>  	if (!list_empty(&res->purge)) {
>  		mlog(0, "removing lockres %.*s:%p from purgelist, "
>  		     "master = %d\n", res->lockname.len, res->lockname.name,
>  		     res, master);
>  		list_del_init(&res->purge);
> -		spin_unlock(&res->spinlock);
> +		/* not the last ref */
>  		dlm_lockres_put(res);
>  		dlm->purge_count--;
> -	} else
> -		spin_unlock(&res->spinlock);
> +	}
>  
>  	__dlm_unhash_lockres(res);
> +	spin_unlock(&res->spinlock);
>  
>  	/* lockres is not in the hash now.  drop the flag and wake up
>  	 * any processes waiting in dlm_get_lock_resource. */
> @@ -241,6 +255,9 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
>  	unsigned int run_max, unused;
>  	unsigned long purge_jiffies;
>  	struct dlm_lock_resource *lockres;
> +	int ret;
> +
> +#define DLM_WAIT_RECOVERY_FINISH_MS 500
>  
>  	spin_lock(&dlm->spinlock);
>  	run_max = dlm->purge_count;
> @@ -280,8 +297,25 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
>  
>  		/* This may drop and reacquire the dlm spinlock if it
>  		 * has to do migration. */
> -		if (dlm_purge_lockres(dlm, lockres))
> -			BUG();
> +		ret = dlm_purge_lockres(dlm, lockres);
> +		if (ret) {
> +			if (ret == -EAGAIN) {
> +				/*
> +				 * recovery occured on this lockres. try to
> +				 * DEREF to the new master.
> +				 */
> +				dlm_lockres_put(lockres);
> +				spin_unlock(&dlm->spinlock);
> +				mlog(ML_ERROR, "retry to purge %.*s after %d"
> +				     "ms\n", lockres->lockname.len,
> +				     lockres->lockname.name,
> +				     DLM_WAIT_RECOVERY_FINISH_MS);
> +				msleep(DLM_WAIT_RECOVERY_FINISH_MS);
> +				spin_lock(&dlm->spinlock);
> +				continue;
> +			} else
> +				BUG();
> +		}
>  
>  		dlm_lockres_put(lockres);
>  
>   


* [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs
  2010-05-24 19:48 [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs Srinivas Eeda
@ 2010-05-25  2:01 ` Wengang Wang
  2010-05-25  2:50   ` Wengang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Wengang Wang @ 2010-05-25  2:01 UTC (permalink / raw)
  To: ocfs2-devel

On 10-05-24 12:48, Srinivas Eeda wrote:
> thanks for doing this patch. I have a little comment, wondering if
> there could be a window where node B sent the lock info to node C as
> part of recovery and removed flag DLM_LOCK_RES_RECOVERING while
> dlm_thread was still purging it. In that case dlm_thread will still
> continue to remove it from hash list.

Yes, you are right. There is indeed such a window. I missed that.

> 
> Also, this patch puts dlm_thread to sleep ... may be it's ok, but
> wondering if we can avoid that.

Yes. I considered that too but failed to find a simple way to avoid it.

> delay deref message if DLM_LOCK_RES_RECOVERING is set (which means
> recovery got to the lockres before dlm_thread could), move the
> lockres to the end of the purgelist and retry later.

Good point! I meant that, but the patch doesn't actually do that :P.

> do not inform recovery master if DLM_LOCK_RES_DROPPING_REF is set
> (which means dlm_thread got to the lockres before recovery). So in
> the case you described, node C will not know about node B dropping
> the dereference and node B will just go ahead and remove it from
> hash list and free it.

Cool idea! let me try and re-create the patch!
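
Maybe something along these lines in the recovery path that sends lockres'
to the new master (only a rough, untested sketch; the purge side would also
have to skip the later deref for such a lockres):

	spin_lock(&res->spinlock);
	if (res->state & DLM_LOCK_RES_DROPPING_REF) {
		/* dlm_thread got here first and is dropping our ref, so
		 * don't migrate this lockres to the recovery master; then
		 * it never records a ref on our behalf */
		spin_unlock(&res->spinlock);
		continue;
	}
	spin_unlock(&res->spinlock);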

thanks much Srini.
wengang.

> Wengang Wang wrote:
> >When purge a lockres, we unhash the lockres ignore the result of deref request
> >and ignore the lockres state.
> >There is a problem that rarely happen. It can happen when recovery take places.
> >Say node A is the master of the lockres with node B wants to deref and there is
> >a node C. If things happen in the following order, the bug is triggered.
> >
> >1) node B send DEREF to node A for lockres A and waiting for result.
> >2) node A crashed, node C become the recovery master.
> >3) node C mastered lockres A with node B has a ref on it.
> >4) node B goes to unhashes the lockres A with a ref on node C.
> >  After step 4), if a umount comes on node C, it will hang at
> >migrate lockres A since node B has a ref on it.
> >
> >The fix is that we check if recovery happened on lockres A after sending DEREF
> >request. If that happened, we keep lockres A in hash and in purge list for
> >another try to send DEREF to the new master(node C). So that node C can clear
> >the incorrect refbit.
> >


* [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs
  2010-05-25  2:01 ` Wengang Wang
@ 2010-05-25  2:50   ` Wengang Wang
  2010-05-25  4:54     ` Srinivas Eeda
  0 siblings, 1 reply; 9+ messages in thread
From: Wengang Wang @ 2010-05-25  2:50 UTC (permalink / raw)
  To: ocfs2-devel

> > delay deref message if DLM_LOCK_RES_RECOVERING is set (which means
> > recovery got to the lockres before dlm_thread could), move the
> > lockres to the end of the purgelist and retry later.

If you meant checking before sending the DEREF, it could cause high CPU
usage when only the lockres' under recovery are left in the purge list,
until they are recovered. Is that acceptable?

regards,
wengang.


* [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs
  2010-05-25  2:50   ` Wengang Wang
@ 2010-05-25  4:54     ` Srinivas Eeda
  2010-06-03 16:37       ` [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master Wengang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Srinivas Eeda @ 2010-05-25  4:54 UTC (permalink / raw)
  To: ocfs2-devel

On 5/24/2010 7:50 PM, Wengang Wang wrote:
>>> delay deref message if DLM_LOCK_RES_RECOVERING is set (which means
>>> recovery got to the lockres before dlm_thread could), move the
>>> lockres to the end of the purgelist and retry later.
>>>       
>
> If you meant checking before sending DEREF, it could cause a high cpu
> useage when there are only the ones in recovery left in the purge list
> until they are recovered. --Is that acceptable?
>   
Yeah, you are right about the CPU usage. I think the right thing would be 
to exit the loop after all the lockres' have been traversed once.
> regards,
> wengang.
>   
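
For the dlm_thread side, the walk could be bounded to one pass over the
purge list, something like this (a sketch only; it assumes run_max is
counted down once per lockres we look at, as dlm_run_purge_list() already
does with dlm->purge_count):

	run_max = dlm->purge_count;
	while (run_max && !list_empty(&dlm->purge_list)) {
		run_max--;
		lockres = list_entry(dlm->purge_list.next,
				     struct dlm_lock_resource, purge);

		spin_lock(&lockres->spinlock);
		if (lockres->state & DLM_LOCK_RES_RECOVERING) {
			/* recovery owns this one for now; keep it on the
			 * list and let a later run of dlm_thread retry */
			list_move_tail(&lockres->purge, &dlm->purge_list);
			spin_unlock(&lockres->spinlock);
			continue;
		}
		spin_unlock(&lockres->spinlock);

		/* ... existing unused/timeout checks and the call to
		 * dlm_purge_lockres() go here ... */
	}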

* [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs
@ 2010-05-25  7:31 Wengang Wang
  2010-05-25  7:35 ` Wengang Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Wengang Wang @ 2010-05-25  7:31 UTC (permalink / raw)
  To: ocfs2-devel

When purging a lockres, we unhash the lockres ignoring the result of the
deref request and ignoring the lockres state.
There is a problem that rarely happens; it can happen when recovery takes
place. Say node A is the master of a lockres that node B wants to deref,
and there is a third node C. If things happen in the following order, the
bug is triggered.

1) Node B sends DEREF to node A for the lockres and waits for the result.
2) Node A crashes; node C becomes the recovery master.
3) Node C takes over mastery of the lockres, with node B holding a ref on it.
4) Node B goes on to unhash the lockres, leaving a ref recorded on node C.

After step 4), if a umount comes on node C, it will hang migrating the
lockres since node B appears to hold a ref on it.

The fix is to check whether recovery happened on the lockres after sending
the DEREF request. If it did, we keep the lockres in the hash and on the
purge list for another try at sending DEREF to the new master (node C), so
that node C can clear the incorrect refbit.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 fs/ocfs2/dlm/dlmthread.c |   44 +++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index d4f73ca..946ff1a 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -158,6 +158,8 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
 	int master;
 	int ret = 0;
 
+	assert_spin_locked(&dlm->spinlock);
+
 	spin_lock(&res->spinlock);
 	if (!__dlm_lockres_unused(res)) {
 		mlog(0, "%s:%.*s: tried to purge but not unused\n",
@@ -211,18 +213,30 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
 	}
 
 	spin_lock(&res->spinlock);
+	/*
+	 * We dropped dlm->spinlock and res->spinlock when sending the DEREF
+	 * request, so there is a chance that a recovery happened on this
+	 * lockres.  In that case, we have to DEREF to the new master when
+	 * recovery finishes; otherwise, there can be an incorrect ref on the
+	 * lockres on the new master on behalf of this node.
+	 */
+	if (unlikely(res->state & DLM_LOCK_RES_RECOVERING)) {
+		spin_unlock(&res->spinlock);
+		return -EAGAIN;
+	}
+
 	if (!list_empty(&res->purge)) {
 		mlog(0, "removing lockres %.*s:%p from purgelist, "
 		     "master = %d\n", res->lockname.len, res->lockname.name,
 		     res, master);
 		list_del_init(&res->purge);
-		spin_unlock(&res->spinlock);
+		/* not the last ref */
 		dlm_lockres_put(res);
 		dlm->purge_count--;
-	} else
-		spin_unlock(&res->spinlock);
+	}
 
 	__dlm_unhash_lockres(res);
+	spin_unlock(&res->spinlock);
 
 	/* lockres is not in the hash now.  drop the flag and wake up
 	 * any processes waiting in dlm_get_lock_resource. */
@@ -241,6 +255,9 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 	unsigned int run_max, unused;
 	unsigned long purge_jiffies;
 	struct dlm_lock_resource *lockres;
+	int ret;
+
+#define DLM_WAIT_RECOVERY_FINISH_MS 500
 
 	spin_lock(&dlm->spinlock);
 	run_max = dlm->purge_count;
@@ -280,8 +297,25 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 
 		/* This may drop and reacquire the dlm spinlock if it
 		 * has to do migration. */
-		if (dlm_purge_lockres(dlm, lockres))
-			BUG();
+		ret = dlm_purge_lockres(dlm, lockres);
+		if (ret) {
+			if (ret == -EAGAIN) {
+				/*
+				 * Recovery occurred on this lockres.  Try to
+				 * DEREF to the new master.
+				 */
+				dlm_lockres_put(lockres);
+				spin_unlock(&dlm->spinlock);
+				mlog(ML_ERROR, "retry to purge %.*s after %d"
+				     "ms\n", lockres->lockname.len,
+				     lockres->lockname.name,
+				     DLM_WAIT_RECOVERY_FINISH_MS);
+				msleep(DLM_WAIT_RECOVERY_FINISH_MS);
+				spin_lock(&dlm->spinlock);
+				continue;
+			} else
+				BUG();
+		}
 
 		dlm_lockres_put(lockres);
 
-- 
1.6.6.1


* [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs
  2010-05-25  7:31 [Ocfs2-devel] [PATCH 1/1] ocfs2/dlm: resend deref to new master if recovery occurs Wengang Wang
@ 2010-05-25  7:35 ` Wengang Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Wengang Wang @ 2010-05-25  7:35 UTC (permalink / raw)
  To: ocfs2-devel

Please ignore this email; it was sent by mistake.

regards,
wengang.
On 10-05-25 15:31, Wengang Wang wrote:
> When purge a lockres, we unhash the lockres ignore the result of deref request
> and ignore the lockres state.
> There is a problem that rarely happen. It can happen when recovery take places.
> Say node A is the master of the lockres with node B wants to deref and there is
> a node C. If things happen in the following order, the bug is triggered.


* [Ocfs2-devel] [PATCH] ocfs2/dlm: cancel the migration or redo deref to recovery master
  2010-05-25  4:54     ` Srinivas Eeda
@ 2010-06-03 16:37       ` Wengang Wang
  0 siblings, 0 replies; 9+ messages in thread
From: Wengang Wang @ 2010-06-03 16:37 UTC (permalink / raw)
  To: ocfs2-devel

Changes since V1:
1) Move the msleep to the second pass when the lockres is in recovery, so
   that purging of other lockres' can proceed.
2) Do not inform the recovery master if DLM_LOCK_RES_DROPPING_REF is set,
   and do not resend the deref in that case.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 fs/ocfs2/dlm/dlmcommon.h   |    1 +
 fs/ocfs2/dlm/dlmrecovery.c |   25 +++++++++++++++
 fs/ocfs2/dlm/dlmthread.c   |   73 ++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 90 insertions(+), 9 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
index 4b6ae2c..4194087 100644
--- a/fs/ocfs2/dlm/dlmcommon.h
+++ b/fs/ocfs2/dlm/dlmcommon.h
@@ -280,6 +280,7 @@ static inline void __dlm_set_joining_node(struct dlm_ctxt *dlm,
 #define DLM_LOCK_RES_IN_PROGRESS          0x00000010
 #define DLM_LOCK_RES_MIGRATING            0x00000020
 #define DLM_LOCK_RES_DROPPING_REF         0x00000040
+#define DLM_LOCK_RES_DE_DROP_REF          0x00000080
 #define DLM_LOCK_RES_BLOCK_DIRTY          0x00001000
 #define DLM_LOCK_RES_SETREF_INPROG        0x00002000
 
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index f8b75ce..7241070 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -913,6 +913,27 @@ static void dlm_request_all_locks_worker(struct dlm_work_item *item, void *data)
 	/* any errors returned will be due to the new_master dying,
 	 * the dlm_reco_thread should detect this */
 	list_for_each_entry(res, &resources, recovering) {
+		int ignore_mig = 0;
+		spin_lock(&res->spinlock);
+		/*
+		 * If we are dropping the lockres, no need to let the new master
+		 * know about this node's reference; that is, don't migrate the
+		 * lockres to the new master.  Also make sure we don't send a
+		 * DEREF request for the same lockres to the new master either.
+		 */
+		if (unlikely(res->state & DLM_LOCK_RES_DROPPING_REF)) {
+			ignore_mig = 1;
+			res->state |= DLM_LOCK_RES_DE_DROP_REF;
+		}
+		spin_unlock(&res->spinlock);
+		if (ignore_mig) {
+			mlog(ML_NOTICE, "ignore migrating %.*s to recovery "
+			     "master %u as we are dropping it\n",
+			     res->lockname.len, res->lockname.name,
+			     reco_master);
+			continue;
+		}
+
 		ret = dlm_send_one_lockres(dlm, res, mres, reco_master,
 				   	DLM_MRES_RECOVERY);
 		if (ret < 0) {
@@ -1997,7 +2018,11 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
 	struct list_head *queue;
 	struct dlm_lock *lock, *next;
 
+	assert_spin_locked(&res->spinlock);
+
 	res->state |= DLM_LOCK_RES_RECOVERING;
+	res->state &= ~DLM_LOCK_RES_DE_DROP_REF;
+
 	if (!list_empty(&res->recovering)) {
 		mlog(0,
 		     "Recovering res %s:%.*s, is already on recovery list!\n",
diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index d4f73ca..c4aa2ec 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -157,6 +157,9 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
 {
 	int master;
 	int ret = 0;
+	int remote_drop = 1;
+
+	assert_spin_locked(&dlm->spinlock);
 
 	spin_lock(&res->spinlock);
 	if (!__dlm_lockres_unused(res)) {
@@ -184,12 +187,19 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
 
 	if (!master)
 		res->state |= DLM_LOCK_RES_DROPPING_REF;
+
+	/*
+	 * If we didn't migrate this lockres to the recovery master, don't
+	 * send a DEREF request to it.
+	 */
+	if (res->state & DLM_LOCK_RES_DE_DROP_REF)
+		remote_drop = 0;
 	spin_unlock(&res->spinlock);
 
 	mlog(0, "purging lockres %.*s, master = %d\n", res->lockname.len,
 	     res->lockname.name, master);
 
-	if (!master) {
+	if (!master && remote_drop) {
 		/* drop spinlock...  retake below */
 		spin_unlock(&dlm->spinlock);
 
@@ -211,18 +221,34 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
 	}
 
 	spin_lock(&res->spinlock);
+	/*
+	 * We dropped dlm->spinlock and res->spinlock when sending the DEREF
+	 * request, so there is a chance that a recovery happened on this
+	 * lockres.  In that case, we have to DEREF to the new master (the
+	 * recovery master) when recovery finishes; otherwise, there can be
+	 * an incorrect ref on the new master on behalf of this node.
+	 */
+	if (unlikely(res->state & DLM_LOCK_RES_RECOVERING)) {
+		spin_unlock(&res->spinlock);
+		/*
+		 * Try the deref again; keeping DLM_LOCK_RES_DROPPING_REF
+		 * prevents this lockres from being "in use" again.
+		 */
+		return -EAGAIN;
+	}
+
 	if (!list_empty(&res->purge)) {
 		mlog(0, "removing lockres %.*s:%p from purgelist, "
 		     "master = %d\n", res->lockname.len, res->lockname.name,
 		     res, master);
 		list_del_init(&res->purge);
-		spin_unlock(&res->spinlock);
+		/* not the last ref */
 		dlm_lockres_put(res);
 		dlm->purge_count--;
-	} else
-		spin_unlock(&res->spinlock);
+	}
 
 	__dlm_unhash_lockres(res);
+	spin_unlock(&res->spinlock);
 
 	/* lockres is not in the hash now.  drop the flag and wake up
 	 * any processes waiting in dlm_get_lock_resource. */
@@ -241,6 +267,9 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 	unsigned int run_max, unused;
 	unsigned long purge_jiffies;
 	struct dlm_lock_resource *lockres;
+	int ret;
+
+#define DLM_WAIT_RECOVERY_FINISH_MS 500
 
 	spin_lock(&dlm->spinlock);
 	run_max = dlm->purge_count;
@@ -258,10 +287,22 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 		 * spinlock. */
 		spin_lock(&lockres->spinlock);
 		unused = __dlm_lockres_unused(lockres);
-		spin_unlock(&lockres->spinlock);
-
-		if (!unused)
+		if (!unused) {
+			spin_unlock(&lockres->spinlock);
 			continue;
+		}
+		if (lockres->state & DLM_LOCK_RES_RECOVERING) {
+			list_move_tail(&lockres->purge, &dlm->purge_list);
+			spin_unlock(&lockres->spinlock);
+			spin_unlock(&dlm->spinlock);
+			mlog(ML_NOTICE, "retry to purge %.*s after %dms\n",
+			     lockres->lockname.len, lockres->lockname.name,
+			     DLM_WAIT_RECOVERY_FINISH_MS);
+			msleep(DLM_WAIT_RECOVERY_FINISH_MS);
+			spin_lock(&dlm->spinlock);
+			continue;
+		}
+		spin_unlock(&lockres->spinlock);
 
 		purge_jiffies = lockres->last_used +
 			msecs_to_jiffies(DLM_PURGE_INTERVAL_MS);
@@ -280,8 +321,22 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
 
 		/* This may drop and reacquire the dlm spinlock if it
 		 * has to do migration. */
-		if (dlm_purge_lockres(dlm, lockres))
-			BUG();
+		ret = dlm_purge_lockres(dlm, lockres);
+		if (ret) {
+			if (ret == -EAGAIN) {
+				/*
+				 * Recovery occurred on this lockres.  Try to
+				 * DEREF to the new master.
+				 */
+				dlm_lockres_put(lockres);
+				spin_lock(&lockres->spinlock);
+				list_move_tail(&lockres->purge,
+					       &dlm->purge_list);
+				spin_unlock(&lockres->spinlock);
+				continue;
+			} else
+				BUG();
+		}
 
 		dlm_lockres_put(lockres);
 
-- 
1.6.6.1

