All of lore.kernel.org
 help / color / mirror / Atom feed
* [Ocfs2-devel] [PATCH] ocfs2/dlm: queue ast in lockres->spinlock
@ 2010-11-30 15:48 Wengang Wang
  2010-11-30 19:24 ` Sunil Mushran
  0 siblings, 1 reply; 6+ messages in thread
From: Wengang Wang @ 2010-11-30 15:48 UTC (permalink / raw)
  To: ocfs2-devel

There is bug hitting the BUG() in ocfs2_prepare_downconvert(), in 1.4 code
though.

static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
                                              int new_level)
{               
        assert_spin_locked(&lockres->l_lock);
                
        BUG_ON(lockres->l_blocking <= DLM_LOCK_NL);
                
        if (lockres->l_level <= new_level) {
                mlog(ML_ERROR, "lockres %s, lvl %d <= %d, blcklst %d, mask %d, "
                     "type %d, flags 0x%lx, hold %d %d, act %d %d, req %d, "
                     "block %d, pgen %d\n", lockres->l_name, lockres->l_level,
                     new_level, list_empty(&lockres->l_blocked_list),
                     list_empty(&lockres->l_mask_waiters), lockres->l_type,
                     lockres->l_flags, lockres->l_ro_holders,
                     lockres->l_ex_holders, lockres->l_action,
                     lockres->l_unlock_action, lockres->l_requested,
                     lockres->l_blocking, lockres->l_pending_gen);
                BUG();  
        } 
....

By checking vmcore, lockres->l_level == new_level == 0.

Though no tcpdump shows that, I think it's caused by a BAST before AST.
It can happen in the following case(steps in time order):

1) Node A send a "up convert" request(for a EX) to the master node, node B.

2) Node C send a "create lock" request(for a PR) to the master node, node B
against the same lockres.

3) The "convert request" was added to dlm_thread because the lock can't be
granted immediately. So that the "create lock" thread and the "up convert"
thread can run in parallel.

4) the "create lock" thread get the lockres->spinlock and found no conflict for
the lock request, so it put the lock to granted list. then lockres->spinlock is
released. (see dlmlock_master()).

5) the dlm_thread get the spinlcok and found the conflict lock, which was
granted in 4), and queue a BAST that will send to node C.
(see dlm_shuffle_lists()).

6) the "create lock" thread then continue to queue the AST for the granted lock
to node C. So the BAST and AST come to Node C in order.

7) node C crashes when responding the BAST.


This patch tries to fix it by moving the queue_ast into protection of
lockres->spinlock.

This patch is based on
http://oss.oracle.com/pipermail/ocfs2-devel/2010-November/007523.html
Otherwise, it could cause deadlock.(by different order of lockres->spinlock and
dlm->ast_lock)

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 fs/ocfs2/dlm/dlmconvert.c |   14 ++++++--------
 fs/ocfs2/dlm/dlmlock.c    |    6 ++----
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmconvert.c b/fs/ocfs2/dlm/dlmconvert.c
index 9f30491..ff0b84f 100644
--- a/fs/ocfs2/dlm/dlmconvert.c
+++ b/fs/ocfs2/dlm/dlmconvert.c
@@ -90,15 +90,14 @@ enum dlm_status dlmconvert_master(struct dlm_ctxt *dlm,
 				     &call_ast, &kick_thread);
 
 	res->state &= ~DLM_LOCK_RES_IN_PROGRESS;
+	if (call_ast)
+		dlm_queue_ast(dlm, lock);
 	spin_unlock(&res->spinlock);
 	wake_up(&res->wq);
 	if (status != DLM_NORMAL && status != DLM_NOTQUEUED)
 		dlm_error(status);
 
-	/* either queue the ast or release it */
-	if (call_ast)
-		dlm_queue_ast(dlm, lock);
-	else
+	if (!call_ast)
 		dlm_lockres_release_ast(dlm, res);
 
 	if (kick_thread)
@@ -514,6 +513,8 @@ int dlm_convert_lock_handler(struct o2net_msg *msg, u32 len, void *data,
 					     cnv->requested_type,
 					     &call_ast, &kick_thread);
 		res->state &= ~DLM_LOCK_RES_IN_PROGRESS;
+		if (call_ast)
+			dlm_queue_ast(dlm, lock);
 		wake = 1;
 	}
 	spin_unlock(&res->spinlock);
@@ -530,10 +531,7 @@ leave:
 	if (lock)
 		dlm_lock_put(lock);
 
-	/* either queue the ast or release it, if reserved */
-	if (call_ast)
-		dlm_queue_ast(dlm, lock);
-	else if (ast_reserved)
+	if ((!call_ast) && ast_reserved)
 		dlm_lockres_release_ast(dlm, res);
 
 	if (kick_thread)
diff --git a/fs/ocfs2/dlm/dlmlock.c b/fs/ocfs2/dlm/dlmlock.c
index 69cf369..f84d1fb 100644
--- a/fs/ocfs2/dlm/dlmlock.c
+++ b/fs/ocfs2/dlm/dlmlock.c
@@ -157,6 +157,7 @@ static enum dlm_status dlmlock_master(struct dlm_ctxt *dlm,
 		if (!dlm_is_recovery_lock(res->lockname.name,
 					  res->lockname.len)) {
 			kick_thread = 1;
+			dlm_queue_ast(dlm, lock);
 			call_ast = 1;
 		} else {
 			mlog(0, "%s: returning DLM_NORMAL to "
@@ -188,10 +189,7 @@ static enum dlm_status dlmlock_master(struct dlm_ctxt *dlm,
 	spin_unlock(&res->spinlock);
 	wake_up(&res->wq);
 
-	/* either queue the ast or release it */
-	if (call_ast)
-		dlm_queue_ast(dlm, lock);
-	else
+	if (!call_ast)
 		dlm_lockres_release_ast(dlm, res);
 
 	dlm_lockres_calc_usage(dlm, res);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2/dlm: queue ast in lockres->spinlock
  2010-11-30 15:48 [Ocfs2-devel] [PATCH] ocfs2/dlm: queue ast in lockres->spinlock Wengang Wang
@ 2010-11-30 19:24 ` Sunil Mushran
  2010-12-01  2:52   ` Wengang Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Sunil Mushran @ 2010-11-30 19:24 UTC (permalink / raw)
  To: ocfs2-devel

http://oss.oracle.com/pipermail/ocfs2-devel/2010-January/005801.html

The above bug/fix fits better.

Having said that, I am not discounting your point. But I will need more
to analyse it. Questions inlined.

On 11/30/2010 07:48 AM, Wengang Wang wrote:
> There is bug hitting the BUG() in ocfs2_prepare_downconvert(), in 1.4 code
> though.
>
> static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
>                                                int new_level)
> {
>          assert_spin_locked(&lockres->l_lock);
>
>          BUG_ON(lockres->l_blocking<= DLM_LOCK_NL);
>
>          if (lockres->l_level<= new_level) {
>                  mlog(ML_ERROR, "lockres %s, lvl %d<= %d, blcklst %d, mask %d, "
>                       "type %d, flags 0x%lx, hold %d %d, act %d %d, req %d, "
>                       "block %d, pgen %d\n", lockres->l_name, lockres->l_level,
>                       new_level, list_empty(&lockres->l_blocked_list),
>                       list_empty(&lockres->l_mask_waiters), lockres->l_type,
>                       lockres->l_flags, lockres->l_ro_holders,
>                       lockres->l_ex_holders, lockres->l_action,
>                       lockres->l_unlock_action, lockres->l_requested,
>                       lockres->l_blocking, lockres->l_pending_gen);
>                  BUG();
>          }
> ....
>
> By checking vmcore, lockres->l_level == new_level == 0.
>
> Though no tcpdump shows that, I think it's caused by a BAST before AST.
> It can happen in the following case(steps in time order):
>
> 1) Node A send a "up convert" request(for a EX) to the master node, node B.
>
> 2) Node C send a "create lock" request(for a PR) to the master node, node B
> against the same lockres.
>
> 3) The "convert request" was added to dlm_thread because the lock can't be
> granted immediately. So that the "create lock" thread and the "up convert"
> thread can run in parallel.
>
> 4) the "create lock" thread get the lockres->spinlock and found no conflict for
> the lock request, so it put the lock to granted list. then lockres->spinlock is
> released. (see dlmlock_master()).
>
> 5) the dlm_thread get the spinlcok and found the conflict lock, which was
> granted in 4), and queue a BAST that will send to node C.
> (see dlm_shuffle_lists()).
>
> 6) the "create lock" thread then continue to queue the AST for the granted lock
> to node C. So the BAST and AST come to Node C in order.
>
> 7) node C crashes when responding the BAST.

So in your scenario, the upconvert request is before create lock. If that is
so, then even if the new level cannot be granted, the lock will be moved
to the convert queue. The create lock request is always added to the
blocked queue which always has the lowest priority.

Note that the dlm always processes convert queues before blocked queues.
Also, o2dlm, always processes all ASTs before BASTs.

That is not to say what you have said cannot happen. But I will need more
information. Like, Node A was up converting to EX from what level? Were
there any other locks on the resource?

>
> This patch tries to fix it by moving the queue_ast into protection of
> lockres->spinlock.
>
> This patch is based on
> http://oss.oracle.com/pipermail/ocfs2-devel/2010-November/007523.html
> Otherwise, it could cause deadlock.(by different order of lockres->spinlock and
> dlm->ast_lock)
>
> Signed-off-by: Wengang Wang<wen.gang.wang@oracle.com>
> ---
>   fs/ocfs2/dlm/dlmconvert.c |   14 ++++++--------
>   fs/ocfs2/dlm/dlmlock.c    |    6 ++----
>   2 files changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmconvert.c b/fs/ocfs2/dlm/dlmconvert.c
> index 9f30491..ff0b84f 100644
> --- a/fs/ocfs2/dlm/dlmconvert.c
> +++ b/fs/ocfs2/dlm/dlmconvert.c
> @@ -90,15 +90,14 @@ enum dlm_status dlmconvert_master(struct dlm_ctxt *dlm,
>   				&call_ast,&kick_thread);
>
>   	res->state&= ~DLM_LOCK_RES_IN_PROGRESS;
> +	if (call_ast)
> +		dlm_queue_ast(dlm, lock);
>   	spin_unlock(&res->spinlock);
>   	wake_up(&res->wq);
>   	if (status != DLM_NORMAL&&  status != DLM_NOTQUEUED)
>   		dlm_error(status);
>
> -	/* either queue the ast or release it */
> -	if (call_ast)
> -		dlm_queue_ast(dlm, lock);
> -	else
> +	if (!call_ast)
>   		dlm_lockres_release_ast(dlm, res);
>
>   	if (kick_thread)
> @@ -514,6 +513,8 @@ int dlm_convert_lock_handler(struct o2net_msg *msg, u32 len, void *data,
>   					     cnv->requested_type,
>   					&call_ast,&kick_thread);
>   		res->state&= ~DLM_LOCK_RES_IN_PROGRESS;
> +		if (call_ast)
> +			dlm_queue_ast(dlm, lock);
>   		wake = 1;
>   	}
>   	spin_unlock(&res->spinlock);
> @@ -530,10 +531,7 @@ leave:
>   	if (lock)
>   		dlm_lock_put(lock);
>
> -	/* either queue the ast or release it, if reserved */
> -	if (call_ast)
> -		dlm_queue_ast(dlm, lock);
> -	else if (ast_reserved)
> +	if ((!call_ast)&&  ast_reserved)
>   		dlm_lockres_release_ast(dlm, res);
>
>   	if (kick_thread)
> diff --git a/fs/ocfs2/dlm/dlmlock.c b/fs/ocfs2/dlm/dlmlock.c
> index 69cf369..f84d1fb 100644
> --- a/fs/ocfs2/dlm/dlmlock.c
> +++ b/fs/ocfs2/dlm/dlmlock.c
> @@ -157,6 +157,7 @@ static enum dlm_status dlmlock_master(struct dlm_ctxt *dlm,
>   		if (!dlm_is_recovery_lock(res->lockname.name,
>   					  res->lockname.len)) {
>   			kick_thread = 1;
> +			dlm_queue_ast(dlm, lock);
>   			call_ast = 1;
>   		} else {
>   			mlog(0, "%s: returning DLM_NORMAL to "
> @@ -188,10 +189,7 @@ static enum dlm_status dlmlock_master(struct dlm_ctxt *dlm,
>   	spin_unlock(&res->spinlock);
>   	wake_up(&res->wq);
>
> -	/* either queue the ast or release it */
> -	if (call_ast)
> -		dlm_queue_ast(dlm, lock);
> -	else
> +	if (!call_ast)
>   		dlm_lockres_release_ast(dlm, res);
>
>   	dlm_lockres_calc_usage(dlm, res);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2/dlm: queue ast in lockres->spinlock
  2010-11-30 19:24 ` Sunil Mushran
@ 2010-12-01  2:52   ` Wengang Wang
  2010-12-01  5:33     ` Sunil Mushran
  0 siblings, 1 reply; 6+ messages in thread
From: Wengang Wang @ 2010-12-01  2:52 UTC (permalink / raw)
  To: ocfs2-devel

On 10-11-30 11:24, Sunil Mushran wrote:
> http://oss.oracle.com/pipermail/ocfs2-devel/2010-January/005801.html
> 
> The above bug/fix fits better.
> 
> Having said that, I am not discounting your point. But I will need more
> to analyse it. Questions inlined.
> 
> On 11/30/2010 07:48 AM, Wengang Wang wrote:
> >There is bug hitting the BUG() in ocfs2_prepare_downconvert(), in 1.4 code
> >though.
> >
> >static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
> >                                               int new_level)
> >{
> >         assert_spin_locked(&lockres->l_lock);
> >
> >         BUG_ON(lockres->l_blocking<= DLM_LOCK_NL);
> >
> >         if (lockres->l_level<= new_level) {
> >                 mlog(ML_ERROR, "lockres %s, lvl %d<= %d, blcklst %d, mask %d, "
> >                      "type %d, flags 0x%lx, hold %d %d, act %d %d, req %d, "
> >                      "block %d, pgen %d\n", lockres->l_name, lockres->l_level,
> >                      new_level, list_empty(&lockres->l_blocked_list),
> >                      list_empty(&lockres->l_mask_waiters), lockres->l_type,
> >                      lockres->l_flags, lockres->l_ro_holders,
> >                      lockres->l_ex_holders, lockres->l_action,
> >                      lockres->l_unlock_action, lockres->l_requested,
> >                      lockres->l_blocking, lockres->l_pending_gen);
> >                 BUG();
> >         }
> >....
> >
> >By checking vmcore, lockres->l_level == new_level == 0.
> >
> >Though no tcpdump shows that, I think it's caused by a BAST before AST.
> >It can happen in the following case(steps in time order):
> >
> >1) Node A send a "up convert" request(for a EX) to the master node, node B.
> >
> >2) Node C send a "create lock" request(for a PR) to the master node, node B
> >against the same lockres.
> >
> >3) The "convert request" was added to dlm_thread because the lock can't be
> >granted immediately. So that the "create lock" thread and the "up convert"
> >thread can run in parallel.
> >
> >4) the "create lock" thread get the lockres->spinlock and found no conflict for
> >the lock request, so it put the lock to granted list. then lockres->spinlock is
> >released. (see dlmlock_master()).
> >
> >5) the dlm_thread get the spinlcok and found the conflict lock, which was
> >granted in 4), and queue a BAST that will send to node C.
> >(see dlm_shuffle_lists()).
> >
> >6) the "create lock" thread then continue to queue the AST for the granted lock
> >to node C. So the BAST and AST come to Node C in order.
> >
> >7) node C crashes when responding the BAST.
> 
> So in your scenario, the upconvert request is before create lock. If that is
> so, then even if the new level cannot be granted, the lock will be moved
> to the convert queue. The create lock request is always added to the
> blocked queue which always has the lowest priority.
> 
> Note that the dlm always processes convert queues before blocked queues.
> Also, o2dlm, always processes all ASTs before BASTs.
> 
> That is not to say what you have said cannot happen. But I will need more
> information. Like, Node A was up converting to EX from what level? Were
> there any other locks on the resource?
> 

First I have to say that the scenario is in my thought only, no tcpdump
shows that. Only the node C wanting a PR is the truth from the vmcore.

Let's consider this situation:

The create lock(CL) is for a PR;
the up convert(UC) is PR->EX;
there is an existing PR granted for another node(say node D) before the UC comes

The UC comes. Because of the existing PR, the UC lock is moved to converting list.
dlm_thread takes care it later.
The CL comes. no conflict found(see dlm_can_grant_new_lock, convert_type not
checked there). so it's not added to blocked list.

Seems the above scenario can trigger a BAST before AST.

regards,
wengang.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2/dlm: queue ast in lockres->spinlock
  2010-12-01  2:52   ` Wengang Wang
@ 2010-12-01  5:33     ` Sunil Mushran
  2010-12-01  6:16       ` Wengang Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Sunil Mushran @ 2010-12-01  5:33 UTC (permalink / raw)
  To: ocfs2-devel

On 11/30/2010 06:52 PM, Wengang Wang wrote:
> First I have to say that the scenario is in my thought only, no tcpdump
> shows that. Only the node C wanting a PR is the truth from the vmcore.
>
> Let's consider this situation:
>
> The create lock(CL) is for a PR;
> the up convert(UC) is PR->EX;
> there is an existing PR granted for another node(say node D) before the UC comes
>
> The UC comes. Because of the existing PR, the UC lock is moved to converting list.
> dlm_thread takes care it later.
> The CL comes. no conflict found(see dlm_can_grant_new_lock, convert_type not
> checked there). so it's not added to blocked list.
>
> Seems the above scenario can trigger a BAST before AST.

dlm_can_grant_new_lock() should be looking at convert_type also and
add the lock to the blocked list in this case.

This is not just to plug this hole but is as per the spec.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2/dlm: queue ast in lockres->spinlock
  2010-12-01  5:33     ` Sunil Mushran
@ 2010-12-01  6:16       ` Wengang Wang
  2010-12-01  6:24         ` Sunil Mushran
  0 siblings, 1 reply; 6+ messages in thread
From: Wengang Wang @ 2010-12-01  6:16 UTC (permalink / raw)
  To: ocfs2-devel

On 10-11-30 21:33, Sunil Mushran wrote:
> On 11/30/2010 06:52 PM, Wengang Wang wrote:
> >First I have to say that the scenario is in my thought only, no tcpdump
> >shows that. Only the node C wanting a PR is the truth from the vmcore.
> >
> >Let's consider this situation:
> >
> >The create lock(CL) is for a PR;
> >the up convert(UC) is PR->EX;
> >there is an existing PR granted for another node(say node D) before the UC comes
> >
> >The UC comes. Because of the existing PR, the UC lock is moved to converting list.
> >dlm_thread takes care it later.
> >The CL comes. no conflict found(see dlm_can_grant_new_lock, convert_type not
> >checked there). so it's not added to blocked list.
> >
> >Seems the above scenario can trigger a BAST before AST.
> 
> dlm_can_grant_new_lock() should be looking at convert_type also and
> add the lock to the blocked list in this case.
> 
> This is not just to plug this hole but is as per the spec.
So you meant o2dlm is (should be) implementing fair locking? Anywhere I
can take a look at the spec?

regards,
wengang.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2/dlm: queue ast in lockres->spinlock
  2010-12-01  6:16       ` Wengang Wang
@ 2010-12-01  6:24         ` Sunil Mushran
  0 siblings, 0 replies; 6+ messages in thread
From: Sunil Mushran @ 2010-12-01  6:24 UTC (permalink / raw)
  To: ocfs2-devel



On Nov 30, 2010, at 10:16 PM, Wengang Wang <wen.gang.wang@oracle.com> wrote:

> On 10-11-30 21:33, Sunil Mushran wrote:
>> On 11/30/2010 06:52 PM, Wengang Wang wrote:
>>> First I have to say that the scenario is in my thought only, no tcpdump
>>> shows that. Only the node C wanting a PR is the truth from the vmcore.
>>> 
>>> Let's consider this situation:
>>> 
>>> The create lock(CL) is for a PR;
>>> the up convert(UC) is PR->EX;
>>> there is an existing PR granted for another node(say node D) before the UC comes
>>> 
>>> The UC comes. Because of the existing PR, the UC lock is moved to converting list.
>>> dlm_thread takes care it later.
>>> The CL comes. no conflict found(see dlm_can_grant_new_lock, convert_type not
>>> checked there). so it's not added to blocked list.
>>> 
>>> Seems the above scenario can trigger a BAST before AST.
>> 
>> dlm_can_grant_new_lock() should be looking at convert_type also and
>> add the lock to the blocked list in this case.
>> 
>> This is not just to plug this hole but is as per the spec.
> So you meant o2dlm is (should be) implementing fair locking? Anywhere I
> can take a look at the spec?
> 

Google for Programming Locking Applications by Kristin Thomas. Or, grep the same in the source. The link is there. 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-12-01  6:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-30 15:48 [Ocfs2-devel] [PATCH] ocfs2/dlm: queue ast in lockres->spinlock Wengang Wang
2010-11-30 19:24 ` Sunil Mushran
2010-12-01  2:52   ` Wengang Wang
2010-12-01  5:33     ` Sunil Mushran
2010-12-01  6:16       ` Wengang Wang
2010-12-01  6:24         ` Sunil Mushran

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.