* [Ocfs2-devel] ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF, is cleared before dlm_deref_lockres_done_handler
@ 2016-07-10 10:01 piaojun
2016-07-11 1:55 ` Joseph Qi
0 siblings, 1 reply; 3+ messages in thread
From: piaojun @ 2016-07-10 10:01 UTC (permalink / raw)
To: ocfs2-devel
We found a situation, described below, in which DLM_LOCK_RES_DROPPING_REF
is cleared unexpectedly and the BUG_ON fires. To solve the bug, we disable
the BUG_ON and purge the lockres in dlm_do_local_recovery_cleanup.

Node 1                                  Node 2 (master)
dlm_purge_lockres
                                        dlm_deref_lockres_handler

                                        DLM_LOCK_RES_SETREF_INPROG is set
                                        response DLM_DEREF_RESPONSE_INPROG

receive DLM_DEREF_RESPONSE_INPROG
stop purging in dlm_purge_lockres
and wait for DLM_DEREF_RESPONSE_DONE

                                        dispatch dlm_deref_lockres_worker
                                        response DLM_DEREF_RESPONSE_DONE

receive DLM_DEREF_RESPONSE_DONE and
prepare to purge lockres

                                        Node 2 goes down

find Node2 down and do local
clean up for Node2:
dlm_do_local_recovery_cleanup
  -> clear DLM_LOCK_RES_DROPPING_REF

when purging lockres, BUG_ON happens
because DLM_LOCK_RES_DROPPING_REF is clear:
dlm_deref_lockres_done_handler
  ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));

Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message")
Signed-off-by: Jun Piao <piaojun@huawei.com>
---
fs/ocfs2/dlm/dlmmaster.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 9aed6e2..f72e7ae 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -2416,7 +2416,16 @@ int dlm_deref_lockres_done_handler(struct o2net_msg *msg, u32 len, void *data,
 	}
 
 	spin_lock(&res->spinlock);
-	BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
+	if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) {
+		spin_unlock(&res->spinlock);
+		spin_unlock(&dlm->spinlock);
+		mlog(ML_NOTICE, "%s:%.*s: node %u sends deref done "
+			"but it is already derefed!\n", dlm->name,
+			res->lockname.len, res->lockname.name, node);
+		dlm_lockres_put(res);
+		goto done;
+	}
+
 	if (!list_empty(&res->purge)) {
 		mlog(0, "%s: Removing res %.*s from purgelist\n",
 			dlm->name, res->lockname.len, res->lockname.name);
@@ -2455,6 +2464,8 @@ int dlm_deref_lockres_done_handler(struct o2net_msg *msg, u32 len, void *data,
 
 	spin_unlock(&dlm->spinlock);
 
+	ret = 0;
+
 done:
 	dlm_put(dlm);
 	return ret;
--
1.8.4.3
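
For readers following the timeline in the commit message, the race and the guard can be modelled in user space. This is only a rough sketch, not the kernel code: the `model_*` names and the flag value are hypothetical, locking and reference counting are left out, and the error path returns -EINVAL on the assumption that `ret` keeps its initial value, as it does in the patch as posted.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical flag value for this model; the real constant is defined by the DLM. */
#define MODEL_DROPPING_REF 0x1u

/* Minimal stand-in for a lock resource: a flag word and a "purged" marker. */
struct model_lockres {
	unsigned int state;
	int purged;
};

/* Node 1 begins purging and will wait for the deref-done message. */
static void model_start_purge(struct model_lockres *res)
{
	res->state |= MODEL_DROPPING_REF;
}

/* Node 1 sees the master go down; local recovery cleanup clears the flag. */
static void model_recovery_cleanup(struct model_lockres *res)
{
	res->state &= ~MODEL_DROPPING_REF;
}

/*
 * Guarded deref-done handler: where the original code would hit BUG_ON,
 * log the stale message and bail out with the initial error code.
 */
static int model_deref_done(struct model_lockres *res)
{
	int ret = -EINVAL;

	assert(res != NULL);

	if (!(res->state & MODEL_DROPPING_REF)) {
		fprintf(stderr, "deref done received, but flag already cleared\n");
		goto done;
	}

	/* Normal path: finish dropping the reference and purge. */
	res->state &= ~MODEL_DROPPING_REF;
	res->purged = 1;
	ret = 0;
done:
	return ret;
}
```

Calling model_start_purge() and then model_deref_done() takes the normal path and returns 0. Inserting model_recovery_cleanup() between the two calls reproduces the ordering from the timeline, and the handler returns -EINVAL instead of crashing.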
* [Ocfs2-devel] ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF, is cleared before dlm_deref_lockres_done_handler
2016-07-10 10:01 [Ocfs2-devel] ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF, is cleared before dlm_deref_lockres_done_handler piaojun
@ 2016-07-11 1:55 ` Joseph Qi
2016-07-11 2:17 ` piaojun
0 siblings, 1 reply; 3+ messages in thread
From: Joseph Qi @ 2016-07-11 1:55 UTC (permalink / raw)
To: ocfs2-devel
Hi Jun,
On 2016/7/10 18:01, piaojun wrote:
> We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is cleared
> unexpected that described below. To solve the bug, we disable the BUG_ON
> and purge lockres in dlm_do_local_recovery_cleanup.
>
> Node 1 Node 2(master)
> dlm_purge_lockres
> dlm_deref_lockres_handler
>
> DLM_LOCK_RES_SETREF_INPROG is set
> response DLM_DEREF_RESPONSE_INPROG
>
> receive DLM_DEREF_RESPONSE_INPROG
> stop puring in dlm_purge_lockres
> and wait for DLM_DEREF_RESPONSE_DONE
>
> dispatch dlm_deref_lockres_worker
> response DLM_DEREF_RESPONSE_DONE
>
> receive DLM_DEREF_RESPONSE_DONE and
> prepare to purge lockres
>
> Node 2 goes down
>
> find Node2 down and do local
> clean up for Node2:
> dlm_do_local_recovery_cleanup
> -> clear DLM_LOCK_RES_DROPPING_REF
>
> when purging lockres, BUG_ON happens
> because DLM_LOCK_RES_DROPPING_REF is clear:
> dlm_deref_lockres_done_handler
> ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
>
> Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message")
> Signed-off-by: Jun Piao <piaojun@huawei.com>
> ---
> fs/ocfs2/dlm/dlmmaster.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 9aed6e2..f72e7ae 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -2416,7 +2416,16 @@ int dlm_deref_lockres_done_handler(struct o2net_msg *msg, u32 len, void *data,
> }
>
> spin_lock(&res->spinlock);
> - BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
> + if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) {
> + spin_unlock(&res->spinlock);
> + spin_unlock(&dlm->spinlock);
> + mlog(ML_NOTICE, "%s:%.*s: node %u sends deref done "
> + "but it is already derefed!\n", dlm->name,
> + res->lockname.len, res->lockname.name, node);
> + dlm_lockres_put(res);
So we treat this case as normal?
If so, we'd better return 0 rather than -EINVAL.
Thanks,
Joseph
> + goto done;
> + }
> +
> if (!list_empty(&res->purge)) {
> mlog(0, "%s: Removing res %.*s from purgelist\n",
> dlm->name, res->lockname.len, res->lockname.name);
> @@ -2455,6 +2464,8 @@ int dlm_deref_lockres_done_handler(struct o2net_msg *msg, u32 len, void *data,
>
> spin_unlock(&dlm->spinlock);
>
> + ret = 0;
> +
> done:
> dlm_put(dlm);
> return ret;
>
* [Ocfs2-devel] ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF, is cleared before dlm_deref_lockres_done_handler
2016-07-11 1:55 ` Joseph Qi
@ 2016-07-11 2:17 ` piaojun
0 siblings, 0 replies; 3+ messages in thread
From: piaojun @ 2016-07-11 2:17 UTC (permalink / raw)
To: ocfs2-devel
On 2016-7-11 9:55, Joseph Qi wrote:
> Hi Jun,
>
> On 2016/7/10 18:01, piaojun wrote:
>> We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is cleared
>> unexpected that described below. To solve the bug, we disable the BUG_ON
>> and purge lockres in dlm_do_local_recovery_cleanup.
>>
>> Node 1 Node 2(master)
>> dlm_purge_lockres
>> dlm_deref_lockres_handler
>>
>> DLM_LOCK_RES_SETREF_INPROG is set
>> response DLM_DEREF_RESPONSE_INPROG
>>
>> receive DLM_DEREF_RESPONSE_INPROG
>> stop puring in dlm_purge_lockres
>> and wait for DLM_DEREF_RESPONSE_DONE
>>
>> dispatch dlm_deref_lockres_worker
>> response DLM_DEREF_RESPONSE_DONE
>>
>> receive DLM_DEREF_RESPONSE_DONE and
>> prepare to purge lockres
>>
>> Node 2 goes down
>>
>> find Node2 down and do local
>> clean up for Node2:
>> dlm_do_local_recovery_cleanup
>> -> clear DLM_LOCK_RES_DROPPING_REF
>>
>> when purging lockres, BUG_ON happens
>> because DLM_LOCK_RES_DROPPING_REF is clear:
>> dlm_deref_lockres_done_handler
>> ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
>>
>> Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message")
>> Signed-off-by: Jun Piao <piaojun@huawei.com>
>> ---
>> fs/ocfs2/dlm/dlmmaster.c | 13 ++++++++++++-
>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
>> index 9aed6e2..f72e7ae 100644
>> --- a/fs/ocfs2/dlm/dlmmaster.c
>> +++ b/fs/ocfs2/dlm/dlmmaster.c
>> @@ -2416,7 +2416,16 @@ int dlm_deref_lockres_done_handler(struct o2net_msg *msg, u32 len, void *data,
>> }
>>
>> spin_lock(&res->spinlock);
>> - BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
>> + if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) {
>> + spin_unlock(&res->spinlock);
>> + spin_unlock(&dlm->spinlock);
>> + mlog(ML_NOTICE, "%s:%.*s: node %u sends deref done "
>> + "but it is already derefed!\n", dlm->name,
>> + res->lockname.len, res->lockname.name, node);
>> + dlm_lockres_put(res);
> So we treat this case as normal?
> If so, we'd better return 0 rather than -EINVAL.
>
> Thanks,
> Joseph
>
Good suggestion, I will fix this problem in the following [PATCH v2].
Thanks,
Jun Piao
>> + goto done;
>> + }
>> +
>> if (!list_empty(&res->purge)) {
>> mlog(0, "%s: Removing res %.*s from purgelist\n",
>> dlm->name, res->lockname.len, res->lockname.name);
>> @@ -2455,6 +2464,8 @@ int dlm_deref_lockres_done_handler(struct o2net_msg *msg, u32 len, void *data,
>>
>> spin_unlock(&dlm->spinlock);
>>
>> + ret = 0;
>> +
>> done:
>> dlm_put(dlm);
>> return ret;
>>