* [Ocfs2-devel] [PATCH V2] ocfs2/dlm: fix a race between purge and migration
@ 2015-12-11 3:09 Xue jiufei
2015-12-14 2:56 ` Junxiao Bi
0 siblings, 1 reply; 2+ messages in thread
From: Xue jiufei @ 2015-12-11 3:09 UTC
To: ocfs2-devel
We found a race between purge and migration during code review. Node
A puts a lockres on the purge list before receiving the migrate message
from node B, which is the master. Node A then calls
dlm_mig_lockres_handler to handle this message.
dlm_mig_lockres_handler
dlm_lookup_lockres
>>>>>> race window: dlm_run_purge_list may run here, send a
deref message to the master, and wait for the response
spin_lock(&res->spinlock);
res->state |= DLM_LOCK_RES_MIGRATING;
spin_unlock(&res->spinlock);
dlm_mig_lockres_handler returns
>>>>>> dlm_thread receives the master's response to the deref
message and triggers the BUG, because the lockres is now in state
DLM_LOCK_RES_MIGRATING. The following message is logged:
dlm_purge_lockres:209 ERROR: 6633EB681FA7474A9C280A4E1A836F0F:
res M0000000000000000030c0300000000 in use after deref
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
---
fs/ocfs2/dlm/dlmrecovery.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
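Note for readers of the trace in the commit message: the fix comes down to
doing the lookup and the state change under one outer lock that the purge
path also takes. Below is a minimal user-space sketch of that pattern. It is
not the kernel code. The toy_* names, the one-entry table, and the pthread
mutexes standing in for dlm->spinlock and res->spinlock are all assumptions
made for illustration, and the purge side is reduced to a single "unhash
unless busy" step rather than the real deref exchange.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RES_MIGRATING 0x1u /* hypothetical stand-in for DLM_LOCK_RES_MIGRATING */

struct toy_res {
	pthread_mutex_t lock;   /* plays the role of res->spinlock */
	unsigned int state;
};

struct toy_domain {
	pthread_mutex_t lock;   /* plays the role of dlm->spinlock */
	struct toy_res *slot;   /* one-entry stand-in for the lockres hash */
};

/*
 * Migration side: look up and mark while holding the domain lock, so the
 * purge side cannot run between the lookup and the state change.
 * Returns true if a resource was found and marked.
 */
static bool toy_mark_migrating(struct toy_domain *d)
{
	bool found = false;

	pthread_mutex_lock(&d->lock);
	if (d->slot) {
		pthread_mutex_lock(&d->slot->lock);
		d->slot->state |= TOY_RES_MIGRATING;
		pthread_mutex_unlock(&d->slot->lock);
		found = true;
	}
	pthread_mutex_unlock(&d->lock);
	return found;
}

/*
 * Purge side (simplified): takes the same domain lock, so it observes
 * either an unmarked resource, which it unhashes, or the MIGRATING flag,
 * in which case it backs off. Returns true if the resource was purged.
 */
static bool toy_try_purge(struct toy_domain *d)
{
	bool purged = false;

	pthread_mutex_lock(&d->lock);
	if (d->slot) {
		bool busy;

		pthread_mutex_lock(&d->slot->lock);
		busy = (d->slot->state & TOY_RES_MIGRATING) != 0;
		pthread_mutex_unlock(&d->slot->lock);
		if (!busy) {
			d->slot = NULL;
			purged = true;
		}
	}
	pthread_mutex_unlock(&d->lock);
	return purged;
}
```

Because both paths serialize on the domain lock, only two orderings remain in
this model: the handler marks first and the purge backs off, or the purge
unhashes first and the handler's lookup finds nothing. The lock order
(domain lock, then resource lock) matches the order the patch uses.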
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 58eaa5c..4055909 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -1373,6 +1373,7 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
char *buf = NULL;
struct dlm_work_item *item = NULL;
struct dlm_lock_resource *res = NULL;
+ unsigned int hash;
if (!dlm_grab(dlm))
return -EINVAL;
@@ -1400,7 +1401,10 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
/* lookup the lock to see if we have a secondary queue for this
* already... just add the locks in and this will have its owner
* and RECOVERY flag changed when it completes. */
- res = dlm_lookup_lockres(dlm, mres->lockname, mres->lockname_len);
+ hash = dlm_lockid_hash(mres->lockname, mres->lockname_len);
+ spin_lock(&dlm->spinlock);
+ res = __dlm_lookup_lockres(dlm, mres->lockname, mres->lockname_len,
+ hash);
if (res) {
/* this will get a ref on res */
/* mark it as recovering/migrating and hash it */
@@ -1421,13 +1425,16 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
mres->lockname_len, mres->lockname);
ret = -EFAULT;
spin_unlock(&res->spinlock);
+ spin_unlock(&dlm->spinlock);
dlm_lockres_put(res);
goto leave;
}
res->state |= DLM_LOCK_RES_MIGRATING;
}
spin_unlock(&res->spinlock);
+ spin_unlock(&dlm->spinlock);
} else {
+ spin_unlock(&dlm->spinlock);
/* need to allocate, just like if it was
* mastered here normally */
res = dlm_new_lockres(dlm, mres->lockname, mres->lockname_len);
--
1.8.4.3
* [Ocfs2-devel] [PATCH V2] ocfs2/dlm: fix a race between purge and migration
2015-12-11 3:09 [Ocfs2-devel] [PATCH V2] ocfs2/dlm: fix a race between purge and migration Xue jiufei
@ 2015-12-14 2:56 ` Junxiao Bi
0 siblings, 0 replies; 2+ messages in thread
From: Junxiao Bi @ 2015-12-14 2:56 UTC
To: ocfs2-devel
On 12/11/2015 11:09 AM, Xue jiufei wrote:
> We found a race between purge and migration when doing code review. Node
> A put lockres to purgelist before receiving the migrate message from node
> B which is the master. Node A call dlm_mig_lockres_handler to handle
> this message.
>
> dlm_mig_lockres_handler
> dlm_lookup_lockres
> >>>>>> race window, dlm_run_purge_list may run and send
> deref message to master, waiting the response
> spin_lock(&res->spinlock);
> res->state |= DLM_LOCK_RES_MIGRATING;
> spin_unlock(&res->spinlock);
> dlm_mig_lockres_handler returns
>
> >>>>>> dlm_thread receives the response from master for the deref
> message and triggers the BUG because the lockres has the state
> DLM_LOCK_RES_MIGRATING with the following message:
>
> dlm_purge_lockres:209 ERROR: 6633EB681FA7474A9C280A4E1A836F0F:
> res M0000000000000000030c0300000000 in use after deref
>
> Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
> Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Looks good.
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>