* [Ocfs2-devel] [PATCH 1/1] o2dlm: fix a race between purge and master query
@ 2014-10-28 22:24 Srinivas Eeda
2014-10-29 1:42 ` Wengang
2014-10-29 8:04 ` Joseph Qi
0 siblings, 2 replies; 3+ messages in thread
From: Srinivas Eeda @ 2014-10-28 22:24 UTC
To: ocfs2-devel
Node A sends a master query request to node B, which is the master. At this
time the lockres happens to be on the purge list. dlm_master_request_handler
takes the dlm spinlock, finds the resource and releases the dlm spinlock. Right
at this point dlm_thread on node B could purge the lockres.
dlm_master_request_handler can then acquire the lockres spinlock and reply to
node A that node B is the master, even though the lockres on node B has been
purged.
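
The interleaving above can be replayed deterministically in a small userspace
model. This is a sketch under stated assumptions, not o2dlm code:
struct toy_lockres, toy_lookup_get(), toy_purge(), toy_master_request(),
toy_reset() and the node number 10 for node B are all hypothetical, the
one-entry hash_slot stands in for the lockres hash table, and the two
spinlocks appear only as comments. The recheck flag toggles the unhashed
check this patch adds; purge_in_window simulates dlm_thread running right
after the dlm spinlock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NODE_B        10    /* hypothetical node number of node B */
#define OWNER_UNKNOWN (-1)  /* hypothetical "no master known" reply */

/* Hypothetical stand-in for a lock resource; not o2dlm's struct. */
struct toy_lockres {
	bool hashed;  /* models !hlist_unhashed(&res->hash_node) */
	int refs;     /* models the reference the lookup takes */
	int owner;    /* node number of the master */
};

static struct toy_lockres res_storage;
static struct toy_lockres *hash_slot;  /* one-entry "hash table" */

/* Models the lookup done under the dlm spinlock: find and take a ref. */
static struct toy_lockres *toy_lookup_get(void)
{
	struct toy_lockres *res = hash_slot;

	if (res)
		res->refs++;
	return res;
}

/* Models dlm_thread purging: unhash while holding both spinlocks. */
static void toy_purge(void)
{
	if (hash_slot) {
		hash_slot->hashed = false;
		hash_slot = NULL;
	}
}

/*
 * Models the master request handler on node B. With recheck set it
 * behaves like the patched code: an unhashed lockres means start over.
 */
static int toy_master_request(bool recheck, bool purge_in_window)
{
	for (;;) {                                   /* way_up_top: */
		struct toy_lockres *res = toy_lookup_get();

		if (!res)
			return OWNER_UNKNOWN;        /* resource not known here */

		/* dlm spinlock released here: the race window */
		if (purge_in_window) {
			toy_purge();
			purge_in_window = false;     /* the race fires only once */
		}

		/* lockres spinlock acquired here */
		if (recheck && !res->hashed) {
			assert(res->refs > 0);
			res->refs--;                 /* like dlm_lockres_put() */
			continue;                    /* like goto way_up_top */
		}

		int owner = res->owner;
		assert(res->refs > 0);
		res->refs--;
		return owner;
	}
}

/* Reset to a hashed resource mastered by node B. */
static void toy_reset(void)
{
	res_storage = (struct toy_lockres){ .hashed = true, .refs = 0,
					    .owner = NODE_B };
	hash_slot = &res_storage;
}
```

With the race injected, toy_master_request(false, true) still reports node B
as the owner although the resource was purged, while
toy_master_request(true, true) retries the lookup, finds nothing and reports
OWNER_UNKNOWN. The recheck is only safe because the lookup holds a reference,
so the purged object is still valid memory when its hashed state is read.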

This scenario makes node A falsely believe that node B is the master, which is
inconsistent. Further, if another node C tries to master the same resource,
every node will respond that it is not the master. Node C then masters the
resource and sends an assert master message to all nodes. This makes node A
crash with the following message:

dlm_assert_master_handler:1831 ERROR: DIE! Mastery assert from 9, but current owner is 10!

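The crash on node A comes from a consistency check when the assert arrives.
The sketch below is a deliberately simplified model: struct toy_view and
toy_accept_assert() are hypothetical, and the real
dlm_assert_master_handler() consults more state (for example, master list
entries) before reaching the "DIE!" path. It assumes node 9 in the log is
node C and node 10 is node B.

```c
#include <assert.h>
#include <stdbool.h>

#define OWNER_UNKNOWN (-1)  /* hypothetical "no master known" value */

/* Hypothetical, simplified view node A keeps of one resource. */
struct toy_view {
	int owner;  /* node A's idea of the master, or OWNER_UNKNOWN */
};

/*
 * Simplified model of the check on node A: an assert from a node other
 * than the recorded owner is the "DIE!" case. Returns true if accepted.
 */
static bool toy_accept_assert(struct toy_view *v, int from)
{
	assert(from != OWNER_UNKNOWN);  /* an assert always names a node */

	if (v->owner != OWNER_UNKNOWN && v->owner != from)
		return false;  /* the real handler logs "DIE! ..." and BUG()s */
	v->owner = from;
	return true;
}
```

Node A still records node B (10) from the stale reply, so node C's (9) assert
is rejected; a node that never received the stale reply accepts it.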
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
---
fs/ocfs2/dlm/dlmmaster.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 215e41a..3689b35 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -1460,6 +1460,18 @@ way_up_top:
/* take care of the easy cases up front */
spin_lock(&res->spinlock);
+
+ /*
+ * Right after dlm spinlock was released, dlm_thread could have
+ * purged the lockres. Check if lockres got unhashed. If so
+ * start over.
+ */
+ if (hlist_unhashed(&res->hash_node)) {
+ spin_unlock(&res->spinlock);
+ dlm_lockres_put(res);
+ goto way_up_top;
+ }
+
if (res->state & (DLM_LOCK_RES_RECOVERING|
DLM_LOCK_RES_MIGRATING)) {
spin_unlock(&res->spinlock);
--
1.9.1
From: Wengang @ 2014-10-29 1:42 UTC
To: ocfs2-devel
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
On 2014/10/29 06:24, Srinivas Eeda wrote:
> [...]
From: Joseph Qi @ 2014-10-29 8:04 UTC
To: ocfs2-devel
We tested this patch and it works well.
Thanks.
Tested-by: Joseph Qi <joseph.qi@huawei.com>
On 2014/10/29 6:24, Srinivas Eeda wrote:
> [...]