* [Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery
@ 2010-03-22 23:50 Srinivas Eeda
2010-03-23 0:41 ` Sunil Mushran
0 siblings, 1 reply; 6+ messages in thread
From: Srinivas Eeda @ 2010-03-22 23:50 UTC (permalink / raw)
To: ocfs2-devel
In dlm_assert_master_handler, spinlock is released little sooner which creates
a window for two nodes to race during mastery.
In a scenario where node A had a head start during lock mastery and dlm
spinlock is just released on node B in dlm_assert_master_handler. Right then
a process on node B started to master the resource. It finds the mle but
doesn't find lockres. Since mle *master* is not set it creates a lockres and
waits for the mastery and doesn't send a master request.
dlm_assert_master_handler doesn't know about this and doesn't set
have_lockres_ref(doesn't set DLM_ASSERT_RESPONSE_MASTERY_REF) which creates a
hole that results in loss of refmap bit on the master node.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
---
fs/ocfs2/dlm/dlmmaster.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 83bcaf2..6e694b6 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -1875,7 +1875,6 @@ int dlm_assert_master_handler(struct o2net_msg *msg, u32 len, void *data,
ok:
spin_unlock(&res->spinlock);
}
- spin_unlock(&dlm->spinlock);
// mlog(0, "woo! got an assert_master from node %u!\n",
// assert->node_idx);
@@ -1926,7 +1925,6 @@ ok:
/* master is known, detach if not already detached.
* ensures that only one assert_master call will happen
* on this mle. */
- spin_lock(&dlm->spinlock);
spin_lock(&dlm->master_lock);
rr = atomic_read(&mle->mle_refs.refcount);
@@ -1959,7 +1957,6 @@ ok:
__dlm_put_mle(mle);
}
spin_unlock(&dlm->master_lock);
- spin_unlock(&dlm->spinlock);
} else if (res) {
if (res->owner != assert->node_idx) {
mlog(0, "assert_master from %u, but current "
@@ -1967,6 +1964,7 @@ ok:
res->owner, namelen, name);
}
}
+ spin_unlock(&dlm->spinlock);
done:
ret = 0;
--
1.5.6.5
^ permalink raw reply related [flat|nested] 6+ messages in thread* [Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery
2010-03-22 23:50 [Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery Srinivas Eeda
@ 2010-03-23 0:41 ` Sunil Mushran
2010-03-23 1:15 ` Joel Becker
0 siblings, 1 reply; 6+ messages in thread
From: Sunil Mushran @ 2010-03-23 0:41 UTC (permalink / raw)
To: ocfs2-devel
=================================================================
In o2dlm, the resource master maintains a (ref)map of all nodes that
are informed of the fact that that node masters that resource. This is
done to prevent the master from purging the resource before the other
node can create a lock.
This patch plugs a race between the mastery handler thread and the
mastery thread that allows the node to discover the master of the
resource without informing the master node.
Fixes ossbz#1012
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012
=================================================================
Add my sob and change the comment to the above.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Srinivas Eeda wrote:
> In dlm_assert_master_handler, spinlock is released little sooner which creates
> a window for two nodes to race during mastery.
>
> In a scenario where node A had a head start during lock mastery and dlm
> spinlock is just released on node B in dlm_assert_master_handler. Right then
> a process on node B started to master the resource. It finds the mle but
> doesn't find lockres. Since mle *master* is not set it creates a lockres and
> waits for the mastery and doesn't send a master request.
>
> dlm_assert_master_handler doesn't know about this and doesn't set
> have_lockres_ref(doesn't set DLM_ASSERT_RESPONSE_MASTERY_REF) which creates a
> hole that results in loss of refmap bit on the master node.
>
> Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
> ---
> fs/ocfs2/dlm/dlmmaster.c | 4 +---
> 1 files changed, 1 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 83bcaf2..6e694b6 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -1875,7 +1875,6 @@ int dlm_assert_master_handler(struct o2net_msg *msg, u32 len, void *data,
> ok:
> spin_unlock(&res->spinlock);
> }
> - spin_unlock(&dlm->spinlock);
>
> // mlog(0, "woo! got an assert_master from node %u!\n",
> // assert->node_idx);
> @@ -1926,7 +1925,6 @@ ok:
> /* master is known, detach if not already detached.
> * ensures that only one assert_master call will happen
> * on this mle. */
> - spin_lock(&dlm->spinlock);
> spin_lock(&dlm->master_lock);
>
> rr = atomic_read(&mle->mle_refs.refcount);
> @@ -1959,7 +1957,6 @@ ok:
> __dlm_put_mle(mle);
> }
> spin_unlock(&dlm->master_lock);
> - spin_unlock(&dlm->spinlock);
> } else if (res) {
> if (res->owner != assert->node_idx) {
> mlog(0, "assert_master from %u, but current "
> @@ -1967,6 +1964,7 @@ ok:
> res->owner, namelen, name);
> }
> }
> + spin_unlock(&dlm->spinlock);
>
> done:
> ret = 0;
^ permalink raw reply [flat|nested] 6+ messages in thread* [Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery
2010-03-23 0:41 ` Sunil Mushran
@ 2010-03-23 1:15 ` Joel Becker
2010-03-23 1:20 ` Sunil Mushran
0 siblings, 1 reply; 6+ messages in thread
From: Joel Becker @ 2010-03-23 1:15 UTC (permalink / raw)
To: ocfs2-devel
On Mon, Mar 22, 2010 at 05:41:43PM -0700, Sunil Mushran wrote:
> =================================================================
> In o2dlm, the resource master maintains a (ref)map of all nodes that
> are informed of the fact that that node masters that resource. This is
> done to prevent the master from purging the resource before the other
> node can create a lock.
>
> This patch plugs a race between the mastery handler thread and the
> mastery thread that allows the node to discover the master of the
> resource without informing the master node.
>
> Fixes ossbz#1012
> http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012
> =================================================================
>
> Add my sob and change the comment to the above.
> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
The comment above makes even less sense to me than Srini's.
Hiss, while a little confusing to read, made me feel like I would get it
if I just looked at the code. This text is...mumbo-jumbo. "the
resource master maintains a map of all nodes that are informed of the
fact that that node masters that resource" indeed.
Let me see if I can parse it:
------------------------------------------
In o2dlm, the master of a lock resource keeps a map of all interested
nodes. This prevents the master from purging the resource before an
interested node can create a lock.
A race between the mastery thread and the mastery handler allowed an
interested node to discover the who the master is without informing the
master directly. This is easily fixed by holding the dlm spinlock a
little longer in the mastery handler.
------------------------------------------
How's that?
I also had a question about the patch. It is safe to hold the
dlm spinlock across all the other spinlocks, right? I believe that the
dlm spinlock is toplevel, so I'm pretty sure it is, but I wanted to
verify.
Joel
--
"Friends may come and go, but enemies accumulate."
- Thomas Jones
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 6+ messages in thread* [Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery
2010-03-23 1:15 ` Joel Becker
@ 2010-03-23 1:20 ` Sunil Mushran
2010-03-23 1:47 ` Joel Becker
0 siblings, 1 reply; 6+ messages in thread
From: Sunil Mushran @ 2010-03-23 1:20 UTC (permalink / raw)
To: ocfs2-devel
yes, your wording is better. and yes, dlm->spinlock is the
top level lock.
Joel Becker wrote:
> On Mon, Mar 22, 2010 at 05:41:43PM -0700, Sunil Mushran wrote:
>
>> =================================================================
>> In o2dlm, the resource master maintains a (ref)map of all nodes that
>> are informed of the fact that that node masters that resource. This is
>> done to prevent the master from purging the resource before the other
>> node can create a lock.
>>
>> This patch plugs a race between the mastery handler thread and the
>> mastery thread that allows the node to discover the master of the
>> resource without informing the master node.
>>
>> Fixes ossbz#1012
>> http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012
>> =================================================================
>>
>> Add my sob and change the comment to the above.
>> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
>>
>
> The comment above makes even less sense to me than Srini's.
> Hiss, while a little confusing to read, made me feel like I would get it
> if I just looked at the code. This text is...mumbo-jumbo. "the
> resource master maintains a map of all nodes that are informed of the
> fact that that node masters that resource" indeed.
> Let me see if I can parse it:
>
> ------------------------------------------
> In o2dlm, the master of a lock resource keeps a map of all interested
> nodes. This prevents the master from purging the resource before an
> interested node can create a lock.
>
> A race between the mastery thread and the mastery handler allowed an
> interested node to discover the who the master is without informing the
> master directly. This is easily fixed by holding the dlm spinlock a
> little longer in the mastery handler.
> ------------------------------------------
>
> How's that?
> I also had a question about the patch. It is safe to hold the
> dlm spinlock across all the other spinlocks, right? I believe that the
> dlm spinlock is toplevel, so I'm pretty sure it is, but I wanted to
> verify.
>
> Joel
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery
2010-03-23 1:20 ` Sunil Mushran
@ 2010-03-23 1:47 ` Joel Becker
2010-03-23 2:21 ` SRINIVAS EEDA
0 siblings, 1 reply; 6+ messages in thread
From: Joel Becker @ 2010-03-23 1:47 UTC (permalink / raw)
To: ocfs2-devel
On Mon, Mar 22, 2010 at 06:20:32PM -0700, Sunil Mushran wrote:
> yes, your wording is better. and yes, dlm->spinlock is the
> top level lock.
This patch is now in the 'fixes' branch of ocfs2.git.
Joel
--
Life's Little Instruction Book #314
"Never underestimate the power of forgiveness."
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-03-23 2:21 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-22 23:50 [Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery Srinivas Eeda
2010-03-23 0:41 ` Sunil Mushran
2010-03-23 1:15 ` Joel Becker
2010-03-23 1:20 ` Sunil Mushran
2010-03-23 1:47 ` Joel Becker
2010-03-23 2:21 ` SRINIVAS EEDA
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.