* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
@ 2010-06-16 6:52 Wengang Wang
2010-06-21 5:31 ` Wengang Wang
2010-06-21 13:20 ` Wengang Wang
0 siblings, 2 replies; 5+ messages in thread
From: Wengang Wang @ 2010-06-16 6:52 UTC (permalink / raw)
To: ocfs2-devel
When we need to take both dlm_domain_lock and dlm->spinlock, we should take
them in order of:
dlm_domain_lock then dlm->spinlock.
There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
spinlock held. dlm_lockres_put() finally calls dlm_put() which take
dlm_domain_lock.
The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
dlm_put(). dlm_ctxt_release() is only called on the release of the last
reference. Any path should not be holding dlm->spinlock when dropping the "last"
reference.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
fs/ocfs2/dlm/dlmdomain.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index ab82add..754baf2 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
+ if (spin_is_locked(&dlm->spinlock))
+ BUG();
BUG_ON(dlm->num_joins);
BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
+ spin_lock(&dlm_domain_lock);
/* we may still be in the list if we hit an error during join. */
list_del_init(&dlm->list);
-
spin_unlock(&dlm_domain_lock);
- mlog(0, "freeing memory from domain %s\n", dlm->name);
-
wake_up(&dlm_domain_events);
-
+ mlog(0, "freeing memory from domain %s\n", dlm->name);
dlm_free_ctxt_mem(dlm);
-
- spin_lock(&dlm_domain_lock);
}
void dlm_put(struct dlm_ctxt *dlm)
{
- spin_lock(&dlm_domain_lock);
kref_put(&dlm->dlm_refs, dlm_ctxt_release);
- spin_unlock(&dlm_domain_lock);
}
static void __dlm_get(struct dlm_ctxt *dlm)
--
1.6.6.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
2010-06-16 6:52 [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock Wengang Wang
@ 2010-06-21 5:31 ` Wengang Wang
2010-06-21 7:33 ` Tao Ma
2010-06-21 13:20 ` Wengang Wang
1 sibling, 1 reply; 5+ messages in thread
From: Wengang Wang @ 2010-06-21 5:31 UTC (permalink / raw)
To: ocfs2-devel
Why atomic operations on dlm_refs need spinlock's protect?
/* NOTE: Next three are protected by dlm_domain_lock */
struct kref dlm_refs;
enum dlm_ctxt_state dlm_state;
unsigned int num_joins;
regards,
wengang.
On 10-06-16 14:52, Wengang Wang wrote:
> When we need to take both dlm_domain_lock and dlm->spinlock, we should take
> them in order of:
> dlm_domain_lock then dlm->spinlock.
>
> There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
> spinlock held. dlm_lockres_put() finally calls dlm_put() which take
> dlm_domain_lock.
>
> The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
> dlm_put(). dlm_ctxt_release() is only called on the release of the last
> reference. Any path should not be holding dlm->spinlock when dropping the "last"
> reference.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
> fs/ocfs2/dlm/dlmdomain.c | 12 ++++--------
> 1 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> index ab82add..754baf2 100644
> --- a/fs/ocfs2/dlm/dlmdomain.c
> +++ b/fs/ocfs2/dlm/dlmdomain.c
> @@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
>
> dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
>
> + if (spin_is_locked(&dlm->spinlock))
> + BUG();
> BUG_ON(dlm->num_joins);
> BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
>
> + spin_lock(&dlm_domain_lock);
> /* we may still be in the list if we hit an error during join. */
> list_del_init(&dlm->list);
> -
> spin_unlock(&dlm_domain_lock);
>
> - mlog(0, "freeing memory from domain %s\n", dlm->name);
> -
> wake_up(&dlm_domain_events);
> -
> + mlog(0, "freeing memory from domain %s\n", dlm->name);
> dlm_free_ctxt_mem(dlm);
> -
> - spin_lock(&dlm_domain_lock);
> }
>
> void dlm_put(struct dlm_ctxt *dlm)
> {
> - spin_lock(&dlm_domain_lock);
> kref_put(&dlm->dlm_refs, dlm_ctxt_release);
> - spin_unlock(&dlm_domain_lock);
> }
>
> static void __dlm_get(struct dlm_ctxt *dlm)
> --
> 1.6.6.1
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
2010-06-21 5:31 ` Wengang Wang
@ 2010-06-21 7:33 ` Tao Ma
2010-06-21 13:44 ` Wengang Wang
0 siblings, 1 reply; 5+ messages in thread
From: Tao Ma @ 2010-06-21 7:33 UTC (permalink / raw)
To: ocfs2-devel
hi wengang,
On 06/21/2010 01:31 PM, Wengang Wang wrote:
> Why atomic operations on dlm_refs need spinlock's protect?
>
> /* NOTE: Next three are protected by dlm_domain_lock */
> struct kref dlm_refs;
> enum dlm_ctxt_state dlm_state;
> unsigned int num_joins;
I don't dive into the code, so if there is something wrong, please
correct me.
AFAICS, kref_put isn't an atomic operation, it need to call release to
free the container of kref.
Regards,
Tao
>
> regards,
> wengang.
>
> On 10-06-16 14:52, Wengang Wang wrote:
>> When we need to take both dlm_domain_lock and dlm->spinlock, we should take
>> them in order of:
>> dlm_domain_lock then dlm->spinlock.
>>
>> There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
>> spinlock held. dlm_lockres_put() finally calls dlm_put() which take
>> dlm_domain_lock.
>>
>> The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
>> dlm_put(). dlm_ctxt_release() is only called on the release of the last
>> reference. Any path should not be holding dlm->spinlock when dropping the "last"
>> reference.
>>
>> Signed-off-by: Wengang Wang<wen.gang.wang@oracle.com>
>> ---
>> fs/ocfs2/dlm/dlmdomain.c | 12 ++++--------
>> 1 files changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
>> index ab82add..754baf2 100644
>> --- a/fs/ocfs2/dlm/dlmdomain.c
>> +++ b/fs/ocfs2/dlm/dlmdomain.c
>> @@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
>>
>> dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
>>
>> + if (spin_is_locked(&dlm->spinlock))
>> + BUG();
>> BUG_ON(dlm->num_joins);
>> BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
>>
>> + spin_lock(&dlm_domain_lock);
>> /* we may still be in the list if we hit an error during join. */
>> list_del_init(&dlm->list);
>> -
>> spin_unlock(&dlm_domain_lock);
>>
>> - mlog(0, "freeing memory from domain %s\n", dlm->name);
>> -
>> wake_up(&dlm_domain_events);
>> -
>> + mlog(0, "freeing memory from domain %s\n", dlm->name);
>> dlm_free_ctxt_mem(dlm);
>> -
>> - spin_lock(&dlm_domain_lock);
>> }
>>
>> void dlm_put(struct dlm_ctxt *dlm)
>> {
>> - spin_lock(&dlm_domain_lock);
>> kref_put(&dlm->dlm_refs, dlm_ctxt_release);
>> - spin_unlock(&dlm_domain_lock);
>> }
>>
>> static void __dlm_get(struct dlm_ctxt *dlm)
>> --
>> 1.6.6.1
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
2010-06-16 6:52 [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock Wengang Wang
2010-06-21 5:31 ` Wengang Wang
@ 2010-06-21 13:20 ` Wengang Wang
1 sibling, 0 replies; 5+ messages in thread
From: Wengang Wang @ 2010-06-21 13:20 UTC (permalink / raw)
To: ocfs2-devel
This patch is not good, please ignore it. I will post a revised one.
regards,
wengang.
On 10-06-16 14:52, Wengang Wang wrote:
> When we need to take both dlm_domain_lock and dlm->spinlock, we should take
> them in order of:
> dlm_domain_lock then dlm->spinlock.
>
> There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
> spinlock held. dlm_lockres_put() finally calls dlm_put() which take
> dlm_domain_lock.
>
> The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
> dlm_put(). dlm_ctxt_release() is only called on the release of the last
> reference. Any path should not be holding dlm->spinlock when dropping the "last"
> reference.
>
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
> fs/ocfs2/dlm/dlmdomain.c | 12 ++++--------
> 1 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> index ab82add..754baf2 100644
> --- a/fs/ocfs2/dlm/dlmdomain.c
> +++ b/fs/ocfs2/dlm/dlmdomain.c
> @@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
>
> dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
>
> + if (spin_is_locked(&dlm->spinlock))
> + BUG();
> BUG_ON(dlm->num_joins);
> BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
>
> + spin_lock(&dlm_domain_lock);
> /* we may still be in the list if we hit an error during join. */
> list_del_init(&dlm->list);
> -
> spin_unlock(&dlm_domain_lock);
>
> - mlog(0, "freeing memory from domain %s\n", dlm->name);
> -
> wake_up(&dlm_domain_events);
> -
> + mlog(0, "freeing memory from domain %s\n", dlm->name);
> dlm_free_ctxt_mem(dlm);
> -
> - spin_lock(&dlm_domain_lock);
> }
>
> void dlm_put(struct dlm_ctxt *dlm)
> {
> - spin_lock(&dlm_domain_lock);
> kref_put(&dlm->dlm_refs, dlm_ctxt_release);
> - spin_unlock(&dlm_domain_lock);
> }
>
> static void __dlm_get(struct dlm_ctxt *dlm)
> --
> 1.6.6.1
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
2010-06-21 7:33 ` Tao Ma
@ 2010-06-21 13:44 ` Wengang Wang
0 siblings, 0 replies; 5+ messages in thread
From: Wengang Wang @ 2010-06-21 13:44 UTC (permalink / raw)
To: ocfs2-devel
On 10-06-21 15:33, Tao Ma wrote:
> hi wengang,
>
> On 06/21/2010 01:31 PM, Wengang Wang wrote:
> >Why atomic operations on dlm_refs need spinlock's protect?
> >
> > /* NOTE: Next three are protected by dlm_domain_lock */
> > struct kref dlm_refs;
> > enum dlm_ctxt_state dlm_state;
> > unsigned int num_joins;
> I don't dive into the code, so if there is something wrong, please
> correct me.
> AFAICS, kref_put isn't an atomic operation, it need to call release
> to free the container of kref.
Yes, it's bad :-(.
regards,
wengang.
>
> Regards,
> Tao
> >
> >regards,
> >wengang.
> >
> >On 10-06-16 14:52, Wengang Wang wrote:
> >>When we need to take both dlm_domain_lock and dlm->spinlock, we should take
> >>them in order of:
> >>dlm_domain_lock then dlm->spinlock.
> >>
> >>There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
> >>spinlock held. dlm_lockres_put() finally calls dlm_put() which take
> >>dlm_domain_lock.
> >>
> >>The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
> >>dlm_put(). dlm_ctxt_release() is only called on the release of the last
> >>reference. Any path should not be holding dlm->spinlock when dropping the "last"
> >>reference.
> >>
> >>Signed-off-by: Wengang Wang<wen.gang.wang@oracle.com>
> >>---
> >> fs/ocfs2/dlm/dlmdomain.c | 12 ++++--------
> >> 1 files changed, 4 insertions(+), 8 deletions(-)
> >>
> >>diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> >>index ab82add..754baf2 100644
> >>--- a/fs/ocfs2/dlm/dlmdomain.c
> >>+++ b/fs/ocfs2/dlm/dlmdomain.c
> >>@@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
> >>
> >> dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
> >>
> >>+ if (spin_is_locked(&dlm->spinlock))
> >>+ BUG();
> >> BUG_ON(dlm->num_joins);
> >> BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
> >>
> >>+ spin_lock(&dlm_domain_lock);
> >> /* we may still be in the list if we hit an error during join. */
> >> list_del_init(&dlm->list);
> >>-
> >> spin_unlock(&dlm_domain_lock);
> >>
> >>- mlog(0, "freeing memory from domain %s\n", dlm->name);
> >>-
> >> wake_up(&dlm_domain_events);
> >>-
> >>+ mlog(0, "freeing memory from domain %s\n", dlm->name);
> >> dlm_free_ctxt_mem(dlm);
> >>-
> >>- spin_lock(&dlm_domain_lock);
> >> }
> >>
> >> void dlm_put(struct dlm_ctxt *dlm)
> >> {
> >>- spin_lock(&dlm_domain_lock);
> >> kref_put(&dlm->dlm_refs, dlm_ctxt_release);
> >>- spin_unlock(&dlm_domain_lock);
> >> }
> >>
> >> static void __dlm_get(struct dlm_ctxt *dlm)
> >>--
> >>1.6.6.1
> >>
> >>
> >>_______________________________________________
> >>Ocfs2-devel mailing list
> >>Ocfs2-devel at oss.oracle.com
> >>http://oss.oracle.com/mailman/listinfo/ocfs2-devel
> >
> >_______________________________________________
> >Ocfs2-devel mailing list
> >Ocfs2-devel at oss.oracle.com
> >http://oss.oracle.com/mailman/listinfo/ocfs2-devel
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2010-06-21 13:44 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-16 6:52 [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock Wengang Wang
2010-06-21 5:31 ` Wengang Wang
2010-06-21 7:33 ` Tao Ma
2010-06-21 13:44 ` Wengang Wang
2010-06-21 13:20 ` Wengang Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).