ocfs2-devel.oss.oracle.com archive mirror
 help / color / mirror / Atom feed
* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
@ 2010-06-16  6:52 Wengang Wang
  2010-06-21  5:31 ` Wengang Wang
  2010-06-21 13:20 ` Wengang Wang
  0 siblings, 2 replies; 5+ messages in thread
From: Wengang Wang @ 2010-06-16  6:52 UTC (permalink / raw)
  To: ocfs2-devel

When we need to take both dlm_domain_lock and dlm->spinlock, we should take
them in order of:
dlm_domain_lock then dlm->spinlock.

There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
spinlock held. dlm_lockres_put() finally calls dlm_put() which take
dlm_domain_lock.

The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
dlm_put(). dlm_ctxt_release() is only called on the release of the last
reference. Any path should not be holding dlm->spinlock when dropping the "last"
reference.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 fs/ocfs2/dlm/dlmdomain.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index ab82add..754baf2 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
 
 	dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
 
+	if (spin_is_locked(&dlm->spinlock))
+		BUG();
 	BUG_ON(dlm->num_joins);
 	BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
 
+	spin_lock(&dlm_domain_lock);
 	/* we may still be in the list if we hit an error during join. */
 	list_del_init(&dlm->list);
-
 	spin_unlock(&dlm_domain_lock);
 
-	mlog(0, "freeing memory from domain %s\n", dlm->name);
-
 	wake_up(&dlm_domain_events);
-
+	mlog(0, "freeing memory from domain %s\n", dlm->name);
 	dlm_free_ctxt_mem(dlm);
-
-	spin_lock(&dlm_domain_lock);
 }
 
 void dlm_put(struct dlm_ctxt *dlm)
 {
-	spin_lock(&dlm_domain_lock);
 	kref_put(&dlm->dlm_refs, dlm_ctxt_release);
-	spin_unlock(&dlm_domain_lock);
 }
 
 static void __dlm_get(struct dlm_ctxt *dlm)
-- 
1.6.6.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
  2010-06-16  6:52 [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock Wengang Wang
@ 2010-06-21  5:31 ` Wengang Wang
  2010-06-21  7:33   ` Tao Ma
  2010-06-21 13:20 ` Wengang Wang
  1 sibling, 1 reply; 5+ messages in thread
From: Wengang Wang @ 2010-06-21  5:31 UTC (permalink / raw)
  To: ocfs2-devel

Why atomic operations on dlm_refs need spinlock's protect?

        /* NOTE: Next three are protected by dlm_domain_lock */
        struct kref dlm_refs;
        enum dlm_ctxt_state dlm_state;
        unsigned int num_joins;

regards,
wengang.

On 10-06-16 14:52, Wengang Wang wrote:
> When we need to take both dlm_domain_lock and dlm->spinlock, we should take
> them in order of:
> dlm_domain_lock then dlm->spinlock.
> 
> There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
> spinlock held. dlm_lockres_put() finally calls dlm_put() which take
> dlm_domain_lock.
> 
> The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
> dlm_put(). dlm_ctxt_release() is only called on the release of the last
> reference. Any path should not be holding dlm->spinlock when dropping the "last"
> reference.
> 
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  fs/ocfs2/dlm/dlmdomain.c |   12 ++++--------
>  1 files changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> index ab82add..754baf2 100644
> --- a/fs/ocfs2/dlm/dlmdomain.c
> +++ b/fs/ocfs2/dlm/dlmdomain.c
> @@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
>  
>  	dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
>  
> +	if (spin_is_locked(&dlm->spinlock))
> +		BUG();
>  	BUG_ON(dlm->num_joins);
>  	BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
>  
> +	spin_lock(&dlm_domain_lock);
>  	/* we may still be in the list if we hit an error during join. */
>  	list_del_init(&dlm->list);
> -
>  	spin_unlock(&dlm_domain_lock);
>  
> -	mlog(0, "freeing memory from domain %s\n", dlm->name);
> -
>  	wake_up(&dlm_domain_events);
> -
> +	mlog(0, "freeing memory from domain %s\n", dlm->name);
>  	dlm_free_ctxt_mem(dlm);
> -
> -	spin_lock(&dlm_domain_lock);
>  }
>  
>  void dlm_put(struct dlm_ctxt *dlm)
>  {
> -	spin_lock(&dlm_domain_lock);
>  	kref_put(&dlm->dlm_refs, dlm_ctxt_release);
> -	spin_unlock(&dlm_domain_lock);
>  }
>  
>  static void __dlm_get(struct dlm_ctxt *dlm)
> -- 
> 1.6.6.1
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
  2010-06-21  5:31 ` Wengang Wang
@ 2010-06-21  7:33   ` Tao Ma
  2010-06-21 13:44     ` Wengang Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Tao Ma @ 2010-06-21  7:33 UTC (permalink / raw)
  To: ocfs2-devel

hi wengang,

On 06/21/2010 01:31 PM, Wengang Wang wrote:
> Why atomic operations on dlm_refs need spinlock's protect?
>
>          /* NOTE: Next three are protected by dlm_domain_lock */
>          struct kref dlm_refs;
>          enum dlm_ctxt_state dlm_state;
>          unsigned int num_joins;
I don't dive into the code, so if there is something wrong, please 
correct me.
AFAICS, kref_put isn't an atomic operation, it need to call release to 
free the container of kref.

Regards,
Tao
>
> regards,
> wengang.
>
> On 10-06-16 14:52, Wengang Wang wrote:
>> When we need to take both dlm_domain_lock and dlm->spinlock, we should take
>> them in order of:
>> dlm_domain_lock then dlm->spinlock.
>>
>> There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
>> spinlock held. dlm_lockres_put() finally calls dlm_put() which take
>> dlm_domain_lock.
>>
>> The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
>> dlm_put(). dlm_ctxt_release() is only called on the release of the last
>> reference. Any path should not be holding dlm->spinlock when dropping the "last"
>> reference.
>>
>> Signed-off-by: Wengang Wang<wen.gang.wang@oracle.com>
>> ---
>>   fs/ocfs2/dlm/dlmdomain.c |   12 ++++--------
>>   1 files changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
>> index ab82add..754baf2 100644
>> --- a/fs/ocfs2/dlm/dlmdomain.c
>> +++ b/fs/ocfs2/dlm/dlmdomain.c
>> @@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
>>
>>   	dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
>>
>> +	if (spin_is_locked(&dlm->spinlock))
>> +		BUG();
>>   	BUG_ON(dlm->num_joins);
>>   	BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
>>
>> +	spin_lock(&dlm_domain_lock);
>>   	/* we may still be in the list if we hit an error during join. */
>>   	list_del_init(&dlm->list);
>> -
>>   	spin_unlock(&dlm_domain_lock);
>>
>> -	mlog(0, "freeing memory from domain %s\n", dlm->name);
>> -
>>   	wake_up(&dlm_domain_events);
>> -
>> +	mlog(0, "freeing memory from domain %s\n", dlm->name);
>>   	dlm_free_ctxt_mem(dlm);
>> -
>> -	spin_lock(&dlm_domain_lock);
>>   }
>>
>>   void dlm_put(struct dlm_ctxt *dlm)
>>   {
>> -	spin_lock(&dlm_domain_lock);
>>   	kref_put(&dlm->dlm_refs, dlm_ctxt_release);
>> -	spin_unlock(&dlm_domain_lock);
>>   }
>>
>>   static void __dlm_get(struct dlm_ctxt *dlm)
>> --
>> 1.6.6.1
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
  2010-06-16  6:52 [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock Wengang Wang
  2010-06-21  5:31 ` Wengang Wang
@ 2010-06-21 13:20 ` Wengang Wang
  1 sibling, 0 replies; 5+ messages in thread
From: Wengang Wang @ 2010-06-21 13:20 UTC (permalink / raw)
  To: ocfs2-devel

This patch is not good, please ignore it. I will post a revised one.

regards,
wengang.
On 10-06-16 14:52, Wengang Wang wrote:
> When we need to take both dlm_domain_lock and dlm->spinlock, we should take
> them in order of:
> dlm_domain_lock then dlm->spinlock.
> 
> There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
> spinlock held. dlm_lockres_put() finally calls dlm_put() which take
> dlm_domain_lock.
> 
> The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
> dlm_put(). dlm_ctxt_release() is only called on the release of the last
> reference. Any path should not be holding dlm->spinlock when dropping the "last"
> reference.
> 
> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
>  fs/ocfs2/dlm/dlmdomain.c |   12 ++++--------
>  1 files changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> index ab82add..754baf2 100644
> --- a/fs/ocfs2/dlm/dlmdomain.c
> +++ b/fs/ocfs2/dlm/dlmdomain.c
> @@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
>  
>  	dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
>  
> +	if (spin_is_locked(&dlm->spinlock))
> +		BUG();
>  	BUG_ON(dlm->num_joins);
>  	BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
>  
> +	spin_lock(&dlm_domain_lock);
>  	/* we may still be in the list if we hit an error during join. */
>  	list_del_init(&dlm->list);
> -
>  	spin_unlock(&dlm_domain_lock);
>  
> -	mlog(0, "freeing memory from domain %s\n", dlm->name);
> -
>  	wake_up(&dlm_domain_events);
> -
> +	mlog(0, "freeing memory from domain %s\n", dlm->name);
>  	dlm_free_ctxt_mem(dlm);
> -
> -	spin_lock(&dlm_domain_lock);
>  }
>  
>  void dlm_put(struct dlm_ctxt *dlm)
>  {
> -	spin_lock(&dlm_domain_lock);
>  	kref_put(&dlm->dlm_refs, dlm_ctxt_release);
> -	spin_unlock(&dlm_domain_lock);
>  }
>  
>  static void __dlm_get(struct dlm_ctxt *dlm)
> -- 
> 1.6.6.1
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
  2010-06-21  7:33   ` Tao Ma
@ 2010-06-21 13:44     ` Wengang Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Wengang Wang @ 2010-06-21 13:44 UTC (permalink / raw)
  To: ocfs2-devel

On 10-06-21 15:33, Tao Ma wrote:
> hi wengang,
> 
> On 06/21/2010 01:31 PM, Wengang Wang wrote:
> >Why atomic operations on dlm_refs need spinlock's protect?
> >
> >         /* NOTE: Next three are protected by dlm_domain_lock */
> >         struct kref dlm_refs;
> >         enum dlm_ctxt_state dlm_state;
> >         unsigned int num_joins;
> I don't dive into the code, so if there is something wrong, please
> correct me.
> AFAICS, kref_put isn't an atomic operation, it need to call release
> to free the container of kref.

Yes, it's bad :-(.

regards,
wengang.
> 
> Regards,
> Tao
> >
> >regards,
> >wengang.
> >
> >On 10-06-16 14:52, Wengang Wang wrote:
> >>When we need to take both dlm_domain_lock and dlm->spinlock, we should take
> >>them in order of:
> >>dlm_domain_lock then dlm->spinlock.
> >>
> >>There is pathes disobey this order. That is calling dlm_lockres_put with dlm->
> >>spinlock held. dlm_lockres_put() finally calls dlm_put() which take
> >>dlm_domain_lock.
> >>
> >>The fix is moving the locking on dlm_domain_lock to dlm_ctxt_release() from
> >>dlm_put(). dlm_ctxt_release() is only called on the release of the last
> >>reference. Any path should not be holding dlm->spinlock when dropping the "last"
> >>reference.
> >>
> >>Signed-off-by: Wengang Wang<wen.gang.wang@oracle.com>
> >>---
> >>  fs/ocfs2/dlm/dlmdomain.c |   12 ++++--------
> >>  1 files changed, 4 insertions(+), 8 deletions(-)
> >>
> >>diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> >>index ab82add..754baf2 100644
> >>--- a/fs/ocfs2/dlm/dlmdomain.c
> >>+++ b/fs/ocfs2/dlm/dlmdomain.c
> >>@@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
> >>
> >>  	dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
> >>
> >>+	if (spin_is_locked(&dlm->spinlock))
> >>+		BUG();
> >>  	BUG_ON(dlm->num_joins);
> >>  	BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
> >>
> >>+	spin_lock(&dlm_domain_lock);
> >>  	/* we may still be in the list if we hit an error during join. */
> >>  	list_del_init(&dlm->list);
> >>-
> >>  	spin_unlock(&dlm_domain_lock);
> >>
> >>-	mlog(0, "freeing memory from domain %s\n", dlm->name);
> >>-
> >>  	wake_up(&dlm_domain_events);
> >>-
> >>+	mlog(0, "freeing memory from domain %s\n", dlm->name);
> >>  	dlm_free_ctxt_mem(dlm);
> >>-
> >>-	spin_lock(&dlm_domain_lock);
> >>  }
> >>
> >>  void dlm_put(struct dlm_ctxt *dlm)
> >>  {
> >>-	spin_lock(&dlm_domain_lock);
> >>  	kref_put(&dlm->dlm_refs, dlm_ctxt_release);
> >>-	spin_unlock(&dlm_domain_lock);
> >>  }
> >>
> >>  static void __dlm_get(struct dlm_ctxt *dlm)
> >>--
> >>1.6.6.1
> >>
> >>
> >>_______________________________________________
> >>Ocfs2-devel mailing list
> >>Ocfs2-devel at oss.oracle.com
> >>http://oss.oracle.com/mailman/listinfo/ocfs2-devel
> >
> >_______________________________________________
> >Ocfs2-devel mailing list
> >Ocfs2-devel at oss.oracle.com
> >http://oss.oracle.com/mailman/listinfo/ocfs2-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-06-21 13:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-16  6:52 [Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock Wengang Wang
2010-06-21  5:31 ` Wengang Wang
2010-06-21  7:33   ` Tao Ma
2010-06-21 13:44     ` Wengang Wang
2010-06-21 13:20 ` Wengang Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).