public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] slow-work: add (module*)work->owner to fix races with module clients
@ 2009-06-24 19:16 Gregory Haskins
  0 siblings, 0 replies; 7+ messages in thread
From: Gregory Haskins @ 2009-06-24 19:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: dhowells

(Applies to Linus' git master:626f380d)

Hi All,
  I found this while working on KVM.  I actually posted this patch with a KVM
series yesterday, but it seems to have not made it to the lists.  So this
is a repost with the patch by itself, and rebased to Linus' tree instead
of kvm.git.

Thoughts?

Regards,
-Greg

-----------------------------

slow-work: add (module*)work->owner to fix races with module clients

The slow_work facility was designed to use reference counting instead of
barriers for synchronization.  The reference counting mechanism is
implemented as a vtable op (->get_ref, ->put_ref) callback.  This is
problematic for module use of the slow_work facility because it is impossible
to synchronize against the .text installed in the callbacks: There is
no way to ensure that the slow-work threads have completely exited the
text in question and rmmod may yank it out from under the slow_work thread.

This patch attempts to address this issue by transparently mapping "struct
module* owner" to the slow_work item, and maintaining a module reference
count coincident with the more externally visible reference count.  Since
the slow_work facility is resident in kernel, it should be a race-free
location to issue a module_put() call.  This will ensure that modules
can properly cleanup before exiting.

A module_get()/module_put() pair on slow_work_enqueue() and the subsequent
dequeue technically adds the overhead of the atomic operations for every
work item scheduled.  However, slow_work is designed for deferring
relatively long-running and/or sleepy tasks to begin with, so this
overhead will hopefully be negligible.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: David Howells <dhowells@redhat.com>
---

 include/linux/slow-work.h |    4 ++++
 kernel/slow-work.c        |    6 ++++++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/slow-work.h b/include/linux/slow-work.h
index b65c888..9f48dab 100644
--- a/include/linux/slow-work.h
+++ b/include/linux/slow-work.h
@@ -17,6 +17,7 @@
 #ifdef CONFIG_SLOW_WORK
 
 #include <linux/sysctl.h>
+#include <linux/module.h>
 
 struct slow_work;
 
@@ -42,6 +43,7 @@ struct slow_work_ops {
  *   queued
  */
 struct slow_work {
+	struct module          *owner;
 	unsigned long		flags;
 #define SLOW_WORK_PENDING	0	/* item pending (further) execution */
 #define SLOW_WORK_EXECUTING	1	/* item currently executing */
@@ -61,6 +63,7 @@ struct slow_work {
 static inline void slow_work_init(struct slow_work *work,
 				  const struct slow_work_ops *ops)
 {
+	work->owner = THIS_MODULE;
 	work->flags = 0;
 	work->ops = ops;
 	INIT_LIST_HEAD(&work->link);
@@ -78,6 +81,7 @@ static inline void slow_work_init(struct slow_work *work,
 static inline void vslow_work_init(struct slow_work *work,
 				   const struct slow_work_ops *ops)
 {
+	work->owner = THIS_MODULE;
 	work->flags = 1 << SLOW_WORK_VERY_SLOW;
 	work->ops = ops;
 	INIT_LIST_HEAD(&work->link);
diff --git a/kernel/slow-work.c b/kernel/slow-work.c
index 09d7519..1dc3486 100644
--- a/kernel/slow-work.c
+++ b/kernel/slow-work.c
@@ -220,6 +220,8 @@ static bool slow_work_execute(void)
 	}
 
 	work->ops->put_ref(work);
+	barrier(); /* ensure that put_ref is not re-ordered with module_put */
+	module_put(work->owner);
 	return true;
 
 auto_requeue:
@@ -299,6 +301,8 @@ int slow_work_enqueue(struct slow_work *work)
 		if (test_bit(SLOW_WORK_EXECUTING, &work->flags)) {
 			set_bit(SLOW_WORK_ENQ_DEFERRED, &work->flags);
 		} else {
+			if (!try_module_get(work->owner))
+				goto cant_get_mod;
 			if (work->ops->get_ref(work) < 0)
 				goto cant_get_ref;
 			if (test_bit(SLOW_WORK_VERY_SLOW, &work->flags))
@@ -313,6 +317,8 @@ int slow_work_enqueue(struct slow_work *work)
 	return 0;
 
 cant_get_ref:
+	module_put(work->owner);
+cant_get_mod:
 	spin_unlock_irqrestore(&slow_work_queue_lock, flags);
 	return -EAGAIN;
 }


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH] slow-work: add (module*)work->owner to fix races with module clients
@ 2009-06-24 19:28 Gregory Haskins
  0 siblings, 0 replies; 7+ messages in thread
From: Gregory Haskins @ 2009-06-24 19:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: dhowells

(Applies to Linus' git master:626f380d)

Hi All,
  I found this while working on KVM.  I actually posted this patch with a KVM
series yesterday, but it seems to have not made it to the lists.  So this
is a repost with the patch by itself, and rebased to Linus' tree instead
of kvm.git.

Thoughts?

Regards,
-Greg

-----------------------------

slow-work: add (module*)work->owner to fix races with module clients

The slow_work facility was designed to use reference counting instead of
barriers for synchronization.  The reference counting mechanism is
implemented as a vtable op (->get_ref, ->put_ref) callback.  This is
problematic for module use of the slow_work facility because it is impossible
to synchronize against the .text installed in the callbacks: There is
no way to ensure that the slow-work threads have completely exited the
text in question and rmmod may yank it out from under the slow_work thread.

This patch attempts to address this issue by transparently mapping "struct
module* owner" to the slow_work item, and maintaining a module reference
count coincident with the more externally visible reference count.  Since
the slow_work facility is resident in kernel, it should be a race-free
location to issue a module_put() call.  This will ensure that modules
can properly cleanup before exiting.

A module_get()/module_put() pair on slow_work_enqueue() and the subsequent
dequeue technically adds the overhead of the atomic operations for every
work item scheduled.  However, slow_work is designed for deferring
relatively long-running and/or sleepy tasks to begin with, so this
overhead will hopefully be negligible.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: David Howells <dhowells@redhat.com>
---

 include/linux/slow-work.h |    4 ++++
 kernel/slow-work.c        |    6 ++++++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/slow-work.h b/include/linux/slow-work.h
index b65c888..9f48dab 100644
--- a/include/linux/slow-work.h
+++ b/include/linux/slow-work.h
@@ -17,6 +17,7 @@
 #ifdef CONFIG_SLOW_WORK
 
 #include <linux/sysctl.h>
+#include <linux/module.h>
 
 struct slow_work;
 
@@ -42,6 +43,7 @@ struct slow_work_ops {
  *   queued
  */
 struct slow_work {
+	struct module          *owner;
 	unsigned long		flags;
 #define SLOW_WORK_PENDING	0	/* item pending (further) execution */
 #define SLOW_WORK_EXECUTING	1	/* item currently executing */
@@ -61,6 +63,7 @@ struct slow_work {
 static inline void slow_work_init(struct slow_work *work,
 				  const struct slow_work_ops *ops)
 {
+	work->owner = THIS_MODULE;
 	work->flags = 0;
 	work->ops = ops;
 	INIT_LIST_HEAD(&work->link);
@@ -78,6 +81,7 @@ static inline void slow_work_init(struct slow_work *work,
 static inline void vslow_work_init(struct slow_work *work,
 				   const struct slow_work_ops *ops)
 {
+	work->owner = THIS_MODULE;
 	work->flags = 1 << SLOW_WORK_VERY_SLOW;
 	work->ops = ops;
 	INIT_LIST_HEAD(&work->link);
diff --git a/kernel/slow-work.c b/kernel/slow-work.c
index 09d7519..1dc3486 100644
--- a/kernel/slow-work.c
+++ b/kernel/slow-work.c
@@ -220,6 +220,8 @@ static bool slow_work_execute(void)
 	}
 
 	work->ops->put_ref(work);
+	barrier(); /* ensure that put_ref is not re-ordered with module_put */
+	module_put(work->owner);
 	return true;
 
 auto_requeue:
@@ -299,6 +301,8 @@ int slow_work_enqueue(struct slow_work *work)
 		if (test_bit(SLOW_WORK_EXECUTING, &work->flags)) {
 			set_bit(SLOW_WORK_ENQ_DEFERRED, &work->flags);
 		} else {
+			if (!try_module_get(work->owner))
+				goto cant_get_mod;
 			if (work->ops->get_ref(work) < 0)
 				goto cant_get_ref;
 			if (test_bit(SLOW_WORK_VERY_SLOW, &work->flags))
@@ -313,6 +317,8 @@ int slow_work_enqueue(struct slow_work *work)
 	return 0;
 
 cant_get_ref:
+	module_put(work->owner);
+cant_get_mod:
 	spin_unlock_irqrestore(&slow_work_queue_lock, flags);
 	return -EAGAIN;
 }


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH] slow-work: add (module*)work->owner to fix races with module clients
@ 2009-06-24 19:33 Gregory Haskins
  2009-06-24 19:42 ` Gregory Haskins
  2009-06-24 21:23 ` David Howells
  0 siblings, 2 replies; 7+ messages in thread
From: Gregory Haskins @ 2009-06-24 19:33 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org, dhowells

[-- Attachment #1: Type: text/plain, Size: 4360 bytes --]

(Applies to Linus' git master:626f380d)

Hi All,
  I found this while working on KVM.  I actually posted this patch with
a KVM
series yesterday and standalone earlier today, but neither seems to have
made it to the lists.  I suspect there is an issue with git-mail/postfix
on my system.

I digress. This is a repost with the patch by itself, and rebased to
Linus' tree instead of kvm.git.  Apologies if the system finally
corrects itself and the others show up.

Thoughts?

Regards,
-Greg

-----------------------------

slow-work: add (module*)work->owner to fix races with module clients

The slow_work facility was designed to use reference counting instead of
barriers for synchronization.  The reference counting mechanism is
implemented as a vtable op (->get_ref, ->put_ref) callback.  This is
problematic for module use of the slow_work facility because it is
impossible
to synchronize against the .text installed in the callbacks: There is
no way to ensure that the slow-work threads have completely exited the
.text in question and rmmod may yank it out from under the slow_work thread.

This patch attempts to address this issue by transparently mapping "struct
module* owner" to the slow_work item, and maintaining a module reference
count coincident with the more externally visible reference count.  Since
the slow_work facility is resident in kernel, it should be a race-free
location to issue a module_put() call.  This will ensure that modules
can properly cleanup before exiting.

A module_get()/module_put() pair on slow_work_enqueue() and the subsequent
dequeue technically adds the overhead of the atomic operations for every
work item scheduled.  However, slow_work is designed for deferring
relatively long-running and/or sleepy tasks to begin with, so this
overhead will hopefully be negligible.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
CC: David Howells <dhowells@redhat.com>
---

 include/linux/slow-work.h |    4 ++++
 kernel/slow-work.c        |    6 ++++++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/include/linux/slow-work.h b/include/linux/slow-work.h
index b65c888..9f48dab 100644
--- a/include/linux/slow-work.h
+++ b/include/linux/slow-work.h
@@ -17,6 +17,7 @@
 #ifdef CONFIG_SLOW_WORK
 
 #include <linux/sysctl.h>
+#include <linux/module.h>
 
 struct slow_work;
 
@@ -42,6 +43,7 @@ struct slow_work_ops {
  *   queued
  */
 struct slow_work {
+    struct module          *owner;
     unsigned long        flags;
 #define SLOW_WORK_PENDING    0    /* item pending (further) execution */
 #define SLOW_WORK_EXECUTING    1    /* item currently executing */
@@ -61,6 +63,7 @@ struct slow_work {
 static inline void slow_work_init(struct slow_work *work,
                   const struct slow_work_ops *ops)
 {
+    work->owner = THIS_MODULE;
     work->flags = 0;
     work->ops = ops;
     INIT_LIST_HEAD(&work->link);
@@ -78,6 +81,7 @@ static inline void slow_work_init(struct slow_work *work,
 static inline void vslow_work_init(struct slow_work *work,
                    const struct slow_work_ops *ops)
 {
+    work->owner = THIS_MODULE;
     work->flags = 1 << SLOW_WORK_VERY_SLOW;
     work->ops = ops;
     INIT_LIST_HEAD(&work->link);
diff --git a/kernel/slow-work.c b/kernel/slow-work.c
index 09d7519..1dc3486 100644
--- a/kernel/slow-work.c
+++ b/kernel/slow-work.c
@@ -220,6 +220,8 @@ static bool slow_work_execute(void)
     }
 
     work->ops->put_ref(work);
+    barrier(); /* ensure that put_ref is not re-ordered with module_put */
+    module_put(work->owner);
     return true;
 
 auto_requeue:
@@ -299,6 +301,8 @@ int slow_work_enqueue(struct slow_work *work)
         if (test_bit(SLOW_WORK_EXECUTING, &work->flags)) {
             set_bit(SLOW_WORK_ENQ_DEFERRED, &work->flags);
         } else {
+            if (!try_module_get(work->owner))
+                goto cant_get_mod;
             if (work->ops->get_ref(work) < 0)
                 goto cant_get_ref;
             if (test_bit(SLOW_WORK_VERY_SLOW, &work->flags))
@@ -313,6 +317,8 @@ int slow_work_enqueue(struct slow_work *work)
     return 0;
 
 cant_get_ref:
+    module_put(work->owner);
+cant_get_mod:
     spin_unlock_irqrestore(&slow_work_queue_lock, flags);
     return -EAGAIN;
 }



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] slow-work: add (module*)work->owner to fix races with module clients
  2009-06-24 19:33 Gregory Haskins
@ 2009-06-24 19:42 ` Gregory Haskins
  2009-06-24 19:45   ` Gregory Haskins
  2009-06-24 21:23 ` David Howells
  1 sibling, 1 reply; 7+ messages in thread
From: Gregory Haskins @ 2009-06-24 19:42 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org; +Cc: dhowells

[-- Attachment #1: Type: text/plain, Size: 5247 bytes --]

Gregory Haskins wrote:
> (Applies to Linus' git master:626f380d)
>
> Hi All,
>   I found this while working on KVM.  I actually posted this patch with
> a KVM
> series yesterday and standalone earlier today, but neither seems to have
> made it to the lists.  I suspect there is an issue with git-mail/postfix
> on my system.
>
> I digress. This is a repost with the patch by itself, and rebased to
> Linus' tree instead of kvm.git.  Apologies if the system finally
> corrects itself and the others show up.
>
> Thoughts?
>
> Regards,
> -Greg
>
> -----------------------------
>
> slow-work: add (module*)work->owner to fix races with module clients
>
> The slow_work facility was designed to use reference counting instead of
> barriers for synchronization.  The reference counting mechanism is
> implemented as a vtable op (->get_ref, ->put_ref) callback.  This is
> problematic for module use of the slow_work facility because it is
> impossible
> to synchronize against the .text installed in the callbacks: There is
> no way to ensure that the slow-work threads have completely exited the
> .text in question and rmmod may yank it out from under the slow_work thread.
>
> This patch attempts to address this issue by transparently mapping "struct
> module* owner" to the slow_work item, and maintaining a module reference
> count coincident with the more externally visible reference count.  Since
> the slow_work facility is resident in kernel, it should be a race-free
> location to issue a module_put() call.  This will ensure that modules
> can properly cleanup before exiting.
>
> A module_get()/module_put() pair on slow_work_enqueue() and the subsequent
> dequeue technically adds the overhead of the atomic operations for every
> work item scheduled.  However, slow_work is designed for deferring
> relatively long-running and/or sleepy tasks to begin with, so this
> overhead will hopefully be negligible.
>
> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> CC: David Howells <dhowells@redhat.com>
> ---
>
>  include/linux/slow-work.h |    4 ++++
>  kernel/slow-work.c        |    6 ++++++
>  2 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/slow-work.h b/include/linux/slow-work.h
> index b65c888..9f48dab 100644
> --- a/include/linux/slow-work.h
> +++ b/include/linux/slow-work.h
> @@ -17,6 +17,7 @@
>  #ifdef CONFIG_SLOW_WORK
>  
>  #include <linux/sysctl.h>
> +#include <linux/module.h>
>  
>  struct slow_work;
>  
> @@ -42,6 +43,7 @@ struct slow_work_ops {
>   *   queued
>   */
>  struct slow_work {
> +    struct module          *owner;
>      unsigned long        flags;
>  #define SLOW_WORK_PENDING    0    /* item pending (further) execution */
>  #define SLOW_WORK_EXECUTING    1    /* item currently executing */
> @@ -61,6 +63,7 @@ struct slow_work {
>  static inline void slow_work_init(struct slow_work *work,
>                    const struct slow_work_ops *ops)
>  {
> +    work->owner = THIS_MODULE;
>      work->flags = 0;
>      work->ops = ops;
>      INIT_LIST_HEAD(&work->link);
> @@ -78,6 +81,7 @@ static inline void slow_work_init(struct slow_work *work,
>  static inline void vslow_work_init(struct slow_work *work,
>                     const struct slow_work_ops *ops)
>  {
> +    work->owner = THIS_MODULE;
>      work->flags = 1 << SLOW_WORK_VERY_SLOW;
>      work->ops = ops;
>      INIT_LIST_HEAD(&work->link);
> diff --git a/kernel/slow-work.c b/kernel/slow-work.c
> index 09d7519..1dc3486 100644
> --- a/kernel/slow-work.c
> +++ b/kernel/slow-work.c
> @@ -220,6 +220,8 @@ static bool slow_work_execute(void)
>      }
>  
>      work->ops->put_ref(work);
>   

On this front: I also wonder if this put_ref is racing since we cannot
guarantee pointer stability if
the object is kfree'd as a result of dropping the last ref.  I do not
know enough about compilers to say whether work or work->ops
invalidation would cause problems with the call-return, but it seems
dangerous at best.  An alternative might be to copy the put_ref pointer
prior to the call.  Something like

    slowwork_putref_t put_ref = work->ops->put_ref;
    ....
    put_ref(work);

might be better.  However, I am not sure if it really matters so I did
not address this issue yet.

-Greg


> +    barrier(); /* ensure that put_ref is not re-ordered with module_put */
> +    module_put(work->owner);
>      return true;
>  
>  auto_requeue:
> @@ -299,6 +301,8 @@ int slow_work_enqueue(struct slow_work *work)
>          if (test_bit(SLOW_WORK_EXECUTING, &work->flags)) {
>              set_bit(SLOW_WORK_ENQ_DEFERRED, &work->flags);
>          } else {
> +            if (!try_module_get(work->owner))
> +                goto cant_get_mod;
>              if (work->ops->get_ref(work) < 0)
>                  goto cant_get_ref;
>              if (test_bit(SLOW_WORK_VERY_SLOW, &work->flags))
> @@ -313,6 +317,8 @@ int slow_work_enqueue(struct slow_work *work)
>      return 0;
>  
>  cant_get_ref:
> +    module_put(work->owner);
> +cant_get_mod:
>      spin_unlock_irqrestore(&slow_work_queue_lock, flags);
>      return -EAGAIN;
>  }
>
>
>   



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] slow-work: add (module*)work->owner to fix races with module clients
  2009-06-24 19:42 ` Gregory Haskins
@ 2009-06-24 19:45   ` Gregory Haskins
  0 siblings, 0 replies; 7+ messages in thread
From: Gregory Haskins @ 2009-06-24 19:45 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org; +Cc: dhowells

[-- Attachment #1: Type: text/plain, Size: 5576 bytes --]

Gregory Haskins wrote:
> Gregory Haskins wrote:
>   
>> (Applies to Linus' git master:626f380d)
>>
>> Hi All,
>>   I found this while working on KVM.  I actually posted this patch with
>> a KVM
>> series yesterday and standalone earlier today, but neither seems to have
>> made it to the lists.  I suspect there is an issue with git-mail/postfix
>> on my system.
>>
>> I digress. This is a repost with the patch by itself, and rebased to
>> Linus' tree instead of kvm.git.  Apologies if the system finally
>> corrects itself and the others show up.
>>
>> Thoughts?
>>
>> Regards,
>> -Greg
>>
>> -----------------------------
>>
>> slow-work: add (module*)work->owner to fix races with module clients
>>
>> The slow_work facility was designed to use reference counting instead of
>> barriers for synchronization.  The reference counting mechanism is
>> implemented as a vtable op (->get_ref, ->put_ref) callback.  This is
>> problematic for module use of the slow_work facility because it is
>> impossible
>> to synchronize against the .text installed in the callbacks: There is
>> no way to ensure that the slow-work threads have completely exited the
>> .text in question and rmmod may yank it out from under the slow_work thread.
>>
>> This patch attempts to address this issue by transparently mapping "struct
>> module* owner" to the slow_work item, and maintaining a module reference
>> count coincident with the more externally visible reference count.  Since
>> the slow_work facility is resident in kernel, it should be a race-free
>> location to issue a module_put() call.  This will ensure that modules
>> can properly cleanup before exiting.
>>
>> A module_get()/module_put() pair on slow_work_enqueue() and the subsequent
>> dequeue technically adds the overhead of the atomic operations for every
>> work item scheduled.  However, slow_work is designed for deferring
>> relatively long-running and/or sleepy tasks to begin with, so this
>> overhead will hopefully be negligible.
>>
>> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>> CC: David Howells <dhowells@redhat.com>
>> ---
>>
>>  include/linux/slow-work.h |    4 ++++
>>  kernel/slow-work.c        |    6 ++++++
>>  2 files changed, 10 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/slow-work.h b/include/linux/slow-work.h
>> index b65c888..9f48dab 100644
>> --- a/include/linux/slow-work.h
>> +++ b/include/linux/slow-work.h
>> @@ -17,6 +17,7 @@
>>  #ifdef CONFIG_SLOW_WORK
>>  
>>  #include <linux/sysctl.h>
>> +#include <linux/module.h>
>>  
>>  struct slow_work;
>>  
>> @@ -42,6 +43,7 @@ struct slow_work_ops {
>>   *   queued
>>   */
>>  struct slow_work {
>> +    struct module          *owner;
>>      unsigned long        flags;
>>  #define SLOW_WORK_PENDING    0    /* item pending (further) execution */
>>  #define SLOW_WORK_EXECUTING    1    /* item currently executing */
>> @@ -61,6 +63,7 @@ struct slow_work {
>>  static inline void slow_work_init(struct slow_work *work,
>>                    const struct slow_work_ops *ops)
>>  {
>> +    work->owner = THIS_MODULE;
>>      work->flags = 0;
>>      work->ops = ops;
>>      INIT_LIST_HEAD(&work->link);
>> @@ -78,6 +81,7 @@ static inline void slow_work_init(struct slow_work *work,
>>  static inline void vslow_work_init(struct slow_work *work,
>>                     const struct slow_work_ops *ops)
>>  {
>> +    work->owner = THIS_MODULE;
>>      work->flags = 1 << SLOW_WORK_VERY_SLOW;
>>      work->ops = ops;
>>      INIT_LIST_HEAD(&work->link);
>> diff --git a/kernel/slow-work.c b/kernel/slow-work.c
>> index 09d7519..1dc3486 100644
>> --- a/kernel/slow-work.c
>> +++ b/kernel/slow-work.c
>> @@ -220,6 +220,8 @@ static bool slow_work_execute(void)
>>      }
>>  
>>      work->ops->put_ref(work);
>>   
>>     
>
> On this front: I also wonder if this put_ref is racing since we cannot
> guarantee pointer stability if
> the object is kfree'd as a result of dropping the last ref.  I do not
> know enough about compilers to say whether work or work->ops
> invalidation would cause problems with the call-return, but it seems
> dangerous at best.  An alternative might be to copy the put_ref pointer
> prior to the call.  Something like
>
>     slowwork_putref_t put_ref = work->ops->put_ref;
>     ....
>     put_ref(work);
>
> might be better.  However, I am not sure if it really matters so I did
> not address this issue yet.
>
> -Greg
>
>
>   
>> +    barrier(); /* ensure that put_ref is not re-ordered with module_put */
>> +    module_put(work->owner);
>>     

Ugg..speaking of using invalidated pointers!  I need to cache "owner"
here as well.

-Greg
>>      return true;
>>  
>>  auto_requeue:
>> @@ -299,6 +301,8 @@ int slow_work_enqueue(struct slow_work *work)
>>          if (test_bit(SLOW_WORK_EXECUTING, &work->flags)) {
>>              set_bit(SLOW_WORK_ENQ_DEFERRED, &work->flags);
>>          } else {
>> +            if (!try_module_get(work->owner))
>> +                goto cant_get_mod;
>>              if (work->ops->get_ref(work) < 0)
>>                  goto cant_get_ref;
>>              if (test_bit(SLOW_WORK_VERY_SLOW, &work->flags))
>> @@ -313,6 +317,8 @@ int slow_work_enqueue(struct slow_work *work)
>>      return 0;
>>  
>>  cant_get_ref:
>> +    module_put(work->owner);
>> +cant_get_mod:
>>      spin_unlock_irqrestore(&slow_work_queue_lock, flags);
>>      return -EAGAIN;
>>  }
>>
>>
>>   
>>     
>
>
>   



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] slow-work: add (module*)work->owner to fix races with module clients
  2009-06-24 19:33 Gregory Haskins
  2009-06-24 19:42 ` Gregory Haskins
@ 2009-06-24 21:23 ` David Howells
  2009-06-24 22:02   ` Gregory Haskins
  1 sibling, 1 reply; 7+ messages in thread
From: David Howells @ 2009-06-24 21:23 UTC (permalink / raw)
  To: Gregory Haskins; +Cc: dhowells, linux-kernel@vger.kernel.org

Gregory Haskins <ghaskins@novell.com> wrote:

>   I found this while working on KVM.  I actually posted this patch with
> a KVM
> series yesterday and standalone earlier today, but neither seems to have
> made it to the lists.  I suspect there is an issue with git-mail/postfix
> on my system.

Also, your mail client has damaged the whitespace in the patch.

>  struct slow_work {
> +    struct module          *owner;

Can you add it to slow_work_ops instead?

>      work->ops->put_ref(work);
> +    barrier(); /* ensure that put_ref is not re-ordered with module_put =
> */
> +    module_put(work->owner);

Ummm...  Can it be?  module_put() and put_ref() are both out of line - surely
the compiler isn't allowed to reorder them?  If it's the CPU doing it then
barrier() isn't going to save you.

Note, however, that work may not be dereferenced like this after put_ref() is
called, unless you're sure that there's still a reference outstanding.

> +            if (!try_module_get(work->owner))
> +                goto cant_get_mod;

Note that this may result in a module getting stuck in unloading.  It may need
to do some work to complete the unload, and this will prevent that.

A better way might be to have put_ref() return, say, a pointer to a completion
struct, and if not NULL, have the caller of put_ref() call complete() on it.
That way you don't need to touch the module count, but can have something in
put_ref() keep track of when the last object is released and have its caller
invoke a completion to celebrate this fact.

David

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] slow-work: add (module*)work->owner to fix races with module clients
  2009-06-24 21:23 ` David Howells
@ 2009-06-24 22:02   ` Gregory Haskins
  0 siblings, 0 replies; 7+ messages in thread
From: Gregory Haskins @ 2009-06-24 22:02 UTC (permalink / raw)
  To: David Howells; +Cc: linux-kernel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2831 bytes --]

David Howells wrote:
> Gregory Haskins <ghaskins@novell.com> wrote:
>
>   
>>   I found this while working on KVM.  I actually posted this patch with
>> a KVM
>> series yesterday and standalone earlier today, but neither seems to have
>> made it to the lists.  I suspect there is an issue with git-mail/postfix
>> on my system.
>>     
>
> Also, your mail client has damaged the whitespace in the patch.
>   

Yeah, sorry about that.  When git-mail was failing I cut-n-pasted into
thunderbird and it munged it a bit.  v2 should be better as it came out
of git directly after I fixed the postfix misconfig.

>   
>>  struct slow_work {
>> +    struct module          *owner;
>>     
>
> Can you add it to slow_work_ops instead?
>   

Yeah, that makes sense.
>   
>>      work->ops->put_ref(work);
>> +    barrier(); /* ensure that put_ref is not re-ordered with module_put =
>> */
>> +    module_put(work->owner);
>>     
>
> Ummm...  Can it be?  module_put() and put_ref() are both out of line - surely
> the compiler isn't allowed to reorder them?  If it's the CPU doing it then
> barrier() isn't going to save you.
>   

Good point.  I added that at the last minute without engaging my brain.
:) Will remove.

> Note, however, that work may not be dereferenced like this after put_ref() is
> called, unless you're sure that there's still a reference outstanding.
>
>   
Yeah, I noticed that too immediately after sending.  It should be better
in v2 (which should be in your inbox already)

>> +            if (!try_module_get(work->owner))
>> +                goto cant_get_mod;
>>     
>
> Note that this may result in a module getting stuck in unloading.  It may need
> to do some work to complete the unload, and this will prevent that.
>   

Can we set the stake in the ground that you can only call
slow_work_enqueue() from a module if you know that there is at least one
reference to the module being held?  This seems like a core requirement
anyway.

The follow up question would be: if so, should we use __module_get()
instead ot try_module_get() to annotate that (in addition to a comment,
of course).

> A better way might be to have put_ref() return, say, a pointer to a completion
> struct, and if not NULL, have the caller of put_ref() call complete() on it.
> That way you don't need to touch the module count, but can have something in
> put_ref() keep track of when the last object is released and have its caller
> invoke a completion to celebrate this fact.
>   

That sounds interesting, but I am not sure if we would get into a
similar conundrum or be awkward to manage.  I am in a conf-call ATM so I
can't think clear enough to tell for sure. ;) Let me give it some
thought and get back to you, though.

Thanks David!
-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 266 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-06-24 22:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-06-24 19:16 [RFC PATCH] slow-work: add (module*)work->owner to fix races with module clients Gregory Haskins
  -- strict thread matches above, loose matches on Subject: below --
2009-06-24 19:28 Gregory Haskins
2009-06-24 19:33 Gregory Haskins
2009-06-24 19:42 ` Gregory Haskins
2009-06-24 19:45   ` Gregory Haskins
2009-06-24 21:23 ` David Howells
2009-06-24 22:02   ` Gregory Haskins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox