* [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug
@ 2009-05-29  8:29 Lai Jiangshan
  2009-05-29 20:23 ` Andrew Morton
  2009-05-30  1:53 ` Paul E. McKenney
  0 siblings, 2 replies; 26+ messages in thread
From: Lai Jiangshan @ 2009-05-29  8:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rusty Russell, Ingo Molnar, Paul E. McKenney, LKML

The current get_online_cpus()/put_online_cpus() re-implements a
rw_semaphore, so this patch converts it to a real rw_semaphore.
This simplifies the code and makes it easier to read.

Miscellaneous fixes:
1) Add a comment for cpu_hotplug.active_writer.
2) The theoretical livelock described in cpu_hotplug_begin()'s
   comments no longer exists once a rw_semaphore is used, so that
   part of the comment is removed.

[Impact: improve get_online_cpus()/put_online_cpus() ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 395b697..62198ec 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -14,6 +14,7 @@
 #include <linux/kthread.h>
 #include <linux/stop_machine.h>
 #include <linux/mutex.h>
+#include <linux/rwsem.h>

 #ifdef CONFIG_SMP
 /* Serializes the updates to cpu_online_mask, cpu_present_mask */
@@ -27,20 +28,21 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);
 static int cpu_hotplug_disabled;

 static struct {
-	struct task_struct *active_writer;
-	struct mutex lock; /* Synchronizes accesses to refcount, */
 	/*
-	 * Also blocks the new readers during
-	 * an ongoing cpu hotplug operation.
+	 * active_writer allows get_online_cpus()/put_online_cpus()
+	 * to be nested in cpu_hotplug_begin()/cpu_hotplug_done().
+	 *
+	 * Thus, get_online_cpus()/put_online_cpus() can be called in
+	 * CPU notifiers.
 	 */
-	int refcount;
+	struct task_struct *active_writer;
+	struct rw_semaphore rwlock;
 } cpu_hotplug;

 void __init cpu_hotplug_init(void)
 {
 	cpu_hotplug.active_writer = NULL;
-	mutex_init(&cpu_hotplug.lock);
-	cpu_hotplug.refcount = 0;
+	init_rwsem(&cpu_hotplug.rwlock);
 }

 #ifdef CONFIG_HOTPLUG_CPU
@@ -50,9 +52,7 @@ void get_online_cpus(void)
 	might_sleep();
 	if (cpu_hotplug.active_writer == current)
 		return;
-	mutex_lock(&cpu_hotplug.lock);
-	cpu_hotplug.refcount++;
-	mutex_unlock(&cpu_hotplug.lock);
+	down_read(&cpu_hotplug.rwlock);

 }
 EXPORT_SYMBOL_GPL(get_online_cpus);
@@ -61,10 +61,7 @@ void put_online_cpus(void)
 {
 	if (cpu_hotplug.active_writer == current)
 		return;
-	mutex_lock(&cpu_hotplug.lock);
-	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
-		wake_up_process(cpu_hotplug.active_writer);
-	mutex_unlock(&cpu_hotplug.lock);
+	up_read(&cpu_hotplug.rwlock);

 }
 EXPORT_SYMBOL_GPL(put_online_cpus);
@@ -86,45 +83,25 @@ void cpu_maps_update_done(void)
 }

 /*
- * This ensures that the hotplug operation can begin only when the
- * refcount goes to zero.
+ * This ensures that the hotplug operation can begin only when
+ * there is no reader.
  *
  * Note that during a cpu-hotplug operation, the new readers, if any,
- * will be blocked by the cpu_hotplug.lock
+ * will be blocked by the cpu_hotplug.rwlock
  *
  * Since cpu_hotplug_begin() is always called after invoking
  * cpu_maps_update_begin(), we can be sure that only one writer is active.
- *
- * Note that theoretically, there is a possibility of a livelock:
- * - Refcount goes to zero, last reader wakes up the sleeping
- *   writer.
- * - Last reader unlocks the cpu_hotplug.lock.
- * - A new reader arrives at this moment, bumps up the refcount.
- * - The writer acquires the cpu_hotplug.lock finds the refcount
- *   non zero and goes to sleep again.
- *
- * However, this is very difficult to achieve in practice since
- * get_online_cpus() not an api which is called all that often.
- *
  */
 static void cpu_hotplug_begin(void)
 {
+	down_write(&cpu_hotplug.rwlock);
 	cpu_hotplug.active_writer = current;
-
-	for (;;) {
-		mutex_lock(&cpu_hotplug.lock);
-		if (likely(!cpu_hotplug.refcount))
-			break;
-		__set_current_state(TASK_UNINTERRUPTIBLE);
-		mutex_unlock(&cpu_hotplug.lock);
-		schedule();
-	}
 }

 static void cpu_hotplug_done(void)
 {
 	cpu_hotplug.active_writer = NULL;
-	mutex_unlock(&cpu_hotplug.lock);
+	up_write(&cpu_hotplug.rwlock);
 }
 /* Need to know about CPUs going up/down? */
 int __ref register_cpu_notifier(struct notifier_block *nb)

^ permalink raw reply related	[flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug 2009-05-29 8:29 [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug Lai Jiangshan @ 2009-05-29 20:23 ` Andrew Morton 2009-05-29 21:07 ` Oleg Nesterov 2009-05-30 1:53 ` Paul E. McKenney 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2009-05-29 20:23 UTC (permalink / raw) To: Lai Jiangshan Cc: rusty, mingo, paulmck, linux-kernel, Oleg Nesterov, Linus Torvalds On Fri, 29 May 2009 16:29:30 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > Current get_online_cpus()/put_online_cpus() re-implement > a rw_semaphore, It does appear that way. > so it is converted to a real rw_semaphore in this fix. > It simplifies codes, and is good for read. It'd be a nice cleanup if it works. > And misc fix: > 1) Add comments for cpu_hotplug.active_writer. > 2) The theoretical disadvantage described in cpu_hotplug_begin()'s > comments is no longer existed when we use rw_semaphore, > so this part of comments was removed. > > [Impact: improve get_online_cpus()/put_online_cpus() ] Unfortunately this code has been a large source of tricky problems. I bet that something nasty goes wrong if we change it :( > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > --- > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 395b697..62198ec 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -14,6 +14,7 @@ > #include <linux/kthread.h> > #include <linux/stop_machine.h> > #include <linux/mutex.h> > +#include <linux/rwsem.h> > > #ifdef CONFIG_SMP > /* Serializes the updates to cpu_online_mask, cpu_present_mask */ > @@ -27,20 +28,21 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain); > static int cpu_hotplug_disabled; > > static struct { > - struct task_struct *active_writer; > - struct mutex lock; /* Synchronizes accesses to refcount, */ > /* > - * Also blocks the new readers during > - * an ongoing cpu hotplug operation. 
> + * active_writer makes get_online_cpus()/put_online_cpus() are allowd > + * to be nested in cpu_hotplug_begin()/cpu_hotplug_done(). > + * > + * Thus, get_online_cpus()/put_online_cpus() can be called in > + * CPU notifiers. > */ > - int refcount; > + struct task_struct *active_writer; > + struct rw_semaphore rwlock; > } cpu_hotplug; > > void __init cpu_hotplug_init(void) > { > cpu_hotplug.active_writer = NULL; > - mutex_init(&cpu_hotplug.lock); > - cpu_hotplug.refcount = 0; > + init_rwsem(&cpu_hotplug.rwlock); > } > > #ifdef CONFIG_HOTPLUG_CPU > @@ -50,9 +52,7 @@ void get_online_cpus(void) > might_sleep(); > if (cpu_hotplug.active_writer == current) > return; > - mutex_lock(&cpu_hotplug.lock); > - cpu_hotplug.refcount++; > - mutex_unlock(&cpu_hotplug.lock); > + down_read(&cpu_hotplug.rwlock); > > } > EXPORT_SYMBOL_GPL(get_online_cpus); > @@ -61,10 +61,7 @@ void put_online_cpus(void) > { > if (cpu_hotplug.active_writer == current) > return; > - mutex_lock(&cpu_hotplug.lock); > - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer)) > - wake_up_process(cpu_hotplug.active_writer); > - mutex_unlock(&cpu_hotplug.lock); > + up_read(&cpu_hotplug.rwlock); > > } > EXPORT_SYMBOL_GPL(put_online_cpus); > @@ -86,45 +83,25 @@ void cpu_maps_update_done(void) > } > > /* > - * This ensures that the hotplug operation can begin only when the > - * refcount goes to zero. > + * This ensures that the hotplug operation can begin only when > + * there is no reader. > * > * Note that during a cpu-hotplug operation, the new readers, if any, > - * will be blocked by the cpu_hotplug.lock > + * will be blocked by the cpu_hotplug.rwlock > * > * Since cpu_hotplug_begin() is always called after invoking > * cpu_maps_update_begin(), we can be sure that only one writer is active. > - * > - * Note that theoretically, there is a possibility of a livelock: > - * - Refcount goes to zero, last reader wakes up the sleeping > - * writer. > - * - Last reader unlocks the cpu_hotplug.lock. 
> - * - A new reader arrives at this moment, bumps up the refcount.
> - * - The writer acquires the cpu_hotplug.lock finds the refcount
> - *   non zero and goes to sleep again.
> - *
> - * However, this is very difficult to achieve in practice since
> - * get_online_cpus() not an api which is called all that often.
> - *
>  */
> static void cpu_hotplug_begin(void)
> {
> +	down_write(&cpu_hotplug.rwlock);
> 	cpu_hotplug.active_writer = current;
> -
> -	for (;;) {
> -		mutex_lock(&cpu_hotplug.lock);
> -		if (likely(!cpu_hotplug.refcount))
> -			break;
> -		__set_current_state(TASK_UNINTERRUPTIBLE);
> -		mutex_unlock(&cpu_hotplug.lock);
> -		schedule();
> -	}
> }
>
> static void cpu_hotplug_done(void)
> {
> 	cpu_hotplug.active_writer = NULL;
> -	mutex_unlock(&cpu_hotplug.lock);
> +	up_write(&cpu_hotplug.rwlock);
> }
> /* Need to know about CPUs going up/down? */
> int __ref register_cpu_notifier(struct notifier_block *nb)
>

There are about 25 trees in linux-next which think they own kernel/cpu.c
and unfortunately one of them changed that file in a relatively
significant manner.

That patch ("cpuhotplug: remove cpu_hotplug_init()") was written by,
err, you.

I could fix things up but it would be more effective were you to do
this, please.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug 2009-05-29 20:23 ` Andrew Morton @ 2009-05-29 21:07 ` Oleg Nesterov 2009-05-29 21:17 ` Oleg Nesterov 2009-06-01 0:52 ` Lai Jiangshan 0 siblings, 2 replies; 26+ messages in thread From: Oleg Nesterov @ 2009-05-29 21:07 UTC (permalink / raw) To: Andrew Morton Cc: Lai Jiangshan, rusty, mingo, paulmck, linux-kernel, Linus Torvalds On 05/29, Andrew Morton wrote: > > On Fri, 29 May 2009 16:29:30 +0800 > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > > > Current get_online_cpus()/put_online_cpus() re-implement > > a rw_semaphore, > > so it is converted to a real rw_semaphore in this fix. > > It simplifies codes, and is good for read. > > > static struct { > > - struct task_struct *active_writer; > > - struct mutex lock; /* Synchronizes accesses to refcount, */ > > /* > > - * Also blocks the new readers during > > - * an ongoing cpu hotplug operation. > > + * active_writer makes get_online_cpus()/put_online_cpus() are allowd > > + * to be nested in cpu_hotplug_begin()/cpu_hotplug_done(). > > + * > > + * Thus, get_online_cpus()/put_online_cpus() can be called in > > + * CPU notifiers. > > */ > > - int refcount; > > + struct task_struct *active_writer; > > + struct rw_semaphore rwlock; > > } cpu_hotplug; But, afaics, down_write() blocks new readers. This means that with this patch get_online_cpus() is not recursive, no? Oleg. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug 2009-05-29 21:07 ` Oleg Nesterov @ 2009-05-29 21:17 ` Oleg Nesterov 2009-06-01 1:04 ` Lai Jiangshan 2009-06-01 0:52 ` Lai Jiangshan 1 sibling, 1 reply; 26+ messages in thread From: Oleg Nesterov @ 2009-05-29 21:17 UTC (permalink / raw) To: Andrew Morton Cc: Lai Jiangshan, rusty, mingo, paulmck, linux-kernel, Linus Torvalds On 05/29, Oleg Nesterov wrote: > > On 05/29, Andrew Morton wrote: > > > > On Fri, 29 May 2009 16:29:30 +0800 > > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > > > > > > Current get_online_cpus()/put_online_cpus() re-implement > > > a rw_semaphore, > > > so it is converted to a real rw_semaphore in this fix. > > > It simplifies codes, and is good for read. > > > > > static struct { > > > - struct task_struct *active_writer; > > > - struct mutex lock; /* Synchronizes accesses to refcount, */ > > > /* > > > - * Also blocks the new readers during > > > - * an ongoing cpu hotplug operation. > > > + * active_writer makes get_online_cpus()/put_online_cpus() are allowd > > > + * to be nested in cpu_hotplug_begin()/cpu_hotplug_done(). > > > + * > > > + * Thus, get_online_cpus()/put_online_cpus() can be called in > > > + * CPU notifiers. > > > */ > > > - int refcount; > > > + struct task_struct *active_writer; > > > + struct rw_semaphore rwlock; > > > } cpu_hotplug; > > But, afaics, down_write() blocks new readers. > > This means that with this patch get_online_cpus() is not recursive, no? And please note that the current code drops mutex when get_online_cpus() succeeds. With your patch (if I read it correctly) the code under get_() runs with cpu_hotplug->rwlock held for reading. I'm afraid this creates the new possibilities for deadlocks. Oleg. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug
  2009-05-29 21:17 ` Oleg Nesterov
@ 2009-06-01  1:04 ` Lai Jiangshan
  0 siblings, 0 replies; 26+ messages in thread
From: Lai Jiangshan @ 2009-06-01  1:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, rusty, mingo, paulmck, linux-kernel,
      Linus Torvalds, Gautham R Shenoy

Oleg Nesterov wrote:
> On 05/29, Oleg Nesterov wrote:
>> On 05/29, Andrew Morton wrote:
>>> On Fri, 29 May 2009 16:29:30 +0800
>>> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>>
>>>> Current get_online_cpus()/put_online_cpus() re-implement
>>>> a rw_semaphore,
>>>> so it is converted to a real rw_semaphore in this fix.
>>>> It simplifies codes, and is good for read.
>>>> static struct {
>>>> -	struct task_struct *active_writer;
>>>> -	struct mutex lock; /* Synchronizes accesses to refcount, */
>>>> 	/*
>>>> -	 * Also blocks the new readers during
>>>> -	 * an ongoing cpu hotplug operation.
>>>> +	 * active_writer makes get_online_cpus()/put_online_cpus() are allowd
>>>> +	 * to be nested in cpu_hotplug_begin()/cpu_hotplug_done().
>>>> +	 *
>>>> +	 * Thus, get_online_cpus()/put_online_cpus() can be called in
>>>> +	 * CPU notifiers.
>>>> 	 */
>>>> -	int refcount;
>>>> +	struct task_struct *active_writer;
>>>> +	struct rw_semaphore rwlock;
>>>> } cpu_hotplug;
>>
>> But, afaics, down_write() blocks new readers.
>>
>> This means that with this patch get_online_cpus() is not recursive, no?
>
> And please note that the current code drops mutex when get_online_cpus()
> succeeds. With your patch (if I read it correctly) the code under get_()
> runs with cpu_hotplug->rwlock held for reading. I'm afraid this creates
> the new possibilities for deadlocks.
>

The current code drops the mutex when get_online_cpus() succeeds, BUT it
increases the counter just as down_read() does. I think the current code
is subject to the same deadlocks as a down_read()-based implementation.

Since the current code uses mutex + counter to implement a "down_read()",
why not use down_read() directly? And down_read() is checked by lockdep.

Lai.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug
  2009-05-29 21:07 ` Oleg Nesterov
  2009-05-29 21:17 ` Oleg Nesterov
@ 2009-06-01  0:52 ` Lai Jiangshan
  2009-06-01  2:22 ` Lai Jiangshan
  1 sibling, 1 reply; 26+ messages in thread
From: Lai Jiangshan @ 2009-06-01  0:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, rusty, mingo, paulmck, linux-kernel,
      Linus Torvalds, Gautham R Shenoy

Oleg Nesterov wrote:
> On 05/29, Andrew Morton wrote:
>> On Fri, 29 May 2009 16:29:30 +0800
>> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>>
>>> Current get_online_cpus()/put_online_cpus() re-implement
>>> a rw_semaphore,
>>> so it is converted to a real rw_semaphore in this fix.
>>> It simplifies codes, and is good for read.
>>> static struct {
>>> -	struct task_struct *active_writer;
>>> -	struct mutex lock; /* Synchronizes accesses to refcount, */
>>> 	/*
>>> -	 * Also blocks the new readers during
>>> -	 * an ongoing cpu hotplug operation.
>>> +	 * active_writer makes get_online_cpus()/put_online_cpus() are allowd
>>> +	 * to be nested in cpu_hotplug_begin()/cpu_hotplug_done().
>>> +	 *
>>> +	 * Thus, get_online_cpus()/put_online_cpus() can be called in
>>> +	 * CPU notifiers.
>>> 	 */
>>> -	int refcount;
>>> +	struct task_struct *active_writer;
>>> +	struct rw_semaphore rwlock;
>>> } cpu_hotplug;
>
> But, afaics, down_write() blocks new readers.
>
> This means that with this patch get_online_cpus() is not recursive, no?
>

down_read()/up_read() can be nested within down_read()/up_read(),
so get_online_cpus() is recursive.

And thanks to cpu_hotplug.active_writer, get_online_cpus()/put_online_cpus()
are allowed to be nested in cpu_hotplug_begin()/cpu_hotplug_done(),
so cpu_hotplug_begin() does NOT block readers who are in CPU notifiers.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug
  2009-06-01  0:52 ` Lai Jiangshan
@ 2009-06-01  2:22 ` Lai Jiangshan
  0 siblings, 0 replies; 26+ messages in thread
From: Lai Jiangshan @ 2009-06-01  2:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, rusty, mingo, paulmck, linux-kernel,
      Linus Torvalds, Gautham R Shenoy

Lai Jiangshan wrote:
>
> down_read()/up_read() can be nested within down_read()/up_read(),
> so get_online_cpus() is recursive.
>
> And thanks to cpu_hotplug.active_writer, get_online_cpus()/put_online_cpus()
> are allowed to be nested in cpu_hotplug_begin()/cpu_hotplug_done(),
> so cpu_hotplug_begin() does NOT block readers who are in CPU notifiers.
>

Lai Jiangshan wrote:
>
> The current code drops the mutex when get_online_cpus() succeeds, BUT it
> increases the counter just as down_read() does. I think the current code
> is subject to the same deadlocks as a down_read()-based implementation.
>
> Since the current code uses mutex + counter to implement a "down_read()",
> why not use down_read() directly? And down_read() is checked by lockdep.
>

Ouch, the kernel rw_semaphore is not read-preference.
Everything I said above is garbage. I missed this;
sorry for bothering you all.

Lai

^ permalink raw reply	[flat|nested] 26+ messages in thread
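[The point Lai concedes here can be shown without any blocking at all: a fair rwsem stops admitting new readers the moment a writer queues, so a recursive down_read() by a task already holding the read side would sleep behind that writer forever. A toy sketch of the admission rule follows, for illustration only; it is not the kernel's rwsem implementation, and `fair_down_read_trylock` is a made-up name.]

```c
#include <stdbool.h>

/* Toy model of a *fair* rwsem's reader-admission rule. */
struct fair_rwsem {
	int  readers;		/* tasks holding the read side */
	bool writer_waiting;	/* a writer is queued behind them */
};

/* A fair lock refuses new readers once a writer is queued;
 * a blocking down_read() would sleep where this returns false. */
static bool fair_down_read_trylock(struct fair_rwsem *sem)
{
	if (sem->writer_waiting)
		return false;
	sem->readers++;
	return true;
}
```

With one reader in and a writer queued, the writer waits for `readers` to reach zero while the reader's own recursive read attempt waits for the writer: a circular wait. This is why read-preference mattered to the original mutex + refcount scheme.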
* Re: [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug 2009-05-29 8:29 [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug Lai Jiangshan 2009-05-29 20:23 ` Andrew Morton @ 2009-05-30 1:53 ` Paul E. McKenney 2009-05-30 4:37 ` Gautham R Shenoy 1 sibling, 1 reply; 26+ messages in thread From: Paul E. McKenney @ 2009-05-30 1:53 UTC (permalink / raw) To: Lai Jiangshan; +Cc: Andrew Morton, Rusty Russell, Ingo Molnar, LKML, ego On Fri, May 29, 2009 at 04:29:30PM +0800, Lai Jiangshan wrote: > > Current get_online_cpus()/put_online_cpus() re-implement > a rw_semaphore, so it is converted to a real rw_semaphore in this fix. > It simplifies codes, and is good for read. > > And misc fix: > 1) Add comments for cpu_hotplug.active_writer. > 2) The theoretical disadvantage described in cpu_hotplug_begin()'s > comments is no longer existed when we use rw_semaphore, > so this part of comments was removed. > > [Impact: improve get_online_cpus()/put_online_cpus() ] Actually, it turns out that for my purposes it is only necessary to check: cpu_hotplug.active_writer != NULL The only time that it is unsafe to invoke get_online_cpus() is when in a notifier, and in that case the value of cpu_hotplug.active_writer is stable. There could be false positives, but these are harmless, as the fallback is simply synchronize_sched(). Even this is only needed should the deadlock scenario you pointed out arise in practice. As Oleg noted, there are some "interesting" constraints on get_online_cpus(). Adding Gautham Shenoy to CC for his views. 
Thanx, Paul > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > --- > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 395b697..62198ec 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -14,6 +14,7 @@ > #include <linux/kthread.h> > #include <linux/stop_machine.h> > #include <linux/mutex.h> > +#include <linux/rwsem.h> > > #ifdef CONFIG_SMP > /* Serializes the updates to cpu_online_mask, cpu_present_mask */ > @@ -27,20 +28,21 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain); > static int cpu_hotplug_disabled; > > static struct { > - struct task_struct *active_writer; > - struct mutex lock; /* Synchronizes accesses to refcount, */ > /* > - * Also blocks the new readers during > - * an ongoing cpu hotplug operation. > + * active_writer makes get_online_cpus()/put_online_cpus() are allowd > + * to be nested in cpu_hotplug_begin()/cpu_hotplug_done(). > + * > + * Thus, get_online_cpus()/put_online_cpus() can be called in > + * CPU notifiers. > */ > - int refcount; > + struct task_struct *active_writer; > + struct rw_semaphore rwlock; > } cpu_hotplug; > > void __init cpu_hotplug_init(void) > { > cpu_hotplug.active_writer = NULL; > - mutex_init(&cpu_hotplug.lock); > - cpu_hotplug.refcount = 0; > + init_rwsem(&cpu_hotplug.rwlock); > } > > #ifdef CONFIG_HOTPLUG_CPU > @@ -50,9 +52,7 @@ void get_online_cpus(void) > might_sleep(); > if (cpu_hotplug.active_writer == current) > return; > - mutex_lock(&cpu_hotplug.lock); > - cpu_hotplug.refcount++; > - mutex_unlock(&cpu_hotplug.lock); > + down_read(&cpu_hotplug.rwlock); > > } > EXPORT_SYMBOL_GPL(get_online_cpus); > @@ -61,10 +61,7 @@ void put_online_cpus(void) > { > if (cpu_hotplug.active_writer == current) > return; > - mutex_lock(&cpu_hotplug.lock); > - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer)) > - wake_up_process(cpu_hotplug.active_writer); > - mutex_unlock(&cpu_hotplug.lock); > + up_read(&cpu_hotplug.rwlock); > > } > EXPORT_SYMBOL_GPL(put_online_cpus); > @@ -86,45 +83,25 @@ void 
cpu_maps_update_done(void) > } > > /* > - * This ensures that the hotplug operation can begin only when the > - * refcount goes to zero. > + * This ensures that the hotplug operation can begin only when > + * there is no reader. > * > * Note that during a cpu-hotplug operation, the new readers, if any, > - * will be blocked by the cpu_hotplug.lock > + * will be blocked by the cpu_hotplug.rwlock > * > * Since cpu_hotplug_begin() is always called after invoking > * cpu_maps_update_begin(), we can be sure that only one writer is active. > - * > - * Note that theoretically, there is a possibility of a livelock: > - * - Refcount goes to zero, last reader wakes up the sleeping > - * writer. > - * - Last reader unlocks the cpu_hotplug.lock. > - * - A new reader arrives at this moment, bumps up the refcount. > - * - The writer acquires the cpu_hotplug.lock finds the refcount > - * non zero and goes to sleep again. > - * > - * However, this is very difficult to achieve in practice since > - * get_online_cpus() not an api which is called all that often. > - * > */ > static void cpu_hotplug_begin(void) > { > + down_write(&cpu_hotplug.rwlock); > cpu_hotplug.active_writer = current; > - > - for (;;) { > - mutex_lock(&cpu_hotplug.lock); > - if (likely(!cpu_hotplug.refcount)) > - break; > - __set_current_state(TASK_UNINTERRUPTIBLE); > - mutex_unlock(&cpu_hotplug.lock); > - schedule(); > - } > } > > static void cpu_hotplug_done(void) > { > cpu_hotplug.active_writer = NULL; > - mutex_unlock(&cpu_hotplug.lock); > + up_write(&cpu_hotplug.rwlock); > } > /* Need to know about CPUs going up/down? */ > int __ref register_cpu_notifier(struct notifier_block *nb) > > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug 2009-05-30 1:53 ` Paul E. McKenney @ 2009-05-30 4:37 ` Gautham R Shenoy 2009-06-04 6:58 ` [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 Lai Jiangshan 0 siblings, 1 reply; 26+ messages in thread From: Gautham R Shenoy @ 2009-05-30 4:37 UTC (permalink / raw) To: Paul E. McKenney Cc: Lai Jiangshan, Andrew Morton, Rusty Russell, Ingo Molnar, LKML On Fri, May 29, 2009 at 06:53:42PM -0700, Paul E. McKenney wrote: > On Fri, May 29, 2009 at 04:29:30PM +0800, Lai Jiangshan wrote: > > > > Current get_online_cpus()/put_online_cpus() re-implement > > a rw_semaphore, so it is converted to a real rw_semaphore in this fix. > > It simplifies codes, and is good for read. > > > > And misc fix: > > 1) Add comments for cpu_hotplug.active_writer. > > 2) The theoretical disadvantage described in cpu_hotplug_begin()'s > > comments is no longer existed when we use rw_semaphore, > > so this part of comments was removed. > > > > [Impact: improve get_online_cpus()/put_online_cpus() ] > > Actually, it turns out that for my purposes it is only necessary to check: > > cpu_hotplug.active_writer != NULL > > The only time that it is unsafe to invoke get_online_cpus() is when > in a notifier, and in that case the value of cpu_hotplug.active_writer > is stable. There could be false positives, but these are harmless, as > the fallback is simply synchronize_sched(). > > Even this is only needed should the deadlock scenario you pointed out > arise in practice. > > As Oleg noted, there are some "interesting" constraints on > get_online_cpus(). Adding Gautham Shenoy to CC for his views. So, to put it in a sentence, get_online_cpus()/put_online_cpus() is a read-write semaphore with read-preference while allowing writer to downgrade to a reader when required. 
Read-preference was one of the ways of allowing unsuspecting functions
that need protection against cpu-hotplug to seek the help of other
functions that also need protection against cpu-hotplug. IOW, it allows
a single context to call get_online_cpus() recursively without giving
way to a circular deadlock. A fair reader-writer lock wouldn't allow
that, since in the presence of a writer the recursive reads would block,
thereby causing a deadlock.

Also, around the time when this design was chosen, we had a whole bunch
of functions which did try to take the original "cpu_hotplug_mutex"
recursively. We could do well to use Lai's implementation if such
functions have mended their ways since then, as this would make it a lot
simpler :-). But I suspect it is easier said than done!

BTW, I second the idea of try_get_online_cpus(). I had myself proposed
this idea a year back: http://lkml.org/lkml/2008/4/29/222.

> 							Thanx, Paul
>
> > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> > ---
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 395b697..62198ec 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -14,6 +14,7 @@
> >  #include <linux/kthread.h>
> >  #include <linux/stop_machine.h>
> >  #include <linux/mutex.h>
> > +#include <linux/rwsem.h>
> >
> >  #ifdef CONFIG_SMP
> >  /* Serializes the updates to cpu_online_mask, cpu_present_mask */
> > @@ -27,20 +28,21 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);
> >  static int cpu_hotplug_disabled;
> >
> >  static struct {
> > -	struct task_struct *active_writer;
> > -	struct mutex lock; /* Synchronizes accesses to refcount, */
> >  	/*
> > -	 * Also blocks the new readers during
> > -	 * an ongoing cpu hotplug operation.
> > +	 * active_writer makes get_online_cpus()/put_online_cpus() are allowd
> > +	 * to be nested in cpu_hotplug_begin()/cpu_hotplug_done().
> > +	 *
> > +	 * Thus, get_online_cpus()/put_online_cpus() can be called in
> > +	 * CPU notifiers.
> > */ > > - int refcount; > > + struct task_struct *active_writer; > > + struct rw_semaphore rwlock; > > } cpu_hotplug; > > > > void __init cpu_hotplug_init(void) > > { > > cpu_hotplug.active_writer = NULL; > > - mutex_init(&cpu_hotplug.lock); > > - cpu_hotplug.refcount = 0; > > + init_rwsem(&cpu_hotplug.rwlock); > > } > > > > #ifdef CONFIG_HOTPLUG_CPU > > @@ -50,9 +52,7 @@ void get_online_cpus(void) > > might_sleep(); > > if (cpu_hotplug.active_writer == current) > > return; > > - mutex_lock(&cpu_hotplug.lock); > > - cpu_hotplug.refcount++; > > - mutex_unlock(&cpu_hotplug.lock); > > + down_read(&cpu_hotplug.rwlock); > > > > } > > EXPORT_SYMBOL_GPL(get_online_cpus); > > @@ -61,10 +61,7 @@ void put_online_cpus(void) > > { > > if (cpu_hotplug.active_writer == current) > > return; > > - mutex_lock(&cpu_hotplug.lock); > > - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer)) > > - wake_up_process(cpu_hotplug.active_writer); > > - mutex_unlock(&cpu_hotplug.lock); > > + up_read(&cpu_hotplug.rwlock); > > > > } > > EXPORT_SYMBOL_GPL(put_online_cpus); > > @@ -86,45 +83,25 @@ void cpu_maps_update_done(void) > > } > > > > /* > > - * This ensures that the hotplug operation can begin only when the > > - * refcount goes to zero. > > + * This ensures that the hotplug operation can begin only when > > + * there is no reader. > > * > > * Note that during a cpu-hotplug operation, the new readers, if any, > > - * will be blocked by the cpu_hotplug.lock > > + * will be blocked by the cpu_hotplug.rwlock > > * > > * Since cpu_hotplug_begin() is always called after invoking > > * cpu_maps_update_begin(), we can be sure that only one writer is active. > > - * > > - * Note that theoretically, there is a possibility of a livelock: > > - * - Refcount goes to zero, last reader wakes up the sleeping > > - * writer. > > - * - Last reader unlocks the cpu_hotplug.lock. > > - * - A new reader arrives at this moment, bumps up the refcount. 
> > - * - The writer acquires the cpu_hotplug.lock finds the refcount > > - * non zero and goes to sleep again. > > - * > > - * However, this is very difficult to achieve in practice since > > - * get_online_cpus() not an api which is called all that often. > > - * > > */ > > static void cpu_hotplug_begin(void) > > { > > + down_write(&cpu_hotplug.rwlock); > > cpu_hotplug.active_writer = current; > > - > > - for (;;) { > > - mutex_lock(&cpu_hotplug.lock); > > - if (likely(!cpu_hotplug.refcount)) > > - break; > > - __set_current_state(TASK_UNINTERRUPTIBLE); > > - mutex_unlock(&cpu_hotplug.lock); > > - schedule(); > > - } > > } > > > > static void cpu_hotplug_done(void) > > { > > cpu_hotplug.active_writer = NULL; > > - mutex_unlock(&cpu_hotplug.lock); > > + up_write(&cpu_hotplug.rwlock); > > } > > /* Need to know about CPUs going up/down? */ > > int __ref register_cpu_notifier(struct notifier_block *nb) > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- Thanks and Regards gautham ^ permalink raw reply [flat|nested] 26+ messages in thread
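[Gautham's one-sentence summary above — a read-preference rwlock whose writer can pass through the read side — can be sketched in user space. This is an illustrative model only, not a proposed kernel implementation: pthreads stand in for kernel primitives, the rp_* names are made up, and the `pthread_equal(..., pthread_self())` check mirrors the kernel's `active_writer == current` test.]

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy read-preference lock: readers never wait on a merely *queued*
 * writer, and the active writer recurses through the read side. */
struct rp_lock {
	pthread_mutex_t lock;
	pthread_cond_t  change;
	int             readers;
	bool            writer_active;
	pthread_t       active_writer;	/* valid only while writer_active */
};

static void rp_init(struct rp_lock *l)
{
	pthread_mutex_init(&l->lock, NULL);
	pthread_cond_init(&l->change, NULL);
	l->readers = 0;
	l->writer_active = false;
}

static void rp_read_lock(struct rp_lock *l)	/* get_online_cpus() */
{
	pthread_mutex_lock(&l->lock);
	if (l->writer_active &&
	    pthread_equal(l->active_writer, pthread_self())) {
		pthread_mutex_unlock(&l->lock);
		return;		/* writer recursing from a notifier */
	}
	while (l->writer_active)	/* wait only on a *holding* writer */
		pthread_cond_wait(&l->change, &l->lock);
	l->readers++;
	pthread_mutex_unlock(&l->lock);
}

static void rp_read_unlock(struct rp_lock *l)	/* put_online_cpus() */
{
	pthread_mutex_lock(&l->lock);
	if (l->writer_active &&
	    pthread_equal(l->active_writer, pthread_self())) {
		pthread_mutex_unlock(&l->lock);
		return;
	}
	if (!--l->readers)
		pthread_cond_broadcast(&l->change);
	pthread_mutex_unlock(&l->lock);
}

static void rp_write_lock(struct rp_lock *l)	/* cpu_hotplug_begin() */
{
	pthread_mutex_lock(&l->lock);
	while (l->readers || l->writer_active)
		pthread_cond_wait(&l->change, &l->lock);
	l->writer_active = true;
	l->active_writer = pthread_self();
	pthread_mutex_unlock(&l->lock);
}

static void rp_write_unlock(struct rp_lock *l)	/* cpu_hotplug_done() */
{
	pthread_mutex_lock(&l->lock);
	l->writer_active = false;
	pthread_cond_broadcast(&l->change);
	pthread_mutex_unlock(&l->lock);
}
```

Note the design choice Gautham describes: `rp_read_lock()` waits only while a writer *holds* the lock, never because one is queued, so recursive reads always succeed; and the writer's own reads short-circuit, which is what lets CPU notifiers call the read side from inside a hotplug operation.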
* [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 2009-05-30 4:37 ` Gautham R Shenoy @ 2009-06-04 6:58 ` Lai Jiangshan 2009-06-04 20:49 ` Oleg Nesterov 2009-06-05 15:37 ` Paul E. McKenney 0 siblings, 2 replies; 26+ messages in thread From: Lai Jiangshan @ 2009-06-04 6:58 UTC (permalink / raw) To: Andrew Morton Cc: Gautham R Shenoy, Paul E. McKenney, Rusty Russell, Ingo Molnar, LKML, Peter Zijlstra, Oleg Nesterov Gautham R Shenoy wrote: > On Fri, May 29, 2009 at 06:53:42PM -0700, Paul E. McKenney wrote: >> On Fri, May 29, 2009 at 04:29:30PM +0800, Lai Jiangshan wrote: >>> Current get_online_cpus()/put_online_cpus() re-implement >>> a rw_semaphore, so it is converted to a real rw_semaphore in this fix. >>> It simplifies codes, and is good for read. >>> >>> And misc fix: >>> 1) Add comments for cpu_hotplug.active_writer. >>> 2) The theoretical disadvantage described in cpu_hotplug_begin()'s >>> comments is no longer existed when we use rw_semaphore, >>> so this part of comments was removed. >>> >>> [Impact: improve get_online_cpus()/put_online_cpus() ] >> Actually, it turns out that for my purposes it is only necessary to check: >> >> cpu_hotplug.active_writer != NULL >> >> The only time that it is unsafe to invoke get_online_cpus() is when >> in a notifier, and in that case the value of cpu_hotplug.active_writer >> is stable. There could be false positives, but these are harmless, as >> the fallback is simply synchronize_sched(). >> >> Even this is only needed should the deadlock scenario you pointed out >> arise in practice. >> >> As Oleg noted, there are some "interesting" constraints on >> get_online_cpus(). Adding Gautham Shenoy to CC for his views. > > So, to put it in a sentence, get_online_cpus()/put_online_cpus() is a > read-write semaphore with read-preference while allowing writer to > downgrade to a reader when required. 
> > Read-preference was one of the ways of allowing unsuspecting functions > which need the protection against cpu-hotplug to end up seeking help of > functions which also need protection against cpu-hotplug. IOW allow a > single context to call get_online_cpus() without giving away to circular > deadlock. A fair reader-write lock wouldn't allow that since in the > presence of a write, the recursive reads would block, thereby causing a > deadlock. > > Also, around the time when this design was chosen, we had a whole bunch > of functions which did try to take the original "cpu_hotplug_mutex" > recursively. We could do well to use Lai's implementation if such > functions have mended their ways since this would make it a lot simpler > :-) . But I suspect it is easier said than done! > > BTW, I second the idea of try_get_online_cpus(). I had myself proposed > this idea a year back. http://lkml.org/lkml/2008/4/29/222. > > > Add CC to Peter Zijlstra <peterz@infradead.org> (This patch is against mainline but not for inclusion; it will be adapted against -mm when requested.) Comments and review are eagerly requested. - Lockless fast path for get_online_cpus() - Introduce try_get_online_cpus() --- diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 2643d84..63b216c 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -104,6 +104,7 @@ extern struct sysdev_class cpu_sysdev_class; extern void get_online_cpus(void); extern void put_online_cpus(void); +extern int try_get_online_cpus(void); #define hotcpu_notifier(fn, pri) { \ static struct notifier_block fn##_nb __cpuinitdata = \ { .notifier_call = fn, .priority = pri }; \ @@ -117,6 +118,7 @@ int cpu_down(unsigned int cpu); #define get_online_cpus() do { } while (0) #define put_online_cpus() do { } while (0) +static inline int try_get_online_cpus(void) { return 1; } #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0) /* These aren't inline functions due to a GCC bug. 
*/ #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; }) diff --git a/kernel/cpu.c b/kernel/cpu.c index 395b697..54d6e0d 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -14,6 +14,8 @@ #include <linux/kthread.h> #include <linux/stop_machine.h> #include <linux/mutex.h> +#include <asm/atomic.h> +#include <linux/wait.h> #ifdef CONFIG_SMP /* Serializes the updates to cpu_online_mask, cpu_present_mask */ @@ -26,21 +28,26 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain); */ static int cpu_hotplug_disabled; +/* + * @cpu_hotplug is a special read-write semaphore with these semantics: + * 1) It is read-preference and allows reader-in-reader recursion. + * 2) It allows writer to downgrade to a reader when required. + * (allows reader-in-writer recursion.) + * 3) It allows only one thread to require the write-side lock at most. + * (cpu_add_remove_lock ensures it.) + */ static struct { struct task_struct *active_writer; - struct mutex lock; /* Synchronizes accesses to refcount, */ - /* - * Also blocks the new readers during - * an ongoing cpu hotplug operation. - */ - int refcount; + wait_queue_head_t sleeping_readers; + /* refcount = 0 means the writer owns the lock. 
*/ + atomic_t refcount; } cpu_hotplug; void __init cpu_hotplug_init(void) { cpu_hotplug.active_writer = NULL; - mutex_init(&cpu_hotplug.lock); - cpu_hotplug.refcount = 0; + init_waitqueue_head(&cpu_hotplug.sleeping_readers); + atomic_set(&cpu_hotplug.refcount, 1); } #ifdef CONFIG_HOTPLUG_CPU @@ -50,10 +57,20 @@ void get_online_cpus(void) might_sleep(); if (cpu_hotplug.active_writer == current) return; - mutex_lock(&cpu_hotplug.lock); - cpu_hotplug.refcount++; - mutex_unlock(&cpu_hotplug.lock); + if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) { + DEFINE_WAIT(wait); + + for (;;) { + prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait, + TASK_UNINTERRUPTIBLE); + if (atomic_inc_not_zero(&cpu_hotplug.refcount)) + break; + schedule(); + } + + finish_wait(&cpu_hotplug.sleeping_readers, &wait); + } } EXPORT_SYMBOL_GPL(get_online_cpus); @@ -61,14 +78,27 @@ void put_online_cpus(void) { if (cpu_hotplug.active_writer == current) return; - mutex_lock(&cpu_hotplug.lock); - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer)) - wake_up_process(cpu_hotplug.active_writer); - mutex_unlock(&cpu_hotplug.lock); + if (unlikely(atomic_dec_and_test(&cpu_hotplug.refcount))) { + smp_mb__after_atomic_dec(); + BUG_ON(!cpu_hotplug.active_writer); + wake_up_process(cpu_hotplug.active_writer); + } } EXPORT_SYMBOL_GPL(put_online_cpus); +int try_get_online_cpus(void) +{ + if (cpu_hotplug.active_writer == current) + return 1; + + if (likely(atomic_inc_not_zero(&cpu_hotplug.refcount))) + return 1; + + return 0; +} +EXPORT_SYMBOL_GPL(try_get_online_cpus); + #endif /* CONFIG_HOTPLUG_CPU */ /* @@ -86,46 +116,41 @@ void cpu_maps_update_done(void) } /* - * This ensures that the hotplug operation can begin only when the - * refcount goes to zero. + * This ensures that the hotplug operation can begin only when + * there is no ongoing reader. 
* * Note that during a cpu-hotplug operation, the new readers, if any, - * will be blocked by the cpu_hotplug.lock + * will be blocked and queued at cpu_hotplug.sleeping_readers. * * Since cpu_hotplug_begin() is always called after invoking * cpu_maps_update_begin(), we can be sure that only one writer is active. * - * Note that theoretically, there is a possibility of a livelock: - * - Refcount goes to zero, last reader wakes up the sleeping - * writer. - * - Last reader unlocks the cpu_hotplug.lock. - * - A new reader arrives at this moment, bumps up the refcount. - * - The writer acquires the cpu_hotplug.lock finds the refcount - * non zero and goes to sleep again. - * - * However, this is very difficult to achieve in practice since - * get_online_cpus() not an api which is called all that often. - * */ static void cpu_hotplug_begin(void) { cpu_hotplug.active_writer = current; + smp_mb__before_atomic_dec(); + atomic_dec(&cpu_hotplug.refcount); for (;;) { - mutex_lock(&cpu_hotplug.lock); - if (likely(!cpu_hotplug.refcount)) + set_current_state(TASK_UNINTERRUPTIBLE); + if (!atomic_read(&cpu_hotplug.refcount)) break; - __set_current_state(TASK_UNINTERRUPTIBLE); - mutex_unlock(&cpu_hotplug.lock); schedule(); } + + __set_current_state(TASK_RUNNING); } static void cpu_hotplug_done(void) { cpu_hotplug.active_writer = NULL; - mutex_unlock(&cpu_hotplug.lock); + atomic_inc(&cpu_hotplug.refcount); + + if (waitqueue_active(&cpu_hotplug.sleeping_readers)) + wake_up(&cpu_hotplug.sleeping_readers); } + /* Need to know about CPUs going up/down? */ int __ref register_cpu_notifier(struct notifier_block *nb) { ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 2009-06-04 6:58 ` [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 Lai Jiangshan @ 2009-06-04 20:49 ` Oleg Nesterov 2009-06-05 1:32 ` Lai Jiangshan 2009-06-05 15:37 ` Paul E. McKenney 1 sibling, 1 reply; 26+ messages in thread From: Oleg Nesterov @ 2009-06-04 20:49 UTC (permalink / raw) To: Lai Jiangshan Cc: Andrew Morton, Gautham R Shenoy, Paul E. McKenney, Rusty Russell, Ingo Molnar, LKML, Peter Zijlstra On 06/04, Lai Jiangshan wrote: > > - Lockless for get_online_cpus()'s fast path > - Introduce try_get_online_cpus() I think this can work... > @@ -50,10 +57,20 @@ void get_online_cpus(void) > might_sleep(); > if (cpu_hotplug.active_writer == current) > return; > - mutex_lock(&cpu_hotplug.lock); > - cpu_hotplug.refcount++; > - mutex_unlock(&cpu_hotplug.lock); > > + if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) { > + DEFINE_WAIT(wait); > + > + for (;;) { > + prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait, > + TASK_UNINTERRUPTIBLE); > + if (atomic_inc_not_zero(&cpu_hotplug.refcount)) > + break; > + schedule(); > + } > + > + finish_wait(&cpu_hotplug.sleeping_readers, &wait); > + } > } Looks like the code above can be replaced with wait_event(atomic_inc_not_zero(&cpu_hotplug.refcount)); > static void cpu_hotplug_done(void) > { > cpu_hotplug.active_writer = NULL; > - mutex_unlock(&cpu_hotplug.lock); > + atomic_inc(&cpu_hotplug.refcount); > + > + if (waitqueue_active(&cpu_hotplug.sleeping_readers)) > + wake_up(&cpu_hotplug.sleeping_readers); > } This looks racy. Suppose that the new reader comes right before atomic_inc(). The first inc_not_zero() fails, the readear does prepare_to_wait(), the 2nd inc_not_zero() fails too. cpu_hotplug_done() does atomic_inc(). What guarantees we must see waitqueue_active() == T? I think cpu_hotplug_done() should do unconditional wake_up(). This path is slow anyway, "if (waitqueue_active())" does not buy too much. 
In this case .sleeping_readers->lock closes the race. Unless I missed something, of course. Minor, but I'd suggest using wake_up_all(). This does not make any difference because we do not have WQ_FLAG_EXCLUSIVE waiters, but imho it looks a bit cleaner. Hmm. It seems to me that cpu_hotplug_done() needs mb__before_atomic_inc() before the atomic_inc(). Otherwise, "active_writer = NULL" can be re-ordered with atomic_inc(). If the new reader does get_online_cpus() + put_online_cpus() quickly, it can see active_writer != NULL. Oleg. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 2009-06-04 20:49 ` Oleg Nesterov @ 2009-06-05 1:32 ` Lai Jiangshan 2009-06-05 2:14 ` Oleg Nesterov 0 siblings, 1 reply; 26+ messages in thread From: Lai Jiangshan @ 2009-06-05 1:32 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Gautham R Shenoy, Paul E. McKenney, Rusty Russell, Ingo Molnar, LKML, Peter Zijlstra Oleg Nesterov wrote: > On 06/04, Lai Jiangshan wrote: >> - Lockless for get_online_cpus()'s fast path >> - Introduce try_get_online_cpus() > > I think this can work... > >> @@ -50,10 +57,20 @@ void get_online_cpus(void) >> might_sleep(); >> if (cpu_hotplug.active_writer == current) >> return; >> - mutex_lock(&cpu_hotplug.lock); >> - cpu_hotplug.refcount++; >> - mutex_unlock(&cpu_hotplug.lock); >> >> + if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) { >> + DEFINE_WAIT(wait); >> + >> + for (;;) { >> + prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait, >> + TASK_UNINTERRUPTIBLE); >> + if (atomic_inc_not_zero(&cpu_hotplug.refcount)) >> + break; >> + schedule(); >> + } >> + >> + finish_wait(&cpu_hotplug.sleeping_readers, &wait); >> + } >> } > > Looks like the code above can be replaced with > > wait_event(atomic_inc_not_zero(&cpu_hotplug.refcount)); You are right, but since atomic_inc_not_zero() has a side effect, I'm afraid that if wait_event() is changed in the future, it may increment cpu_hotplug.refcount twice. #define wait_event(wq, condition) ...... I consider that @condition should not have side effects; it should be something like: some_number == 2, !some_condition, some_thing_has_done, ...... > >> static void cpu_hotplug_done(void) >> { >> cpu_hotplug.active_writer = NULL; >> - mutex_unlock(&cpu_hotplug.lock); >> + atomic_inc(&cpu_hotplug.refcount); >> + >> + if (waitqueue_active(&cpu_hotplug.sleeping_readers)) >> + wake_up(&cpu_hotplug.sleeping_readers); >> } > > This looks racy. > > Suppose that the new reader comes right before atomic_inc(). 
The first > inc_not_zero() fails, the reader does prepare_to_wait(), the 2nd > inc_not_zero() fails too. > > cpu_hotplug_done() does atomic_inc(). > > What guarantees we must see waitqueue_active() == T? > > I think cpu_hotplug_done() should do unconditional wake_up(). This path > is slow anyway, "if (waitqueue_active())" does not buy too much. In this > case .sleeping_readers->lock closes the race. > > Unless I missed something, of course. You are definitely right: cpu_hotplug_done() should do an unconditional wake_up(). waitqueue_active() performs no synchronization. > > > Minor, but I'd suggest to use wake_up_all(). This does not make any > difference because we do not have WQ_FLAG_EXCLUSIVE waiters, but imho > looks a bit cleaner. > > > Hmm. It seems to me that cpu_hotplug_done() needs mb__before_atomic_inc() > before atomic_inc. Otherwise, "active_writer = NULL" can be re-ordered with > atomic_inc(). If the new reader does get_online_cpus() + put_online_cpus() > quickly, it can see active_writer != NULL. > > The lines "active_writer = NULL" and "atomic_inc()" may indeed be exchanged, but no code needs to synchronize against them. get_online_cpus()/put_online_cpus() will see "active_writer != current", which is exactly what they need. Lai ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 2009-06-05 1:32 ` Lai Jiangshan @ 2009-06-05 2:14 ` Oleg Nesterov 0 siblings, 0 replies; 26+ messages in thread From: Oleg Nesterov @ 2009-06-05 2:14 UTC (permalink / raw) To: Lai Jiangshan Cc: Andrew Morton, Gautham R Shenoy, Paul E. McKenney, Rusty Russell, Ingo Molnar, LKML, Peter Zijlstra On 06/05, Lai Jiangshan wrote: > > Oleg Nesterov wrote: > > On 06/04, Lai Jiangshan wrote: > >> > >> + if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) { > >> + DEFINE_WAIT(wait); > >> + > >> + for (;;) { > >> + prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait, > >> + TASK_UNINTERRUPTIBLE); > >> + if (atomic_inc_not_zero(&cpu_hotplug.refcount)) > >> + break; > >> + schedule(); > >> + } > >> + > >> + finish_wait(&cpu_hotplug.sleeping_readers, &wait); > >> + } > >> } > > > > Looks like the code above can be replaced with > > > > wait_event(atomic_inc_not_zero(&cpu_hotplug.refcount)); > > You are right, but with the atomic_inc_not_zero() has side-effect, > I'm afraid that wait_event() will be changed in future, and it may > increases the cpu_hotplug.refcount twice. We already have side-effects in wait_event(), see use_module(). And wait_event(atomic_inc_not_zero()) is much easier to understand. Personally, I think wait_event() must work correctly if "condition" has side-effects. But this is subjective. So, of course, please do what you like more. > > Hmm. It seems to me that cpu_hotplug_done() needs mb__before_atomic_inc() > > before atomic_inc. Otherwise, "active_writer = NULL" can be re-ordered with > > atomic_inc(). If the new reader does get_online_cpus() + put_online_cpus() > > quicky, it can see active_writer != NULL. > > > > The lines "active_writer = NULL" and "atomic_inc()" can exchange, > there is no code need to synchronize to them. > get_online_cpus()/put_online_cpus() will see "active_writer != current", > it just what get_online_cpus()/put_online_cpus() needs. 
I meant another problem, but I misread the patch. When I actually applied it I don't see any problems with re-ordering. This means I should find something else ;) put_online_cpus() does not need smp_mb__after_atomic_dec(). atomic_dec_and_test() implies mb() on both sides, this is even documented. Oleg. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 2009-06-04 6:58 ` [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 Lai Jiangshan 2009-06-04 20:49 ` Oleg Nesterov @ 2009-06-05 15:37 ` Paul E. McKenney 2009-06-08 2:36 ` Lai Jiangshan 2009-06-08 4:19 ` Gautham R Shenoy 1 sibling, 2 replies; 26+ messages in thread From: Paul E. McKenney @ 2009-06-05 15:37 UTC (permalink / raw) To: Lai Jiangshan Cc: Andrew Morton, Gautham R Shenoy, Rusty Russell, Ingo Molnar, LKML, Peter Zijlstra, Oleg Nesterov, dipankar On Thu, Jun 04, 2009 at 02:58:20PM +0800, Lai Jiangshan wrote: > Gautham R Shenoy wrote: > > On Fri, May 29, 2009 at 06:53:42PM -0700, Paul E. McKenney wrote: > >> On Fri, May 29, 2009 at 04:29:30PM +0800, Lai Jiangshan wrote: > >>> Current get_online_cpus()/put_online_cpus() re-implement > >>> a rw_semaphore, so it is converted to a real rw_semaphore in this fix. > >>> It simplifies codes, and is good for read. > >>> > >>> And misc fix: > >>> 1) Add comments for cpu_hotplug.active_writer. > >>> 2) The theoretical disadvantage described in cpu_hotplug_begin()'s > >>> comments is no longer existed when we use rw_semaphore, > >>> so this part of comments was removed. > >>> > >>> [Impact: improve get_online_cpus()/put_online_cpus() ] > >> Actually, it turns out that for my purposes it is only necessary to check: > >> > >> cpu_hotplug.active_writer != NULL > >> > >> The only time that it is unsafe to invoke get_online_cpus() is when > >> in a notifier, and in that case the value of cpu_hotplug.active_writer > >> is stable. There could be false positives, but these are harmless, as > >> the fallback is simply synchronize_sched(). > >> > >> Even this is only needed should the deadlock scenario you pointed out > >> arise in practice. > >> > >> As Oleg noted, there are some "interesting" constraints on > >> get_online_cpus(). Adding Gautham Shenoy to CC for his views. 
> > > > So, to put it in a sentence, get_online_cpus()/put_online_cpus() is a > > read-write semaphore with read-preference while allowing writer to > > downgrade to a reader when required. > > > > Read-preference was one of the ways of allowing unsuspecting functions > > which need the protection against cpu-hotplug to end up seeking help of > > functions which also need protection against cpu-hotplug. IOW allow a > > single context to call get_online_cpus() without giving away to circular > > deadlock. A fair reader-write lock wouldn't allow that since in the > > presence of a write, the recursive reads would block, thereby causing a > > deadlock. > > > > Also, around the time when this design was chosen, we had a whole bunch > > of functions which did try to take the original "cpu_hotplug_mutex" > > recursively. We could do well to use Lai's implementation if such > > functions have mended their ways since this would make it a lot simpler > > :-) . But I suspect it is easier said than done! > > > > BTW, I second the idea of try_get_online_cpus(). I had myself proposed > > this idea a year back. http://lkml.org/lkml/2008/4/29/222. > > > > > > > > Add CC to Peter Zijlstra <peterz@infradead.org> > > (This patch is against mainline but not for inclusion, it will adapted > against -mm when request) > > Requst For Comments and Reviewing Hungeringly. > > - Lockless for get_online_cpus()'s fast path > - Introduce try_get_online_cpus() One question for Gautham Shenoy -- are non-atomic CPU-hotplug notifiers always invoked from the task that did the cpu_hotplug_begin()? If so, well and good. If not, then it would not be possible to expedite RCU grace periods from within CPU-hotplug notifiers. 
Thanx, Paul > --- > diff --git a/include/linux/cpu.h b/include/linux/cpu.h > index 2643d84..63b216c 100644 > --- a/include/linux/cpu.h > +++ b/include/linux/cpu.h > @@ -104,6 +104,7 @@ extern struct sysdev_class cpu_sysdev_class; > > extern void get_online_cpus(void); > extern void put_online_cpus(void); > +extern int try_get_online_cpus(void); > #define hotcpu_notifier(fn, pri) { \ > static struct notifier_block fn##_nb __cpuinitdata = \ > { .notifier_call = fn, .priority = pri }; \ > @@ -117,6 +118,7 @@ int cpu_down(unsigned int cpu); > > #define get_online_cpus() do { } while (0) > #define put_online_cpus() do { } while (0) > +static inline int try_get_online_cpus(void) { return 1; } > #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0) > /* These aren't inline functions due to a GCC bug. */ > #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; }) > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 395b697..54d6e0d 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -14,6 +14,8 @@ > #include <linux/kthread.h> > #include <linux/stop_machine.h> > #include <linux/mutex.h> > +#include <asm/atomic.h> > +#include <linux/wait.h> > > #ifdef CONFIG_SMP > /* Serializes the updates to cpu_online_mask, cpu_present_mask */ > @@ -26,21 +28,26 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain); > */ > static int cpu_hotplug_disabled; > > +/* > + * @cpu_hotplug is a special read-write semaphore with these semantics: > + * 1) It is read-preference and allows reader-in-reader recursion. > + * 2) It allows writer to downgrade to a reader when required. > + * (allows reader-in-writer recursion.) > + * 3) It allows only one thread to require the write-side lock at most. > + * (cpu_add_remove_lock ensures it.) > + */ > static struct { > struct task_struct *active_writer; > - struct mutex lock; /* Synchronizes accesses to refcount, */ > - /* > - * Also blocks the new readers during > - * an ongoing cpu hotplug operation. 
> - */ > - int refcount; > + wait_queue_head_t sleeping_readers; > + /* refcount = 0 means the writer owns the lock. */ > + atomic_t refcount; > } cpu_hotplug; > > void __init cpu_hotplug_init(void) > { > cpu_hotplug.active_writer = NULL; > - mutex_init(&cpu_hotplug.lock); > - cpu_hotplug.refcount = 0; > + init_waitqueue_head(&cpu_hotplug.sleeping_readers); > + atomic_set(&cpu_hotplug.refcount, 1); > } > > #ifdef CONFIG_HOTPLUG_CPU > @@ -50,10 +57,20 @@ void get_online_cpus(void) > might_sleep(); > if (cpu_hotplug.active_writer == current) > return; > - mutex_lock(&cpu_hotplug.lock); > - cpu_hotplug.refcount++; > - mutex_unlock(&cpu_hotplug.lock); > > + if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) { > + DEFINE_WAIT(wait); > + > + for (;;) { > + prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait, > + TASK_UNINTERRUPTIBLE); > + if (atomic_inc_not_zero(&cpu_hotplug.refcount)) > + break; > + schedule(); > + } > + > + finish_wait(&cpu_hotplug.sleeping_readers, &wait); > + } > } > EXPORT_SYMBOL_GPL(get_online_cpus); > > @@ -61,14 +78,27 @@ void put_online_cpus(void) > { > if (cpu_hotplug.active_writer == current) > return; > - mutex_lock(&cpu_hotplug.lock); > - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer)) > - wake_up_process(cpu_hotplug.active_writer); > - mutex_unlock(&cpu_hotplug.lock); > > + if (unlikely(atomic_dec_and_test(&cpu_hotplug.refcount))) { > + smp_mb__after_atomic_dec(); > + BUG_ON(!cpu_hotplug.active_writer); > + wake_up_process(cpu_hotplug.active_writer); > + } > } > EXPORT_SYMBOL_GPL(put_online_cpus); > > +int try_get_online_cpus(void) > +{ > + if (cpu_hotplug.active_writer == current) > + return 1; > + > + if (likely(atomic_inc_not_zero(&cpu_hotplug.refcount))) > + return 1; > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(try_get_online_cpus); > + > #endif /* CONFIG_HOTPLUG_CPU */ > > /* > @@ -86,46 +116,41 @@ void cpu_maps_update_done(void) > } > > /* > - * This ensures that the hotplug operation can begin only when 
the > - * refcount goes to zero. > + * This ensures that the hotplug operation can begin only when > + * there is no ongoing reader. > * > * Note that during a cpu-hotplug operation, the new readers, if any, > - * will be blocked by the cpu_hotplug.lock > + * will be blocked and queued at cpu_hotplug.sleeping_readers. > * > * Since cpu_hotplug_begin() is always called after invoking > * cpu_maps_update_begin(), we can be sure that only one writer is active. > * > - * Note that theoretically, there is a possibility of a livelock: > - * - Refcount goes to zero, last reader wakes up the sleeping > - * writer. > - * - Last reader unlocks the cpu_hotplug.lock. > - * - A new reader arrives at this moment, bumps up the refcount. > - * - The writer acquires the cpu_hotplug.lock finds the refcount > - * non zero and goes to sleep again. > - * > - * However, this is very difficult to achieve in practice since > - * get_online_cpus() not an api which is called all that often. > - * > */ > static void cpu_hotplug_begin(void) > { > cpu_hotplug.active_writer = current; > + smp_mb__before_atomic_dec(); > + atomic_dec(&cpu_hotplug.refcount); > > for (;;) { > - mutex_lock(&cpu_hotplug.lock); > - if (likely(!cpu_hotplug.refcount)) > + set_current_state(TASK_UNINTERRUPTIBLE); > + if (!atomic_read(&cpu_hotplug.refcount)) > break; > - __set_current_state(TASK_UNINTERRUPTIBLE); > - mutex_unlock(&cpu_hotplug.lock); > schedule(); > } > + > + __set_current_state(TASK_RUNNING); > } > > static void cpu_hotplug_done(void) > { > cpu_hotplug.active_writer = NULL; > - mutex_unlock(&cpu_hotplug.lock); > + atomic_inc(&cpu_hotplug.refcount); > + > + if (waitqueue_active(&cpu_hotplug.sleeping_readers)) > + wake_up(&cpu_hotplug.sleeping_readers); > } > + > /* Need to know about CPUs going up/down? */ > int __ref register_cpu_notifier(struct notifier_block *nb) > { > > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 2009-06-05 15:37 ` Paul E. McKenney @ 2009-06-08 2:36 ` Lai Jiangshan 2009-06-08 4:19 ` Gautham R Shenoy 1 sibling, 0 replies; 26+ messages in thread From: Lai Jiangshan @ 2009-06-08 2:36 UTC (permalink / raw) To: Andrew Morton, Oleg Nesterov Cc: paulmck, Gautham R Shenoy, Rusty Russell, Ingo Molnar, LKML, Peter Zijlstra, dipankar Paul E. McKenney wrote: >> Add CC to Peter Zijlstra <peterz@infradead.org> >> >> (This patch is against mainline but not for inclusion, it will adapted >> against -mm when request) >> >> Requst For Comments and Reviewing Hungeringly. >> >> - Lockless for get_online_cpus()'s fast path >> - Introduce try_get_online_cpus() > > One question for Gautham Shenoy -- are non-atomic CPU-hotplug notifiers > always invoked from the task that did the cpu_hotplug_begin()? > > If so, well and good. If not, then it would not be possible to > expedite RCU grace periods from within CPU-hotplug notifiers. > > Thanx, Paul > I have had no suitable machine for several days to test the next version of this patch, so it is still delayed for a while. Lai. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 2009-06-05 15:37 ` Paul E. McKenney 2009-06-08 2:36 ` Lai Jiangshan @ 2009-06-08 4:19 ` Gautham R Shenoy 2009-06-08 14:25 ` Paul E. McKenney 1 sibling, 1 reply; 26+ messages in thread From: Gautham R Shenoy @ 2009-06-08 4:19 UTC (permalink / raw) To: Paul E. McKenney Cc: Lai Jiangshan, Andrew Morton, Rusty Russell, Ingo Molnar, LKML, Peter Zijlstra, Oleg Nesterov, dipankar Hi Paul, On Fri, Jun 05, 2009 at 08:37:14AM -0700, Paul E. McKenney wrote: > > One question for Gautham Shenoy -- are non-atomic CPU-hotplug notifiers > always invoked from the task that did the cpu_hotplug_begin()? Except for the notifiers handling two events, rest of the notifiers are always invoked from the task that did the cpu_hotplug_begin(). The two events are CPU_DYING which is called from the context of the stop_machine_thread and CPU_STARTING which is called from the context of the idle thread on the CPU that has just come up. The notifiers handling these two events are expected to be atomic. > If so, well and good. If not, then it would not be possible to > expedite RCU grace periods from within CPU-hotplug notifiers. I hope this would be good enough :-) > > Thanx, Paul > -- Thanks and Regards gautham ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 2009-06-08 4:19 ` Gautham R Shenoy @ 2009-06-08 14:25 ` Paul E. McKenney 2009-06-09 12:07 ` [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3 Lai Jiangshan 0 siblings, 1 reply; 26+ messages in thread From: Paul E. McKenney @ 2009-06-08 14:25 UTC (permalink / raw) To: Gautham R Shenoy Cc: Lai Jiangshan, Andrew Morton, Rusty Russell, Ingo Molnar, LKML, Peter Zijlstra, Oleg Nesterov, dipankar On Mon, Jun 08, 2009 at 09:49:34AM +0530, Gautham R Shenoy wrote: > Hi Paul, > > On Fri, Jun 05, 2009 at 08:37:14AM -0700, Paul E. McKenney wrote: > > > > One question for Gautham Shenoy -- are non-atomic CPU-hotplug notifiers > > always invoked from the task that did the cpu_hotplug_begin()? > > Except for the notifiers handling two events, rest of the notifiers > are always invoked from the task that did the cpu_hotplug_begin(). > > The two events are CPU_DYING which is called from the context of the > stop_machine_thread and CPU_STARTING which is called from the context of > the idle thread on the CPU that has just come up. The notifiers handling > these two events are expected to be atomic. > > > If so, well and good. If not, then it would not be possible to > > expedite RCU grace periods from within CPU-hotplug notifiers. > > I hope this would be good enough :-) Works for me!! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3 2009-06-08 14:25 ` Paul E. McKenney @ 2009-06-09 12:07 ` Lai Jiangshan 2009-06-09 19:34 ` Andrew Morton 0 siblings, 1 reply; 26+ messages in thread From: Lai Jiangshan @ 2009-06-09 12:07 UTC (permalink / raw) To: Andrew Morton Cc: paulmck, Gautham R Shenoy, Rusty Russell, Ingo Molnar, LKML, Peter Zijlstra, Oleg Nesterov, dipankar This is for the -mm tree. It also works for mainline if you first apply: http://lkml.org/lkml/2009/2/17/58 Subject: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3 get_online_cpus() is a typical coarse-grained lock, and it is a source of ABBA deadlocks. Because of the CPU notifiers, some subsystem's global lock may be acquired after cpu_hotplug.lock. Subsystem global locks are coarse-grained too, so many locks in the kernel end up having to be acquired after cpu_hotplug.lock (whenever cpu_hotplug.lock must also be held). Otherwise an ABBA deadlock like this may result: thread 1 | thread 2 _cpu_down() | Lock a-kernel-lock. cpu_hotplug_begin() | down_write(&cpu_hotplug.lock) | __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus() ------------------------------------------------------------------------ Lock a-kernel-lock. (wait thread 2) | down_read(&cpu_hotplug.lock) (wait thread 1) But CPU online/offline happens very rarely, so get_online_cpus() succeeds quickly in all probability. It is therefore unreasonable that get_online_cpus() may not be called while "a-kernel-lock" is held. To dispel the ABBA deadlock, this patch introduces try_get_online_cpus(). It fails very rarely, and it gives the caller a chance to choose an alternative way to finish its work instead of sleeping or deadlocking. Changed from V1: lockless fast path for get_online_cpus(). Changed from V2: fixed the patch per Oleg Nesterov's valuable suggestions. Suggested-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> --- diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 4d668e0..eeb9ca5 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -99,6 +99,7 @@ extern struct sysdev_class cpu_sysdev_class; extern void get_online_cpus(void); extern void put_online_cpus(void); +extern int try_get_online_cpus(void); #define hotcpu_notifier(fn, pri) { \ static struct notifier_block fn##_nb __cpuinitdata = \ { .notifier_call = fn, .priority = pri }; \ @@ -112,6 +113,7 @@ int cpu_down(unsigned int cpu); #define get_online_cpus() do { } while (0) #define put_online_cpus() do { } while (0) +static inline int try_get_online_cpus(void) { return 1; } #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0) /* These aren't inline functions due to a GCC bug. */ #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; }) diff --git a/kernel/cpu.c b/kernel/cpu.c index 609fae1..afecc95 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -14,6 +14,8 @@ #include <linux/kthread.h> #include <linux/stop_machine.h> #include <linux/mutex.h> +#include <asm/atomic.h> +#include <linux/wait.h> #ifdef CONFIG_SMP /* Serializes the updates to cpu_online_mask, cpu_present_mask */ @@ -26,16 +28,23 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain); */ static int cpu_hotplug_disabled; +/* + * @cpu_hotplug is a special read-write semaphore with these semantics: + * 1) It is read-preference and allows reader-in-reader recursion. + * 2) It allows writer to downgrade to a reader when required. + * (allows reader-in-writer recursion.) + * 3) It allows only one thread to require the write-side lock at most. + * (cpu_add_remove_lock ensures it.) + */ static struct { struct task_struct *active_writer; - struct mutex lock; /* Synchronizes accesses to refcount, */ - /* - * Also blocks the new readers during - * an ongoing cpu hotplug operation. 
- */ - int refcount; + wait_queue_head_t sleeping_readers; + /* refcount = 0 means the writer owns the lock. */ + atomic_t refcount; } cpu_hotplug = { - .lock = __MUTEX_INITIALIZER(cpu_hotplug.lock), + NULL, + __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.sleeping_readers), + ATOMIC_INIT(1), }; #ifdef CONFIG_HOTPLUG_CPU @@ -45,10 +54,9 @@ void get_online_cpus(void) might_sleep(); if (cpu_hotplug.active_writer == current) return; - mutex_lock(&cpu_hotplug.lock); - cpu_hotplug.refcount++; - mutex_unlock(&cpu_hotplug.lock); + wait_event(cpu_hotplug.sleeping_readers, + likely(atomic_inc_not_zero(&cpu_hotplug.refcount))); } EXPORT_SYMBOL_GPL(get_online_cpus); @@ -56,14 +64,27 @@ void put_online_cpus(void) { if (cpu_hotplug.active_writer == current) return; - mutex_lock(&cpu_hotplug.lock); - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer)) - wake_up_process(cpu_hotplug.active_writer); - mutex_unlock(&cpu_hotplug.lock); + if (unlikely(atomic_dec_and_test(&cpu_hotplug.refcount))) { + /* atomic_dec_and_test() implies smp_mb__after_atomic_dec() */ + BUG_ON(!cpu_hotplug.active_writer); + wake_up_process(cpu_hotplug.active_writer); + } } EXPORT_SYMBOL_GPL(put_online_cpus); +int try_get_online_cpus(void) +{ + if (cpu_hotplug.active_writer == current) + return 1; + + if (likely(atomic_inc_not_zero(&cpu_hotplug.refcount))) + return 1; + + return 0; +} +EXPORT_SYMBOL_GPL(try_get_online_cpus); + #endif /* CONFIG_HOTPLUG_CPU */ /* @@ -81,46 +102,42 @@ void cpu_maps_update_done(void) } /* - * This ensures that the hotplug operation can begin only when the - * refcount goes to zero. + * This ensures that the hotplug operation can begin only when + * there is no ongoing reader. * * Note that during a cpu-hotplug operation, the new readers, if any, - * will be blocked by the cpu_hotplug.lock + * will be blocked and queued at cpu_hotplug.sleeping_readers. 
* * Since cpu_hotplug_begin() is always called after invoking * cpu_maps_update_begin(), we can be sure that only one writer is active. * - * Note that theoretically, there is a possibility of a livelock: - * - Refcount goes to zero, last reader wakes up the sleeping - * writer. - * - Last reader unlocks the cpu_hotplug.lock. - * - A new reader arrives at this moment, bumps up the refcount. - * - The writer acquires the cpu_hotplug.lock finds the refcount - * non zero and goes to sleep again. - * - * However, this is very difficult to achieve in practice since - * get_online_cpus() not an api which is called all that often. - * */ static void cpu_hotplug_begin(void) { cpu_hotplug.active_writer = current; + /* atomic_dec_and_test() implies smp_mb__before_atomic_dec() */ + if (atomic_dec_and_test(&cpu_hotplug.refcount)) + return; + for (;;) { - mutex_lock(&cpu_hotplug.lock); - if (likely(!cpu_hotplug.refcount)) + set_current_state(TASK_UNINTERRUPTIBLE); + if (!atomic_read(&cpu_hotplug.refcount)) break; - __set_current_state(TASK_UNINTERRUPTIBLE); - mutex_unlock(&cpu_hotplug.lock); schedule(); } + + __set_current_state(TASK_RUNNING); } static void cpu_hotplug_done(void) { + atomic_inc(&cpu_hotplug.refcount); + wake_up_all(&cpu_hotplug.sleeping_readers); + cpu_hotplug.active_writer = NULL; - mutex_unlock(&cpu_hotplug.lock); } + /* Need to know about CPUs going up/down? */ int __ref register_cpu_notifier(struct notifier_block *nb) { ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3
  2009-06-09 12:07 ` [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3 Lai Jiangshan
@ 2009-06-09 19:34 ` Andrew Morton
  2009-06-09 23:47 ` Paul E. McKenney
  2009-06-10 0:57 ` [PATCH -mm] " Lai Jiangshan
  0 siblings, 2 replies; 26+ messages in thread
From: Andrew Morton @ 2009-06-09 19:34 UTC (permalink / raw)
To: Lai Jiangshan
Cc: paulmck, ego, rusty, mingo, linux-kernel, peterz, oleg, dipankar

On Tue, 09 Jun 2009 20:07:09 +0800
Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> get_online_cpus() is a typically coarsely granular lock.
> It's a source of ABBA deadlock.
>
> Thanks to the CPU notifiers, Some subsystem's global lock will
> be required after cpu_hotplug.lock. Subsystem's global lock
> is coarsely granular lock too, thus a lot's of lock in kernel
> should be required after cpu_hotplug.lock(if we need
> cpu_hotplug.lock held too)
>
> Otherwise it may come to a ABBA deadlock like this:
>
> thread 1                                    | thread 2
> _cpu_down()                                 | Lock a-kernel-lock.
> cpu_hotplug_begin()                         |
> down_write(&cpu_hotplug.lock)               |
> __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
> ------------------------------------------------------------------------
> Lock a-kernel-lock.(wait thread2)           | down_read(&cpu_hotplug.lock)
>                                             | (wait thread 1)

Confused.  cpu_hotplug_begin() doesn't do
down_write(&cpu_hotplug.lock).  If it _were_ to do that then yes, we'd
be vulnerable to the above deadlock.
* Re: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3 2009-06-09 19:34 ` Andrew Morton @ 2009-06-09 23:47 ` Paul E. McKenney 2009-06-10 1:13 ` [PATCH -mm resend] " Lai Jiangshan 2009-06-10 0:57 ` [PATCH -mm] " Lai Jiangshan 1 sibling, 1 reply; 26+ messages in thread From: Paul E. McKenney @ 2009-06-09 23:47 UTC (permalink / raw) To: Andrew Morton Cc: Lai Jiangshan, ego, rusty, mingo, linux-kernel, peterz, oleg, dipankar On Tue, Jun 09, 2009 at 12:34:38PM -0700, Andrew Morton wrote: > On Tue, 09 Jun 2009 20:07:09 +0800 > Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > > > get_online_cpus() is a typically coarsely granular lock. > > It's a source of ABBA deadlock. > > > > Thanks to the CPU notifiers, Some subsystem's global lock will > > be required after cpu_hotplug.lock. Subsystem's global lock > > is coarsely granular lock too, thus a lot's of lock in kernel > > should be required after cpu_hotplug.lock(if we need > > cpu_hotplug.lock held too) > > > > Otherwise it may come to a ABBA deadlock like this: > > > > thread 1 | thread 2 > > _cpu_down() | Lock a-kernel-lock. > > cpu_hotplug_begin() | > > down_write(&cpu_hotplug.lock) | > > __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus() > > ------------------------------------------------------------------------ > > Lock a-kernel-lock.(wait thread2) | down_read(&cpu_hotplug.lock) > > (wait thread 1) > > Confused. cpu_hotplug_begin() doesn't do > down_write(&cpu_hotplug.lock). If it _were_ to do that then yes, we'd > be vulnerable to the above deadlock. The current implementation is a bit more complex. 
If you hold a kernel mutex across get_online_cpus() and also acquire
that same kernel mutex in a hotplug notifier that permits sleeping, I
believe that you really can get a deadlock as follows:

Task 1                             | Task 2
                                   | mutex_lock(&mylock);
cpu_hotplug_begin()                |
mutex_lock(&cpu_hotplug.lock);     |
[assume cpu_hotplug.refcount == 0] |
                                   | get_online_cpus()
---------------------------------------------------------------------------
mutex_lock(&mylock);               | mutex_lock(&cpu_hotplug.lock);

That said, when I look at the raw_notifier_call_chain() and
unregister_cpu_notifier() code paths, it is not obvious to me that
they exclude each other or otherwise protect the cpu_chain list...

							Thanx, Paul
* [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3
  2009-06-09 23:47 ` Paul E. McKenney
@ 2009-06-10 1:13 ` Lai Jiangshan
  2009-06-10 1:42 ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Lai Jiangshan @ 2009-06-10 1:13 UTC (permalink / raw)
To: paulmck
Cc: Andrew Morton, ego, rusty, mingo, linux-kernel, peterz, oleg, dipankar

It's for -mm tree.

It also works for mainline if you apply this at first:
http://lkml.org/lkml/2009/2/17/58

Subject: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3

get_online_cpus() is a typically coarsely granular lock.
It's a source of ABBA or ABBCCA... deadlock.

Thanks to the CPU notifiers, Some subsystem's global lock will
be required after cpu_hotplug.lock. Subsystem's global lock
is coarsely granular lock too, thus a lot's of lock in kernel
should be required after cpu_hotplug.lock(if we need
cpu_hotplug.lock held too)

Otherwise it may come to a ABBA deadlock like this:

thread 1                                    | thread 2
_cpu_down()                                 | Lock a-kernel-lock.
cpu_hotplug_begin()                         |
mutex_lock(&cpu_hotplug.lock)               |
__raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
------------------------------------------------------------------------
Lock a-kernel-lock.(wait thread2)           | mutex_lock(&cpu_hotplug.lock)
                                            | (wait thread 1)

But CPU online/offline are happened very rarely, get_online_cpus()
returns success quickly in all probability.
So it's an asinine behavior that get_online_cpus() is not allowed
to be required after we had held "a-kernel-lock".

To dispel the ABBA deadlock, this patch introduces
try_get_online_cpus(). It returns fail very rarely. It gives the
caller a chance to select an alternative way to finish works,
instead of sleeping or deadlock.

Changed from V1:
  Lockless for get_online_cpus()'s fast path

Changed from V2:
  Fix patch as Oleg Nesterov's valuable suggestions.

Suggested-by: Paul E.
McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> --- diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 4d668e0..eeb9ca5 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -99,6 +99,7 @@ extern struct sysdev_class cpu_sysdev_class; extern void get_online_cpus(void); extern void put_online_cpus(void); +extern int try_get_online_cpus(void); #define hotcpu_notifier(fn, pri) { \ static struct notifier_block fn##_nb __cpuinitdata = \ { .notifier_call = fn, .priority = pri }; \ @@ -112,6 +113,7 @@ int cpu_down(unsigned int cpu); #define get_online_cpus() do { } while (0) #define put_online_cpus() do { } while (0) +static inline int try_get_online_cpus(void) { return 1; } #define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0) /* These aren't inline functions due to a GCC bug. */ #define register_hotcpu_notifier(nb) ({ (void)(nb); 0; }) diff --git a/kernel/cpu.c b/kernel/cpu.c index 609fae1..afecc95 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -14,6 +14,8 @@ #include <linux/kthread.h> #include <linux/stop_machine.h> #include <linux/mutex.h> +#include <asm/atomic.h> +#include <linux/wait.h> #ifdef CONFIG_SMP /* Serializes the updates to cpu_online_mask, cpu_present_mask */ @@ -26,16 +28,23 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain); */ static int cpu_hotplug_disabled; +/* + * @cpu_hotplug is a special read-write semaphore with these semantics: + * 1) It is read-preference and allows reader-in-reader recursion. + * 2) It allows writer to downgrade to a reader when required. + * (allows reader-in-writer recursion.) + * 3) It allows only one thread to require the write-side lock at most. + * (cpu_add_remove_lock ensures it.) + */ static struct { struct task_struct *active_writer; - struct mutex lock; /* Synchronizes accesses to refcount, */ - /* - * Also blocks the new readers during - * an ongoing cpu hotplug operation. 
- */ - int refcount; + wait_queue_head_t sleeping_readers; + /* refcount = 0 means the writer owns the lock. */ + atomic_t refcount; } cpu_hotplug = { - .lock = __MUTEX_INITIALIZER(cpu_hotplug.lock), + NULL, + __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.sleeping_readers), + ATOMIC_INIT(1), }; #ifdef CONFIG_HOTPLUG_CPU @@ -45,10 +54,9 @@ void get_online_cpus(void) might_sleep(); if (cpu_hotplug.active_writer == current) return; - mutex_lock(&cpu_hotplug.lock); - cpu_hotplug.refcount++; - mutex_unlock(&cpu_hotplug.lock); + wait_event(cpu_hotplug.sleeping_readers, + likely(atomic_inc_not_zero(&cpu_hotplug.refcount))); } EXPORT_SYMBOL_GPL(get_online_cpus); @@ -56,14 +64,27 @@ void put_online_cpus(void) { if (cpu_hotplug.active_writer == current) return; - mutex_lock(&cpu_hotplug.lock); - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer)) - wake_up_process(cpu_hotplug.active_writer); - mutex_unlock(&cpu_hotplug.lock); + if (unlikely(atomic_dec_and_test(&cpu_hotplug.refcount))) { + /* atomic_dec_and_test() implies smp_mb__after_atomic_dec() */ + BUG_ON(!cpu_hotplug.active_writer); + wake_up_process(cpu_hotplug.active_writer); + } } EXPORT_SYMBOL_GPL(put_online_cpus); +int try_get_online_cpus(void) +{ + if (cpu_hotplug.active_writer == current) + return 1; + + if (likely(atomic_inc_not_zero(&cpu_hotplug.refcount))) + return 1; + + return 0; +} +EXPORT_SYMBOL_GPL(try_get_online_cpus); + #endif /* CONFIG_HOTPLUG_CPU */ /* @@ -81,46 +102,42 @@ void cpu_maps_update_done(void) } /* - * This ensures that the hotplug operation can begin only when the - * refcount goes to zero. + * This ensures that the hotplug operation can begin only when + * there is no ongoing reader. * * Note that during a cpu-hotplug operation, the new readers, if any, - * will be blocked by the cpu_hotplug.lock + * will be blocked and queued at cpu_hotplug.sleeping_readers. 
* * Since cpu_hotplug_begin() is always called after invoking * cpu_maps_update_begin(), we can be sure that only one writer is active. * - * Note that theoretically, there is a possibility of a livelock: - * - Refcount goes to zero, last reader wakes up the sleeping - * writer. - * - Last reader unlocks the cpu_hotplug.lock. - * - A new reader arrives at this moment, bumps up the refcount. - * - The writer acquires the cpu_hotplug.lock finds the refcount - * non zero and goes to sleep again. - * - * However, this is very difficult to achieve in practice since - * get_online_cpus() not an api which is called all that often. - * */ static void cpu_hotplug_begin(void) { cpu_hotplug.active_writer = current; + /* atomic_dec_and_test() implies smp_mb__before_atomic_dec() */ + if (atomic_dec_and_test(&cpu_hotplug.refcount)) + return; + for (;;) { - mutex_lock(&cpu_hotplug.lock); - if (likely(!cpu_hotplug.refcount)) + set_current_state(TASK_UNINTERRUPTIBLE); + if (!atomic_read(&cpu_hotplug.refcount)) break; - __set_current_state(TASK_UNINTERRUPTIBLE); - mutex_unlock(&cpu_hotplug.lock); schedule(); } + + __set_current_state(TASK_RUNNING); } static void cpu_hotplug_done(void) { + atomic_inc(&cpu_hotplug.refcount); + wake_up_all(&cpu_hotplug.sleeping_readers); + cpu_hotplug.active_writer = NULL; - mutex_unlock(&cpu_hotplug.lock); } + /* Need to know about CPUs going up/down? */ int __ref register_cpu_notifier(struct notifier_block *nb) { ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3 2009-06-10 1:13 ` [PATCH -mm resend] " Lai Jiangshan @ 2009-06-10 1:42 ` Andrew Morton 2009-06-11 8:41 ` Lai Jiangshan 0 siblings, 1 reply; 26+ messages in thread From: Andrew Morton @ 2009-06-10 1:42 UTC (permalink / raw) To: Lai Jiangshan Cc: paulmck, ego, rusty, mingo, linux-kernel, peterz, oleg, dipankar On Wed, 10 Jun 2009 09:13:58 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > It's for -mm tree. > > It also works for mainline if you apply this at first: > http://lkml.org/lkml/2009/2/17/58 > > Subject: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3 > > get_online_cpus() is a typically coarsely granular lock. > It's a source of ABBA or ABBCCA... deadlock. > > Thanks to the CPU notifiers, Some subsystem's global lock will > be required after cpu_hotplug.lock. Subsystem's global lock > is coarsely granular lock too, thus a lot's of lock in kernel > should be required after cpu_hotplug.lock(if we need > cpu_hotplug.lock held too) > > Otherwise it may come to a ABBA deadlock like this: > > thread 1 | thread 2 > _cpu_down() | Lock a-kernel-lock. > cpu_hotplug_begin() | > mutex_lock(&cpu_hotplug.lock) | > __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus() > ------------------------------------------------------------------------ > Lock a-kernel-lock.(wait thread2) | mutex_lock(&cpu_hotplug.lock) > (wait thread 1) uh, OK. > But CPU online/offline are happened very rarely, get_online_cpus() > returns success quickly in all probability. > So it's an asinine behavior that get_online_cpus() is not allowed > to be required after we had held "a-kernel-lock". > > To dispel the ABBA deadlock, this patch introduces > try_get_online_cpus(). It returns fail very rarely. It gives the > caller a chance to select an alternative way to finish works, > instead of sleeping or deadlock. I still think we should really avoid having to do this. trylocks are nasty things. 
Looking at the above, one would think that a correct fix would be to fix the bug in "thread 2": take the locks in the correct order? As try_get_online_cpus() doesn't actually have any callers, it's hard to take that thought any further.
* Re: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3
  2009-06-10 1:42 ` Andrew Morton
@ 2009-06-11 8:41 ` Lai Jiangshan
  2009-06-11 18:50 ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Lai Jiangshan @ 2009-06-11 8:41 UTC (permalink / raw)
To: Andrew Morton, paulmck
Cc: ego, rusty, mingo, linux-kernel, peterz, oleg, dipankar

Andrew Morton wrote:
>
> I still think we should really avoid having to do this.  trylocks are
> nasty things.
>
> Looking at the above, one would think that a correct fix would be to fix
> the bug in "thread 2": take the locks in the correct order?  As
> try_get_online_cpus() doesn't actually have any callers, it's hard to
> take that thought any further.
>

Sometimes we cannot reorder the locks.
try_get_online_cpus() is really needless when no one uses it.

Paul's expedited RCU V7 may need it:
http://lkml.org/lkml/2009/5/22/332

So this patch can be omitted when Paul does not use it.
It's totally OK for me.

Lai
* Re: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3 2009-06-11 8:41 ` Lai Jiangshan @ 2009-06-11 18:50 ` Paul E. McKenney 2009-06-15 4:04 ` Gautham R Shenoy 0 siblings, 1 reply; 26+ messages in thread From: Paul E. McKenney @ 2009-06-11 18:50 UTC (permalink / raw) To: Lai Jiangshan Cc: Andrew Morton, ego, rusty, mingo, linux-kernel, peterz, oleg, dipankar On Thu, Jun 11, 2009 at 04:41:42PM +0800, Lai Jiangshan wrote: > Andrew Morton wrote: > > > > I still think we should really avoid having to do this. trylocks are > > nasty things. > > > > Looking at the above, one would think that a correct fix would be to fix > > the bug in "thread 2": take the locks in the correct order? As > > try_get_online_cpus() doesn't actually have any callers, it's hard to > > take that thought any further. > > Sometimes, we can not reorder the locks' order. > try_get_online_cpus() is really needless when no one uses it. > > Paul's expedited RCU V7 may need it: > http://lkml.org/lkml/2009/5/22/332 > > So this patch can be omitted when Paul does not use it. > It's totally OK for me. Although my patch does not need it in and of itself, if someone were to hold a kernel mutex across synchronize_sched_expedited(), and also acquire that same kernel mutex in a hotplug notifier, the deadlock that Lai calls out would occur. Even if no one uses synchronize_sched_expedited() in this manner, I feel that it is good to explore the possibility of dealing with it. As Andrew Morton pointed out, CPU-hotplug locking is touchy, so on-the-fly fixes are to be avoided if possible. Thanx, Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3 2009-06-11 18:50 ` Paul E. McKenney @ 2009-06-15 4:04 ` Gautham R Shenoy 0 siblings, 0 replies; 26+ messages in thread From: Gautham R Shenoy @ 2009-06-15 4:04 UTC (permalink / raw) To: Paul E. McKenney Cc: Lai Jiangshan, Andrew Morton, rusty, mingo, linux-kernel, peterz, oleg, dipankar On Thu, Jun 11, 2009 at 11:50:15AM -0700, Paul E. McKenney wrote: > On Thu, Jun 11, 2009 at 04:41:42PM +0800, Lai Jiangshan wrote: > > Andrew Morton wrote: > > > > > > I still think we should really avoid having to do this. trylocks are > > > nasty things. > > > > > > Looking at the above, one would think that a correct fix would be to fix > > > the bug in "thread 2": take the locks in the correct order? As > > > try_get_online_cpus() doesn't actually have any callers, it's hard to > > > take that thought any further. > > > > Sometimes, we can not reorder the locks' order. > > try_get_online_cpus() is really needless when no one uses it. > > > > Paul's expedited RCU V7 may need it: > > http://lkml.org/lkml/2009/5/22/332 > > > > So this patch can be omitted when Paul does not use it. > > It's totally OK for me. > > Although my patch does not need it in and of itself, if someone were > to hold a kernel mutex across synchronize_sched_expedited(), and also > acquire that same kernel mutex in a hotplug notifier, the deadlock that > Lai calls out would occur. > > Even if no one uses synchronize_sched_expedited() in this manner, I feel > that it is good to explore the possibility of dealing with it. As > Andrew Morton pointed out, CPU-hotplug locking is touchy, so on-the-fly > fixes are to be avoided if possible. Agreed. Though I like the atomic refcount version of get_online_cpus()/put_online_cpus() that Lai has proposed. Anyways, to quote the need for try_get_online_cpus() when it was proposed last year, it was to be used in worker thread context. 
Because in those times we could not do a get_online_cpus() from the
worker thread context, fearing the following deadlock during a
cpu-hotplug:

Thread 1: (cpu_offline)          | Thread 2 (worker_thread)
-----------------------------------------------------------------------
cpu_hotplug_begin();             | .
.                                | get_online_cpus(); /* Blocks */
.                                | .
CPU_DEAD:                        |
workqueue_cpu_callback();        |
cleanup_workqueue_thread()       |
/* Waits for worker thread       |
 * to finish.                    |
 * Hence a deadlock. */          |

This was fixed by introducing the CPU_POST_DEAD event, the notification
for which is sent only after the hotplug lock has been released.

> > 							Thanx, Paul

--
Thanks and Regards
gautham
* Re: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3
  2009-06-09 19:34 ` Andrew Morton
  2009-06-09 23:47 ` Paul E. McKenney
@ 2009-06-10 0:57 ` Lai Jiangshan
  1 sibling, 0 replies; 26+ messages in thread
From: Lai Jiangshan @ 2009-06-10 0:57 UTC (permalink / raw)
To: Andrew Morton
Cc: paulmck, ego, rusty, mingo, linux-kernel, peterz, oleg, dipankar

Andrew Morton wrote:
> On Tue, 09 Jun 2009 20:07:09 +0800
> Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
>> get_online_cpus() is a typically coarsely granular lock.
>> It's a source of ABBA deadlock.
>>
>> Thanks to the CPU notifiers, Some subsystem's global lock will
>> be required after cpu_hotplug.lock. Subsystem's global lock
>> is coarsely granular lock too, thus a lot's of lock in kernel
>> should be required after cpu_hotplug.lock(if we need
>> cpu_hotplug.lock held too)
>>
>> Otherwise it may come to a ABBA deadlock like this:
>>
>> thread 1                                    | thread 2
>> _cpu_down()                                 | Lock a-kernel-lock.
>> cpu_hotplug_begin()                         |
>> down_write(&cpu_hotplug.lock)               |
>> __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
>> ------------------------------------------------------------------------
>> Lock a-kernel-lock.(wait thread2)           | down_read(&cpu_hotplug.lock)
>>                                             | (wait thread 1)
>
> Confused.  cpu_hotplug_begin() doesn't do
> down_write(&cpu_hotplug.lock).  If it _were_ to do that then yes, we'd
> be vulnerable to the above deadlock.
>

Ouch, this changelog was modified from the V1, but it was not modified
correctly. I apologize.

Lai.