All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Gautham R Shenoy <ego@in.ibm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Oleg Nesterov <oleg@redhat.com>
Subject: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2
Date: Thu, 04 Jun 2009 14:58:20 +0800	[thread overview]
Message-ID: <4A27708C.6030703@cn.fujitsu.com> (raw)
In-Reply-To: <20090530043739.GA12157@in.ibm.com>

Gautham R Shenoy wrote:
> On Fri, May 29, 2009 at 06:53:42PM -0700, Paul E. McKenney wrote:
>> On Fri, May 29, 2009 at 04:29:30PM +0800, Lai Jiangshan wrote:
>>> Current get_online_cpus()/put_online_cpus() re-implement
>>> a rw_semaphore, so it is converted to a real rw_semaphore in this fix.
>>> It simplifies codes, and is good for read.
>>>
>>> And misc fix:
>>> 1) Add comments for cpu_hotplug.active_writer.
>>> 2) The theoretical disadvantage described in cpu_hotplug_begin()'s
>>>    comments is no longer existed when we use rw_semaphore,
>>>    so this part of comments was removed.
>>>
>>> [Impact: improve get_online_cpus()/put_online_cpus() ]
>> Actually, it turns out that for my purposes it is only necessary to check:
>>
>> 	cpu_hotplug.active_writer != NULL
>>
>> The only time that it is unsafe to invoke get_online_cpus() is when
>> in a notifier, and in that case the value of cpu_hotplug.active_writer
>> is stable.  There could be false positives, but these are harmless, as
>> the fallback is simply synchronize_sched().
>>
>> Even this is only needed should the deadlock scenario you pointed out
>> arise in practice.
>>
>> As Oleg noted, there are some "interesting" constraints on
>> get_online_cpus().  Adding Gautham Shenoy to CC for his views.
> 
> So, to put it in a sentence, get_online_cpus()/put_online_cpus() is a
> read-write semaphore with read-preference while allowing writer to
> downgrade to a reader when required.
> 
> Read-preference was one of the ways of allowing unsuspecting functions
> which need the protection against cpu-hotplug to end up seeking help of
> functions which also need protection against cpu-hotplug. IOW allow a
> single context to call get_online_cpus() without giving away to circular
> deadlock. A fair reader-write lock wouldn't allow that since in the
> presence of a write, the recursive reads would block, thereby causing a
> deadlock.
> 
> Also, around the time when this design was chosen, we had a whole bunch
> of functions which did try to take the original "cpu_hotplug_mutex"
> recursively. We could do well to use Lai's implementation if such
> functions have mended their ways since this would make it a lot simpler
> :-) . But I suspect it is easier said than done!
> 
> BTW, I second the idea of try_get_online_cpus(). I had myself proposed
> this idea a year back. http://lkml.org/lkml/2008/4/29/222.
> 
> 
> 

Add CC to Peter Zijlstra <peterz@infradead.org>

(This patch is against mainline but not for inclusion, it will adapted
against -mm when request)

Requst For Comments and Reviewing Hungeringly.

- Lockless for get_online_cpus()'s fast path
- Introduce try_get_online_cpus()

---
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 2643d84..63b216c 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -104,6 +104,7 @@ extern struct sysdev_class cpu_sysdev_class;
 
 extern void get_online_cpus(void);
 extern void put_online_cpus(void);
+extern int try_get_online_cpus(void);
 #define hotcpu_notifier(fn, pri) {				\
 	static struct notifier_block fn##_nb __cpuinitdata =	\
 		{ .notifier_call = fn, .priority = pri };	\
@@ -117,6 +118,7 @@ int cpu_down(unsigned int cpu);
 
 #define get_online_cpus()	do { } while (0)
 #define put_online_cpus()	do { } while (0)
+static inline int try_get_online_cpus(void) { return 1; }
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 395b697..54d6e0d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -14,6 +14,8 @@
 #include <linux/kthread.h>
 #include <linux/stop_machine.h>
 #include <linux/mutex.h>
+#include <asm/atomic.h>
+#include <linux/wait.h>
 
 #ifdef CONFIG_SMP
 /* Serializes the updates to cpu_online_mask, cpu_present_mask */
@@ -26,21 +28,26 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);
  */
 static int cpu_hotplug_disabled;
 
+/*
+ * @cpu_hotplug is a special read-write semaphore with these semantics:
+ * 1) It is read-preference and allows reader-in-reader recursion.
+ * 2) It allows writer to downgrade to a reader when required.
+ *    (allows reader-in-writer recursion.)
+ * 3) It allows only one thread to require the write-side lock at most.
+ *    (cpu_add_remove_lock ensures it.)
+ */
 static struct {
 	struct task_struct *active_writer;
-	struct mutex lock; /* Synchronizes accesses to refcount, */
-	/*
-	 * Also blocks the new readers during
-	 * an ongoing cpu hotplug operation.
-	 */
-	int refcount;
+	wait_queue_head_t sleeping_readers;
+	/* refcount = 0 means the writer owns the lock. */
+	atomic_t refcount;
 } cpu_hotplug;
 
 void __init cpu_hotplug_init(void)
 {
 	cpu_hotplug.active_writer = NULL;
-	mutex_init(&cpu_hotplug.lock);
-	cpu_hotplug.refcount = 0;
+	init_waitqueue_head(&cpu_hotplug.sleeping_readers);
+	atomic_set(&cpu_hotplug.refcount, 1);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
@@ -50,10 +57,20 @@ void get_online_cpus(void)
 	might_sleep();
 	if (cpu_hotplug.active_writer == current)
 		return;
-	mutex_lock(&cpu_hotplug.lock);
-	cpu_hotplug.refcount++;
-	mutex_unlock(&cpu_hotplug.lock);
 
+	if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) {
+		DEFINE_WAIT(wait);
+
+		for (;;) {
+			prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait,
+					TASK_UNINTERRUPTIBLE);
+			if (atomic_inc_not_zero(&cpu_hotplug.refcount))
+				break;
+			schedule();
+		}
+
+		finish_wait(&cpu_hotplug.sleeping_readers, &wait);
+	}
 }
 EXPORT_SYMBOL_GPL(get_online_cpus);
 
@@ -61,14 +78,27 @@ void put_online_cpus(void)
 {
 	if (cpu_hotplug.active_writer == current)
 		return;
-	mutex_lock(&cpu_hotplug.lock);
-	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
-		wake_up_process(cpu_hotplug.active_writer);
-	mutex_unlock(&cpu_hotplug.lock);
 
+	if (unlikely(atomic_dec_and_test(&cpu_hotplug.refcount))) {
+		smp_mb__after_atomic_dec();
+		BUG_ON(!cpu_hotplug.active_writer);
+		wake_up_process(cpu_hotplug.active_writer);
+	}
 }
 EXPORT_SYMBOL_GPL(put_online_cpus);
 
+int try_get_online_cpus(void)
+{
+	if (cpu_hotplug.active_writer == current)
+		return 1;
+
+	if (likely(atomic_inc_not_zero(&cpu_hotplug.refcount)))
+		return 1;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(try_get_online_cpus);
+
 #endif	/* CONFIG_HOTPLUG_CPU */
 
 /*
@@ -86,46 +116,41 @@ void cpu_maps_update_done(void)
 }
 
 /*
- * This ensures that the hotplug operation can begin only when the
- * refcount goes to zero.
+ * This ensures that the hotplug operation can begin only when
+ * there is no ongoing reader.
  *
  * Note that during a cpu-hotplug operation, the new readers, if any,
- * will be blocked by the cpu_hotplug.lock
+ * will be blocked and queued at cpu_hotplug.sleeping_readers.
  *
  * Since cpu_hotplug_begin() is always called after invoking
  * cpu_maps_update_begin(), we can be sure that only one writer is active.
  *
- * Note that theoretically, there is a possibility of a livelock:
- * - Refcount goes to zero, last reader wakes up the sleeping
- *   writer.
- * - Last reader unlocks the cpu_hotplug.lock.
- * - A new reader arrives at this moment, bumps up the refcount.
- * - The writer acquires the cpu_hotplug.lock finds the refcount
- *   non zero and goes to sleep again.
- *
- * However, this is very difficult to achieve in practice since
- * get_online_cpus() not an api which is called all that often.
- *
  */
 static void cpu_hotplug_begin(void)
 {
 	cpu_hotplug.active_writer = current;
+	smp_mb__before_atomic_dec();
+	atomic_dec(&cpu_hotplug.refcount);
 
 	for (;;) {
-		mutex_lock(&cpu_hotplug.lock);
-		if (likely(!cpu_hotplug.refcount))
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		if (!atomic_read(&cpu_hotplug.refcount))
 			break;
-		__set_current_state(TASK_UNINTERRUPTIBLE);
-		mutex_unlock(&cpu_hotplug.lock);
 		schedule();
 	}
+
+	__set_current_state(TASK_RUNNING);
 }
 
 static void cpu_hotplug_done(void)
 {
 	cpu_hotplug.active_writer = NULL;
-	mutex_unlock(&cpu_hotplug.lock);
+	atomic_inc(&cpu_hotplug.refcount);
+
+	if (waitqueue_active(&cpu_hotplug.sleeping_readers))
+		wake_up(&cpu_hotplug.sleeping_readers);
 }
+
 /* Need to know about CPUs going up/down? */
 int __ref register_cpu_notifier(struct notifier_block *nb)
 {



  reply	other threads:[~2009-06-04  6:57 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-29  8:29 [PATCH 1/2] cpuhotplug: use rw_semaphore for cpu_hotplug Lai Jiangshan
2009-05-29 20:23 ` Andrew Morton
2009-05-29 21:07   ` Oleg Nesterov
2009-05-29 21:17     ` Oleg Nesterov
2009-06-01  1:04       ` Lai Jiangshan
2009-06-01  0:52     ` Lai Jiangshan
2009-06-01  2:22       ` Lai Jiangshan
2009-05-30  1:53 ` Paul E. McKenney
2009-05-30  4:37   ` Gautham R Shenoy
2009-06-04  6:58     ` Lai Jiangshan [this message]
2009-06-04 20:49       ` [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2 Oleg Nesterov
2009-06-05  1:32         ` Lai Jiangshan
2009-06-05  2:14           ` Oleg Nesterov
2009-06-05 15:37       ` Paul E. McKenney
2009-06-08  2:36         ` Lai Jiangshan
2009-06-08  4:19         ` Gautham R Shenoy
2009-06-08 14:25           ` Paul E. McKenney
2009-06-09 12:07             ` [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3 Lai Jiangshan
2009-06-09 19:34               ` Andrew Morton
2009-06-09 23:47                 ` Paul E. McKenney
2009-06-10  1:13                   ` [PATCH -mm resend] " Lai Jiangshan
2009-06-10  1:42                     ` Andrew Morton
2009-06-11  8:41                       ` Lai Jiangshan
2009-06-11 18:50                         ` Paul E. McKenney
2009-06-15  4:04                           ` Gautham R Shenoy
2009-06-10  0:57                 ` [PATCH -mm] " Lai Jiangshan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A27708C.6030703@cn.fujitsu.com \
    --to=laijs@cn.fujitsu.com \
    --cc=akpm@linux-foundation.org \
    --cc=ego@in.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rusty@rustcorp.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.