public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit.
@ 2007-10-16 10:33 Gautham R Shenoy
  2007-10-16 10:34 ` [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation Gautham R Shenoy
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-16 10:33 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: linux-kernel, Srivatsa Vaddagiri, Rusty Russel, Dipankar Sarma,
	Oleg Nesterov, Ingo Molnar, Paul E McKenney

Hi, 
This patch series revisits the cpu-hotplug locking model.

Prior to this attempt, there were several different suggestions on 
how it should be implemented. 

The approaches that were posted earlier were:

a) Refcount + Waitqueue model:
   Threads that want to avoid a cpu-hotplug operation while they are
   operating in a cpu-hotplug critical section bump up a reference
   count on the global online-cpu state.
   The thread that wants to perform a cpu-hotplug operation blocks
   until the reference count on the global online state drops to
   zero. Any threads that want to enter the cpu-hotplug critical
   section during an ongoing cpu-hotplug operation are blocked on a
   waitqueue.

   The advantage of this model was that it follows the well-known
   get/put model, except that it allows both the readers and the
   writers to sleep.

   The disadvantage, as Andrew pointed out, was that there already
   exist a whole bunch of lock_cpu_hotplug()'s whose purpose is
   undocumented, and an approach like this does nothing to improve
   that situation.

b) Per-subsystem cpu-hotplug locks: Each subsystem that has
   cpu-hotplug critical data uses a lock to protect that data. Such a
   subsystem needs to subscribe to the cpu-hotplug notifications,
   especially the CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE events, which
   are sent before and after a cpu-hotplug operation. The subsystem
   lock is taken and released while handling these events,
   respectively.
   
   The advantage this model offered was that the lock was associated
   with the data, which made the purpose of the locking easy to
   understand.

   The disadvantage was that a cpu-hotplug aware function could not
   be called from a cpu-hotplug callback path, since the subsystem
   lock had already been acquired during CPU_LOCK_ACQUIRE and
   attempting to reacquire it would result in a deadlock.
   The case which exposed this limitation was the implementation of
   synchronize_sched in preemptible RCU.

c) Freezer based cpu-hotplug:
   The idea here was to freeze the system, using the process-freezer
   technique already used for suspend/hibernate, before performing a
   cpu-hotplug operation. This would ensure that none of the kernel
   threads is accessing any cpu-hotplug critical data, because they
   are all frozen at well-known points.

   This would have helped to remove all kinds of locks, because when
   a thread is accessing cpu-hotplug critical data, it means the
   system is not frozen, and hence there can be no cpu-hotplug
   operation until the thread either voluntarily calls try_to_freeze
   or returns out of the kernel.

   The disadvantage of this approach was that dependencies between
   threads could cause the freezer to fail. For example, thread A is
   waiting for thread B to accomplish something, but thread B is
   already frozen, leading to a freeze failure. There could be other
   subtle races which would be difficult to track down.

Some time in May 2007, Linus suggested using the refcount model, and
this patch series simplifies and reimplements the Refcount + waitqueue
model, based on the discussions in the past and inputs from
Vatsa and Rusty.

Patch 1/4: Implements the core refcount + waitqueue model.
Patch 2/4: Replaces all the lock_cpu_hotplug/unlock_cpu_hotplug instances
	   with get_online_cpus()/put_online_cpus()
Patch 3/4: Replaces the subsystem mutexes (we do have three of them now, 
           in sched.c, slab.c and workqueue.c) with get_online_cpus,
	   put_online_cpus.
Patch 4/4: Eliminates the CPU_DEAD and CPU_UP_CANCELED event handling
 	   from workqueue.c

The patch series has survived an overnight kernbench run on i386 and
has been tested with Paul McKenney's latest preemptible RCU code.

Awaiting thy feedback!

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-16 10:33 [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Gautham R Shenoy
@ 2007-10-16 10:34 ` Gautham R Shenoy
  2007-10-17  0:47   ` Rusty Russell
  2007-10-17 10:53   ` Paul Jackson
  2007-10-16 10:35 ` [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus Gautham R Shenoy
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-16 10:34 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: linux-kernel, Srivatsa Vaddagiri, Rusty Russel, Dipankar Sarma,
	Oleg Nesterov, Ingo Molnar, Paul E McKenney

This patch implements a Refcount + Waitqueue based model for
cpu-hotplug. 

A thread that wants to prevent cpu-hotplug bumps up a global
refcount, and the thread that wants to perform a cpu-hotplug
operation blocks until the global refcount drops to zero.

The readers, if any, that arrive during an ongoing cpu-hotplug
operation are blocked until the cpu-hotplug operation is over.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 include/linux/cpu.h |    2 +
 init/main.c         |    1 
 kernel/cpu.c        |   91 ++++++++++++++++++++++++++++++++++++----------------
 3 files changed, 67 insertions(+), 27 deletions(-)

Index: linux-2.6.23/init/main.c
===================================================================
--- linux-2.6.23.orig/init/main.c
+++ linux-2.6.23/init/main.c
@@ -614,6 +614,7 @@ asmlinkage void __init start_kernel(void
 	vfs_caches_init_early();
 	cpuset_init_early();
 	mem_init();
+	cpu_hotplug_init();
 	kmem_cache_init();
 	setup_per_cpu_pageset();
 	numa_policy_init();
Index: linux-2.6.23/include/linux/cpu.h
===================================================================
--- linux-2.6.23.orig/include/linux/cpu.h
+++ linux-2.6.23/include/linux/cpu.h
@@ -97,6 +97,7 @@ static inline void cpuhotplug_mutex_unlo
 	mutex_unlock(cpu_hp_mutex);
 }
 
+extern void cpu_hotplug_init(void);
 extern void lock_cpu_hotplug(void);
 extern void unlock_cpu_hotplug(void);
 #define hotcpu_notifier(fn, pri) {				\
@@ -116,6 +117,7 @@ static inline void cpuhotplug_mutex_lock
 static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
 { }
 
+#define cpu_hotplug_init()     do { } while (0)
 #define lock_cpu_hotplug()	do { } while (0)
 #define unlock_cpu_hotplug()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
Index: linux-2.6.23/kernel/cpu.c
===================================================================
--- linux-2.6.23.orig/kernel/cpu.c
+++ linux-2.6.23/kernel/cpu.c
@@ -17,7 +17,6 @@
 
 /* This protects CPUs going up and down... */
 static DEFINE_MUTEX(cpu_add_remove_lock);
-static DEFINE_MUTEX(cpu_bitmask_lock);
 
 static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);
 
@@ -26,45 +25,83 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(c
  */
 static int cpu_hotplug_disabled;
 
+static struct {
+	struct task_struct *active_writer;
+	struct mutex lock; /* Synchronizes accesses to refcount, */
+	/*
+	 * Also blocks the new readers during
+	 * an ongoing cpu hotplug operation.
+	 */
+	int refcount;
+	struct completion readers_done;
+} cpu_hotplug;
+
+#define writer_exists() (cpu_hotplug.active_writer != NULL)
+
 #ifdef CONFIG_HOTPLUG_CPU
 
-/* Crappy recursive lock-takers in cpufreq! Complain loudly about idiots */
-static struct task_struct *recursive;
-static int recursive_depth;
+void __init cpu_hotplug_init(void)
+{
+	cpu_hotplug.active_writer = NULL;
+	mutex_init(&cpu_hotplug.lock);
+	cpu_hotplug.refcount = 0;
+	init_completion(&cpu_hotplug.readers_done);
+}
 
 void lock_cpu_hotplug(void)
 {
-	struct task_struct *tsk = current;
-
-	if (tsk == recursive) {
-		static int warnings = 10;
-		if (warnings) {
-			printk(KERN_ERR "Lukewarm IQ detected in hotplug locking\n");
-			WARN_ON(1);
-			warnings--;
-		}
-		recursive_depth++;
+	might_sleep();
+	if (cpu_hotplug.active_writer == current)
 		return;
-	}
-	mutex_lock(&cpu_bitmask_lock);
-	recursive = tsk;
+	mutex_lock(&cpu_hotplug.lock);
+	cpu_hotplug.refcount++;
+	mutex_unlock(&cpu_hotplug.lock);
+
 }
 EXPORT_SYMBOL_GPL(lock_cpu_hotplug);
 
 void unlock_cpu_hotplug(void)
 {
-	WARN_ON(recursive != current);
-	if (recursive_depth) {
-		recursive_depth--;
+	if (cpu_hotplug.active_writer == current)
 		return;
-	}
-	recursive = NULL;
-	mutex_unlock(&cpu_bitmask_lock);
+	mutex_lock(&cpu_hotplug.lock);
+	cpu_hotplug.refcount--;
+
+	if (unlikely(writer_exists()) && !cpu_hotplug.refcount)
+		complete(&cpu_hotplug.readers_done);
+
+	mutex_unlock(&cpu_hotplug.lock);
+
 }
 EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);
 
 #endif	/* CONFIG_HOTPLUG_CPU */
 
+/*
+ * This ensures that the hotplug operation can begin only when the
+ * refcount goes to zero.
+ *
+ * Note that during a cpu-hotplug operation, the new readers, if any,
+ * will be blocked by the cpu_hotplug.lock
+ */
+static void cpu_hotplug_begin(void)
+{
+	mutex_lock(&cpu_hotplug.lock);
+	cpu_hotplug.active_writer = current;
+	while (cpu_hotplug.refcount) {
+		mutex_unlock(&cpu_hotplug.lock);
+		wait_for_completion(&cpu_hotplug.readers_done);
+		mutex_lock(&cpu_hotplug.lock);
+	}
+
+}
+
+static void cpu_hotplug_done(void)
+{
+	cpu_hotplug.active_writer = NULL;
+	mutex_unlock(&cpu_hotplug.lock);
+}
+
 /* Need to know about CPUs going up/down? */
 int __cpuinit register_cpu_notifier(struct notifier_block *nb)
 {
@@ -147,6 +184,7 @@ static int _cpu_down(unsigned int cpu, i
 	if (!cpu_online(cpu))
 		return -EINVAL;
 
+	cpu_hotplug_begin();
 	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
 	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
 					hcpu, -1, &nr_calls);
@@ -166,9 +204,7 @@ static int _cpu_down(unsigned int cpu, i
 	cpu_clear(cpu, tmp);
 	set_cpus_allowed(current, tmp);
 
-	mutex_lock(&cpu_bitmask_lock);
 	p = __stop_machine_run(take_cpu_down, &tcd_param, cpu);
-	mutex_unlock(&cpu_bitmask_lock);
 
 	if (IS_ERR(p) || cpu_online(cpu)) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
@@ -203,6 +239,7 @@ out_allowed:
 	set_cpus_allowed(current, old_allowed);
 out_release:
 	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_RELEASE, hcpu);
+	cpu_hotplug_done();
 	return err;
 }
 
@@ -231,6 +268,7 @@ static int __cpuinit _cpu_up(unsigned in
 	if (cpu_online(cpu) || !cpu_present(cpu))
 		return -EINVAL;
 
+	cpu_hotplug_begin();
 	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
 	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
 							-1, &nr_calls);
@@ -243,9 +281,7 @@ static int __cpuinit _cpu_up(unsigned in
 	}
 
 	/* Arch-specific enabling code. */
-	mutex_lock(&cpu_bitmask_lock);
 	ret = __cpu_up(cpu);
-	mutex_unlock(&cpu_bitmask_lock);
 	if (ret != 0)
 		goto out_notify;
 	BUG_ON(!cpu_online(cpu));
@@ -258,6 +294,7 @@ out_notify:
 		__raw_notifier_call_chain(&cpu_chain,
 				CPU_UP_CANCELED | mod, hcpu, nr_calls, NULL);
 	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_RELEASE, hcpu);
+	cpu_hotplug_done();
 
 	return ret;
 }
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus
  2007-10-16 10:33 [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Gautham R Shenoy
  2007-10-16 10:34 ` [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation Gautham R Shenoy
@ 2007-10-16 10:35 ` Gautham R Shenoy
  2007-10-17 16:13   ` Nathan Lynch
  2007-10-16 10:36 ` [RFC PATCH 3/4] Replace per-subsystem mutexes with get_online_cpus Gautham R Shenoy
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-16 10:35 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: linux-kernel, Srivatsa Vaddagiri, Rusty Russel, Dipankar Sarma,
	Oleg Nesterov, Ingo Molnar, Paul E McKenney

Replace all lock_cpu_hotplug/unlock_cpu_hotplug instances in the
kernel with get_online_cpus and put_online_cpus, as these names
highlight the refcount semantics of the operations.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 arch/i386/kernel/cpu/mtrr/main.c             |    8 ++++----
 arch/i386/kernel/microcode.c                 |   16 ++++++++--------
 arch/mips/kernel/mips-mt-fpaff.c             |   10 +++++-----
 arch/powerpc/platforms/pseries/hotplug-cpu.c |    8 ++++----
 arch/powerpc/platforms/pseries/rtasd.c       |    8 ++++----
 drivers/lguest/core.c                        |    8 ++++----
 drivers/s390/char/sclp_config.c              |    4 ++--
 include/linux/cpu.h                          |   10 +++++-----
 kernel/cpu.c                                 |    8 ++++----
 kernel/cpuset.c                              |   14 +++++++-------
 kernel/rcutorture.c                          |    6 +++---
 kernel/stop_machine.c                        |    4 ++--
 net/core/flow.c                              |    4 ++--
 13 files changed, 54 insertions(+), 54 deletions(-)

Index: linux-2.6.23/include/linux/cpu.h
===================================================================
--- linux-2.6.23.orig/include/linux/cpu.h
+++ linux-2.6.23/include/linux/cpu.h
@@ -98,8 +98,8 @@ static inline void cpuhotplug_mutex_unlo
 }
 
 extern void cpu_hotplug_init(void);
-extern void lock_cpu_hotplug(void);
-extern void unlock_cpu_hotplug(void);
+extern void get_online_cpus(void);
+extern void put_online_cpus(void);
 #define hotcpu_notifier(fn, pri) {				\
 	static struct notifier_block fn##_nb =			\
 		{ .notifier_call = fn, .priority = pri };	\
@@ -117,9 +117,9 @@ static inline void cpuhotplug_mutex_lock
 static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
 { }
 
-#define cpu_hotplug_init()     do { } while (0)
-#define lock_cpu_hotplug()	do { } while (0)
-#define unlock_cpu_hotplug()	do { } while (0)
+#define cpu_hotplug_init()		do { } while (0)
+#define get_online_cpus()		do { } while (0)
+#define put_online_cpus()		do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
Index: linux-2.6.23/kernel/cpu.c
===================================================================
--- linux-2.6.23.orig/kernel/cpu.c
+++ linux-2.6.23/kernel/cpu.c
@@ -48,7 +48,7 @@ void __init cpu_hotplug_init(void)
 	init_completion(&cpu_hotplug.readers_done);
 }
 
-void lock_cpu_hotplug(void)
+void get_online_cpus(void)
 {
 	might_sleep();
 	if (cpu_hotplug.active_writer == current)
@@ -58,9 +58,9 @@ void lock_cpu_hotplug(void)
 	mutex_unlock(&cpu_hotplug.lock);
 
 }
-EXPORT_SYMBOL_GPL(lock_cpu_hotplug);
+EXPORT_SYMBOL_GPL(get_online_cpus);
 
-void unlock_cpu_hotplug(void)
+void put_online_cpus(void)
 {
 	if (cpu_hotplug.active_writer == current)
 		return;
@@ -73,7 +73,7 @@ void unlock_cpu_hotplug(void)
 	mutex_unlock(&cpu_hotplug.lock);
 
 }
-EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);
+EXPORT_SYMBOL_GPL(put_online_cpus);
 
 #endif	/* CONFIG_HOTPLUG_CPU */
 
Index: linux-2.6.23/kernel/stop_machine.c
===================================================================
--- linux-2.6.23.orig/kernel/stop_machine.c
+++ linux-2.6.23/kernel/stop_machine.c
@@ -203,13 +203,13 @@ int stop_machine_run(int (*fn)(void *), 
 	int ret;
 
 	/* No CPUs can come up or down during this. */
-	lock_cpu_hotplug();
+	get_online_cpus();
 	p = __stop_machine_run(fn, data, cpu);
 	if (!IS_ERR(p))
 		ret = kthread_stop(p);
 	else
 		ret = PTR_ERR(p);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 
 	return ret;
 }
Index: linux-2.6.23/arch/i386/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6.23.orig/arch/i386/kernel/cpu/mtrr/main.c
+++ linux-2.6.23/arch/i386/kernel/cpu/mtrr/main.c
@@ -351,7 +351,7 @@ int mtrr_add_page(unsigned long base, un
 	replace = -1;
 
 	/* No CPU hotplug when we change MTRR entries */
-	lock_cpu_hotplug();
+	get_online_cpus();
 	/*  Search for existing MTRR  */
 	mutex_lock(&mtrr_mutex);
 	for (i = 0; i < num_var_ranges; ++i) {
@@ -407,7 +407,7 @@ int mtrr_add_page(unsigned long base, un
 	error = i;
  out:
 	mutex_unlock(&mtrr_mutex);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 	return error;
 }
 
@@ -497,7 +497,7 @@ int mtrr_del_page(int reg, unsigned long
 
 	max = num_var_ranges;
 	/* No CPU hotplug when we change MTRR entries */
-	lock_cpu_hotplug();
+	get_online_cpus();
 	mutex_lock(&mtrr_mutex);
 	if (reg < 0) {
 		/*  Search for existing MTRR  */
@@ -538,7 +538,7 @@ int mtrr_del_page(int reg, unsigned long
 	error = reg;
  out:
 	mutex_unlock(&mtrr_mutex);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 	return error;
 }
 /**
Index: linux-2.6.23/arch/i386/kernel/microcode.c
===================================================================
--- linux-2.6.23.orig/arch/i386/kernel/microcode.c
+++ linux-2.6.23/arch/i386/kernel/microcode.c
@@ -436,7 +436,7 @@ static ssize_t microcode_write (struct f
 		return -EINVAL;
 	}
 
-	lock_cpu_hotplug();
+	get_online_cpus();
 	mutex_lock(&microcode_mutex);
 
 	user_buffer = (void __user *) buf;
@@ -447,7 +447,7 @@ static ssize_t microcode_write (struct f
 		ret = (ssize_t)len;
 
 	mutex_unlock(&microcode_mutex);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 
 	return ret;
 }
@@ -658,14 +658,14 @@ static ssize_t reload_store(struct sys_d
 
 		old = current->cpus_allowed;
 
-		lock_cpu_hotplug();
+		get_online_cpus();
 		set_cpus_allowed(current, cpumask_of_cpu(cpu));
 
 		mutex_lock(&microcode_mutex);
 		if (uci->valid)
 			err = cpu_request_microcode(cpu);
 		mutex_unlock(&microcode_mutex);
-		unlock_cpu_hotplug();
+		put_online_cpus();
 		set_cpus_allowed(current, old);
 	}
 	if (err)
@@ -817,9 +817,9 @@ static int __init microcode_init (void)
 		return PTR_ERR(microcode_pdev);
 	}
 
-	lock_cpu_hotplug();
+	get_online_cpus();
 	error = sysdev_driver_register(&cpu_sysdev_class, &mc_sysdev_driver);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 	if (error) {
 		microcode_dev_exit();
 		platform_device_unregister(microcode_pdev);
@@ -839,9 +839,9 @@ static void __exit microcode_exit (void)
 
 	unregister_hotcpu_notifier(&mc_cpu_notifier);
 
-	lock_cpu_hotplug();
+	get_online_cpus();
 	sysdev_driver_unregister(&cpu_sysdev_class, &mc_sysdev_driver);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 
 	platform_device_unregister(microcode_pdev);
 }
Index: linux-2.6.23/arch/powerpc/platforms/pseries/hotplug-cpu.c
===================================================================
--- linux-2.6.23.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ linux-2.6.23/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -151,7 +151,7 @@ static int pseries_add_processor(struct 
 	for (i = 0; i < nthreads; i++)
 		cpu_set(i, tmp);
 
-	lock_cpu_hotplug();
+	get_online_cpus();
 
 	BUG_ON(!cpus_subset(cpu_present_map, cpu_possible_map));
 
@@ -188,7 +188,7 @@ static int pseries_add_processor(struct 
 	}
 	err = 0;
 out_unlock:
-	unlock_cpu_hotplug();
+	put_online_cpus();
 	return err;
 }
 
@@ -209,7 +209,7 @@ static void pseries_remove_processor(str
 
 	nthreads = len / sizeof(u32);
 
-	lock_cpu_hotplug();
+	get_online_cpus();
 	for (i = 0; i < nthreads; i++) {
 		for_each_present_cpu(cpu) {
 			if (get_hard_smp_processor_id(cpu) != intserv[i])
@@ -223,7 +223,7 @@ static void pseries_remove_processor(str
 			printk(KERN_WARNING "Could not find cpu to remove "
 			       "with physical id 0x%x\n", intserv[i]);
 	}
-	unlock_cpu_hotplug();
+	put_online_cpus();
 }
 
 static int pseries_smp_notifier(struct notifier_block *nb,
Index: linux-2.6.23/arch/powerpc/platforms/pseries/rtasd.c
===================================================================
--- linux-2.6.23.orig/arch/powerpc/platforms/pseries/rtasd.c
+++ linux-2.6.23/arch/powerpc/platforms/pseries/rtasd.c
@@ -382,7 +382,7 @@ static void do_event_scan_all_cpus(long 
 {
 	int cpu;
 
-	lock_cpu_hotplug();
+	get_online_cpus();
 	cpu = first_cpu(cpu_online_map);
 	for (;;) {
 		set_cpus_allowed(current, cpumask_of_cpu(cpu));
@@ -390,15 +390,15 @@ static void do_event_scan_all_cpus(long 
 		set_cpus_allowed(current, CPU_MASK_ALL);
 
 		/* Drop hotplug lock, and sleep for the specified delay */
-		unlock_cpu_hotplug();
+		put_online_cpus();
 		msleep_interruptible(delay);
-		lock_cpu_hotplug();
+		get_online_cpus();
 
 		cpu = next_cpu(cpu, cpu_online_map);
 		if (cpu == NR_CPUS)
 			break;
 	}
-	unlock_cpu_hotplug();
+	put_online_cpus();
 }
 
 static int rtasd(void *unused)
Index: linux-2.6.23/drivers/s390/char/sclp_config.c
===================================================================
--- linux-2.6.23.orig/drivers/s390/char/sclp_config.c
+++ linux-2.6.23/drivers/s390/char/sclp_config.c
@@ -29,12 +29,12 @@ static void sclp_cpu_capability_notify(s
 	struct sys_device *sysdev;
 
 	printk(KERN_WARNING TAG "cpu capability changed.\n");
-	lock_cpu_hotplug();
+	get_online_cpus();
 	for_each_online_cpu(cpu) {
 		sysdev = get_cpu_sysdev(cpu);
 		kobject_uevent(&sysdev->kobj, KOBJ_CHANGE);
 	}
-	unlock_cpu_hotplug();
+	put_online_cpus();
 }
 
 static void sclp_conf_receiver_fn(struct evbuf_header *evbuf)
Index: linux-2.6.23/kernel/rcutorture.c
===================================================================
--- linux-2.6.23.orig/kernel/rcutorture.c
+++ linux-2.6.23/kernel/rcutorture.c
@@ -726,11 +726,11 @@ static void rcu_torture_shuffle_tasks(vo
 	cpumask_t tmp_mask = CPU_MASK_ALL;
 	int i;
 
-	lock_cpu_hotplug();
+	get_online_cpus();
 
 	/* No point in shuffling if there is only one online CPU (ex: UP) */
 	if (num_online_cpus() == 1) {
-		unlock_cpu_hotplug();
+		put_online_cpus();
 		return;
 	}
 
@@ -762,7 +762,7 @@ static void rcu_torture_shuffle_tasks(vo
 	else
 		rcu_idle_cpu--;
 
-	unlock_cpu_hotplug();
+	put_online_cpus();
 }
 
 /* Shuffle tasks across CPUs, with the intent of allowing each CPU in the
Index: linux-2.6.23/net/core/flow.c
===================================================================
--- linux-2.6.23.orig/net/core/flow.c
+++ linux-2.6.23/net/core/flow.c
@@ -296,7 +296,7 @@ void flow_cache_flush(void)
 	static DEFINE_MUTEX(flow_flush_sem);
 
 	/* Don't want cpus going down or up during this. */
-	lock_cpu_hotplug();
+	get_online_cpus();
 	mutex_lock(&flow_flush_sem);
 	atomic_set(&info.cpuleft, num_online_cpus());
 	init_completion(&info.completion);
@@ -308,7 +308,7 @@ void flow_cache_flush(void)
 
 	wait_for_completion(&info.completion);
 	mutex_unlock(&flow_flush_sem);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 }
 
 static void __devinit flow_cache_cpu_prepare(int cpu)
Index: linux-2.6.23/arch/mips/kernel/mips-mt-fpaff.c
===================================================================
--- linux-2.6.23.orig/arch/mips/kernel/mips-mt-fpaff.c
+++ linux-2.6.23/arch/mips/kernel/mips-mt-fpaff.c
@@ -58,13 +58,13 @@ asmlinkage long mipsmt_sys_sched_setaffi
 	if (copy_from_user(&new_mask, user_mask_ptr, sizeof(new_mask)))
 		return -EFAULT;
 
-	lock_cpu_hotplug();
+	get_online_cpus();
 	read_lock(&tasklist_lock);
 
 	p = find_process_by_pid(pid);
 	if (!p) {
 		read_unlock(&tasklist_lock);
-		unlock_cpu_hotplug();
+		put_online_cpus();
 		return -ESRCH;
 	}
 
@@ -106,7 +106,7 @@ asmlinkage long mipsmt_sys_sched_setaffi
 
 out_unlock:
 	put_task_struct(p);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 	return retval;
 }
 
@@ -125,7 +125,7 @@ asmlinkage long mipsmt_sys_sched_getaffi
 	if (len < real_len)
 		return -EINVAL;
 
-	lock_cpu_hotplug();
+	get_online_cpus();
 	read_lock(&tasklist_lock);
 
 	retval = -ESRCH;
@@ -140,7 +140,7 @@ asmlinkage long mipsmt_sys_sched_getaffi
 
 out_unlock:
 	read_unlock(&tasklist_lock);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 	if (retval)
 		return retval;
 	if (copy_to_user(user_mask_ptr, &mask, real_len))
Index: linux-2.6.23/drivers/lguest/core.c
===================================================================
--- linux-2.6.23.orig/drivers/lguest/core.c
+++ linux-2.6.23/drivers/lguest/core.c
@@ -735,7 +735,7 @@ static int __init init(void)
 
 	/* We don't need the complexity of CPUs coming and going while we're
 	 * doing this. */
-	lock_cpu_hotplug();
+	get_online_cpus();
 	if (cpu_has_pge) { /* We have a broader idea of "global". */
 		/* Remember that this was originally set (for cleanup). */
 		cpu_had_pge = 1;
@@ -745,7 +745,7 @@ static int __init init(void)
 		/* Turn off the feature in the global feature set. */
 		clear_bit(X86_FEATURE_PGE, boot_cpu_data.x86_capability);
 	}
-	unlock_cpu_hotplug();
+	put_online_cpus();
 
 	/* All good! */
 	return 0;
@@ -759,13 +759,13 @@ static void __exit fini(void)
 	unmap_switcher();
 
 	/* If we had PGE before we started, turn it back on now. */
-	lock_cpu_hotplug();
+	get_online_cpus();
 	if (cpu_had_pge) {
 		set_bit(X86_FEATURE_PGE, boot_cpu_data.x86_capability);
 		/* adjust_pge's argument "1" means set PGE. */
 		on_each_cpu(adjust_pge, (void *)1, 0, 1);
 	}
-	unlock_cpu_hotplug();
+	put_online_cpus();
 }
 
 /* The Host side of lguest can be a module.  This is a nice way for people to
Index: linux-2.6.23/kernel/cpuset.c
===================================================================
--- linux-2.6.23.orig/kernel/cpuset.c
+++ linux-2.6.23/kernel/cpuset.c
@@ -536,10 +536,10 @@ static int cpusets_overlap(struct cpuset
  *
  * Call with cgroup_mutex held.  May take callback_mutex during
  * call due to the kfifo_alloc() and kmalloc() calls.  May nest
- * a call to the lock_cpu_hotplug()/unlock_cpu_hotplug() pair.
+ * a call to the get_online_cpus()/put_online_cpus() pair.
  * Must not be called holding callback_mutex, because we must not
- * call lock_cpu_hotplug() while holding callback_mutex.  Elsewhere
- * the kernel nests callback_mutex inside lock_cpu_hotplug() calls.
+ * call get_online_cpus() while holding callback_mutex.  Elsewhere
+ * the kernel nests callback_mutex inside get_online_cpus() calls.
  * So the reverse nesting would risk an ABBA deadlock.
  *
  * The three key local variables below are:
@@ -673,9 +673,9 @@ restart:
 
 rebuild:
 	/* Have scheduler rebuild sched domains */
-	lock_cpu_hotplug();
+	get_online_cpus();
 	partition_sched_domains(ndoms, doms);
-	unlock_cpu_hotplug();
+	put_online_cpus();
 
 done:
 	if (q && !IS_ERR(q))
@@ -1503,10 +1503,10 @@ static struct cgroup_subsys_state *cpuse
  *
  * If the cpuset being removed has its flag 'sched_load_balance'
  * enabled, then simulate turning sched_load_balance off, which
- * will call rebuild_sched_domains().  The lock_cpu_hotplug()
+ * will call rebuild_sched_domains().  The get_online_cpus()
  * call in rebuild_sched_domains() must not be made while holding
  * callback_mutex.  Elsewhere the kernel nests callback_mutex inside
- * lock_cpu_hotplug() calls.  So the reverse nesting would risk an
+ * get_online_cpus() calls.  So the reverse nesting would risk an
  * ABBA deadlock.
  */
 
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 3/4] Replace per-subsystem mutexes with get_online_cpus
  2007-10-16 10:33 [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Gautham R Shenoy
  2007-10-16 10:34 ` [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation Gautham R Shenoy
  2007-10-16 10:35 ` [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus Gautham R Shenoy
@ 2007-10-16 10:36 ` Gautham R Shenoy
  2007-10-21 11:39   ` Oleg Nesterov
  2007-10-16 10:37 ` [RFC PATCH 4/4] Remove CPU_DEAD/CPU_UP_CANCELLED handling from workqueue.c Gautham R Shenoy
  2007-10-16 17:20 ` [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Linus Torvalds
  4 siblings, 1 reply; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-16 10:36 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: linux-kernel, Srivatsa Vaddagiri, Rusty Russel, Dipankar Sarma,
	Oleg Nesterov, Ingo Molnar, Paul E McKenney

This patch converts the known per-subsystem cpu-hotplug mutexes to
get_online_cpus/put_online_cpus.
It also eliminates the CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug
notification events.

Signed-off-by: Gautham  R Shenoy <ego@in.ibm.com>
---
 include/linux/notifier.h |    4 +---
 kernel/cpu.c             |    4 ----
 kernel/sched.c           |   27 ++++++++++-----------------
 kernel/workqueue.c       |   31 ++++++++++++++++---------------
 mm/slab.c                |   18 +++++++++++-------
 5 files changed, 38 insertions(+), 46 deletions(-)

Index: linux-2.6.23/kernel/sched.c
===================================================================
--- linux-2.6.23.orig/kernel/sched.c
+++ linux-2.6.23/kernel/sched.c
@@ -360,7 +360,6 @@ struct rq {
 };
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
-static DEFINE_MUTEX(sched_hotcpu_mutex);
 
 static inline void check_preempt_curr(struct rq *rq, struct task_struct *p)
 {
@@ -4337,13 +4336,13 @@ long sched_setaffinity(pid_t pid, cpumas
 	struct task_struct *p;
 	int retval;
 
-	mutex_lock(&sched_hotcpu_mutex);
+	get_online_cpus();
 	read_lock(&tasklist_lock);
 
 	p = find_process_by_pid(pid);
 	if (!p) {
 		read_unlock(&tasklist_lock);
-		mutex_unlock(&sched_hotcpu_mutex);
+		put_online_cpus();
 		return -ESRCH;
 	}
 
@@ -4370,7 +4369,7 @@ long sched_setaffinity(pid_t pid, cpumas
 
 out_unlock:
 	put_task_struct(p);
-	mutex_unlock(&sched_hotcpu_mutex);
+	put_online_cpus();
 	return retval;
 }
 
@@ -4427,7 +4426,7 @@ long sched_getaffinity(pid_t pid, cpumas
 	struct task_struct *p;
 	int retval;
 
-	mutex_lock(&sched_hotcpu_mutex);
+	get_online_cpus();
 	read_lock(&tasklist_lock);
 
 	retval = -ESRCH;
@@ -4443,7 +4442,7 @@ long sched_getaffinity(pid_t pid, cpumas
 
 out_unlock:
 	read_unlock(&tasklist_lock);
-	mutex_unlock(&sched_hotcpu_mutex);
+	put_online_cpus();
 
 	return retval;
 }
@@ -5342,9 +5341,6 @@ migration_call(struct notifier_block *nf
 	struct rq *rq;
 
 	switch (action) {
-	case CPU_LOCK_ACQUIRE:
-		mutex_lock(&sched_hotcpu_mutex);
-		break;
 
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
@@ -5398,7 +5394,7 @@ migration_call(struct notifier_block *nf
 		BUG_ON(rq->nr_running != 0);
 
 		/* No need to migrate the tasks: it was best-effort if
-		 * they didn't take sched_hotcpu_mutex.  Just wake up
+		 * they didn't do a get_online_cpus().  Just wake up
 		 * the requestors. */
 		spin_lock_irq(&rq->lock);
 		while (!list_empty(&rq->migration_queue)) {
@@ -5412,9 +5408,6 @@ migration_call(struct notifier_block *nf
 		spin_unlock_irq(&rq->lock);
 		break;
 #endif
-	case CPU_LOCK_RELEASE:
-		mutex_unlock(&sched_hotcpu_mutex);
-		break;
 	}
 	return NOTIFY_OK;
 }
@@ -6338,10 +6331,10 @@ static int arch_reinit_sched_domains(voi
 {
 	int err;
 
-	mutex_lock(&sched_hotcpu_mutex);
+	get_online_cpus();
 	detach_destroy_domains(&cpu_online_map);
 	err = arch_init_sched_domains(&cpu_online_map);
-	mutex_unlock(&sched_hotcpu_mutex);
+	put_online_cpus();
 
 	return err;
 }
@@ -6452,12 +6445,12 @@ void __init sched_init_smp(void)
 {
 	cpumask_t non_isolated_cpus;
 
-	mutex_lock(&sched_hotcpu_mutex);
+	get_online_cpus();
 	arch_init_sched_domains(&cpu_online_map);
 	cpus_andnot(non_isolated_cpus, cpu_possible_map, cpu_isolated_map);
 	if (cpus_empty(non_isolated_cpus))
 		cpu_set(smp_processor_id(), non_isolated_cpus);
-	mutex_unlock(&sched_hotcpu_mutex);
+	put_online_cpus();
 	/* XXX: Theoretical race here - CPU may be hotplugged now */
 	hotcpu_notifier(update_sched_domains, 0);
 
Index: linux-2.6.23/mm/slab.c
===================================================================
--- linux-2.6.23.orig/mm/slab.c
+++ linux-2.6.23/mm/slab.c
@@ -730,8 +730,7 @@ static inline void init_lock_keys(void)
 #endif
 
 /*
- * 1. Guard access to the cache-chain.
- * 2. Protect sanity of cpu_online_map against cpu hotplug events
+ * Guard access to the cache-chain.
  */
 static DEFINE_MUTEX(cache_chain_mutex);
 static struct list_head cache_chain;
@@ -1331,12 +1330,11 @@ static int __cpuinit cpuup_callback(stru
 	int err = 0;
 
 	switch (action) {
-	case CPU_LOCK_ACQUIRE:
-		mutex_lock(&cache_chain_mutex);
-		break;
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
+		mutex_lock(&cache_chain_mutex);
 		err = cpuup_prepare(cpu);
+		mutex_unlock(&cache_chain_mutex);
 		break;
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
@@ -1373,9 +1371,8 @@ static int __cpuinit cpuup_callback(stru
 #endif
 	case CPU_UP_CANCELED:
 	case CPU_UP_CANCELED_FROZEN:
+		mutex_lock(&cache_chain_mutex);
 		cpuup_canceled(cpu);
-		break;
-	case CPU_LOCK_RELEASE:
 		mutex_unlock(&cache_chain_mutex);
 		break;
 	}
@@ -2170,6 +2167,7 @@ kmem_cache_create (const char *name, siz
 	 * We use cache_chain_mutex to ensure a consistent view of
 	 * cpu_online_map as well.  Please see cpuup_callback
 	 */
+	get_online_cpus();
 	mutex_lock(&cache_chain_mutex);
 
 	list_for_each_entry(pc, &cache_chain, next) {
@@ -2396,6 +2394,7 @@ oops:
 		panic("kmem_cache_create(): failed to create slab `%s'\n",
 		      name);
 	mutex_unlock(&cache_chain_mutex);
+	put_online_cpus();
 	return cachep;
 }
 EXPORT_SYMBOL(kmem_cache_create);
@@ -2547,9 +2546,11 @@ int kmem_cache_shrink(struct kmem_cache 
 	int ret;
 	BUG_ON(!cachep || in_interrupt());
 
+	get_online_cpus();
 	mutex_lock(&cache_chain_mutex);
 	ret = __cache_shrink(cachep);
 	mutex_unlock(&cache_chain_mutex);
+	put_online_cpus();
 	return ret;
 }
 EXPORT_SYMBOL(kmem_cache_shrink);
@@ -2575,6 +2576,7 @@ void kmem_cache_destroy(struct kmem_cach
 	BUG_ON(!cachep || in_interrupt());
 
 	/* Find the cache in the chain of caches. */
+	get_online_cpus();
 	mutex_lock(&cache_chain_mutex);
 	/*
 	 * the chain is never empty, cache_cache is never destroyed
@@ -2584,6 +2586,7 @@ void kmem_cache_destroy(struct kmem_cach
 		slab_error(cachep, "Can't free all objects");
 		list_add(&cachep->next, &cache_chain);
 		mutex_unlock(&cache_chain_mutex);
+		put_online_cpus();
 		return;
 	}
 
@@ -2592,6 +2595,7 @@ void kmem_cache_destroy(struct kmem_cach
 
 	__kmem_cache_destroy(cachep);
 	mutex_unlock(&cache_chain_mutex);
+	put_online_cpus();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
 
Index: linux-2.6.23/kernel/workqueue.c
===================================================================
--- linux-2.6.23.orig/kernel/workqueue.c
+++ linux-2.6.23/kernel/workqueue.c
@@ -592,8 +592,6 @@ EXPORT_SYMBOL(schedule_delayed_work_on);
  * Returns zero on success.
  * Returns -ve errno on failure.
  *
- * Appears to be racy against CPU hotplug.
- *
  * schedule_on_each_cpu() is very slow.
  */
 int schedule_on_each_cpu(work_func_t func)
@@ -605,7 +603,7 @@ int schedule_on_each_cpu(work_func_t fun
 	if (!works)
 		return -ENOMEM;
 
-	preempt_disable();		/* CPU hotplug */
+	get_online_cpus();		/* CPU hotplug */
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = per_cpu_ptr(works, cpu);
 
@@ -613,7 +611,7 @@ int schedule_on_each_cpu(work_func_t fun
 		set_bit(WORK_STRUCT_PENDING, work_data_bits(work));
 		__queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu), work);
 	}
-	preempt_enable();
+	put_online_cpus();
 	flush_workqueue(keventd_wq);
 	free_percpu(works);
 	return 0;
@@ -749,8 +747,10 @@ struct workqueue_struct *__create_workqu
 		err = create_workqueue_thread(cwq, singlethread_cpu);
 		start_workqueue_thread(cwq, -1);
 	} else {
+		get_online_cpus();
 		mutex_lock(&workqueue_mutex);
 		list_add(&wq->list, &workqueues);
+		mutex_unlock(&workqueue_mutex);
 
 		for_each_possible_cpu(cpu) {
 			cwq = init_cpu_workqueue(wq, cpu);
@@ -759,7 +759,7 @@ struct workqueue_struct *__create_workqu
 			err = create_workqueue_thread(cwq, cpu);
 			start_workqueue_thread(cwq, cpu);
 		}
-		mutex_unlock(&workqueue_mutex);
+		put_online_cpus();
 	}
 
 	if (err) {
@@ -809,6 +809,7 @@ void destroy_workqueue(struct workqueue_
 	struct cpu_workqueue_struct *cwq;
 	int cpu;
 
+	get_online_cpus();
 	mutex_lock(&workqueue_mutex);
 	list_del(&wq->list);
 	mutex_unlock(&workqueue_mutex);
@@ -817,6 +818,7 @@ void destroy_workqueue(struct workqueue_
 		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
 		cleanup_workqueue_thread(cwq, cpu);
 	}
+	put_online_cpus();
 
 	free_percpu(wq->cpu_wq);
 	kfree(wq);
@@ -830,22 +832,17 @@ static int __devinit workqueue_cpu_callb
 	unsigned int cpu = (unsigned long)hcpu;
 	struct cpu_workqueue_struct *cwq;
 	struct workqueue_struct *wq;
+	int ret = NOTIFY_OK;
 
 	action &= ~CPU_TASKS_FROZEN;
 
 	switch (action) {
-	case CPU_LOCK_ACQUIRE:
-		mutex_lock(&workqueue_mutex);
-		return NOTIFY_OK;
-
-	case CPU_LOCK_RELEASE:
-		mutex_unlock(&workqueue_mutex);
-		return NOTIFY_OK;
 
 	case CPU_UP_PREPARE:
 		cpu_set(cpu, cpu_populated_map);
 	}
 
+	mutex_lock(&workqueue_mutex);
 	list_for_each_entry(wq, &workqueues, list) {
 		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
 
@@ -853,8 +850,10 @@ static int __devinit workqueue_cpu_callb
 		case CPU_UP_PREPARE:
 			if (!create_workqueue_thread(cwq, cpu))
 				break;
-			printk(KERN_ERR "workqueue for %i failed\n", cpu);
-			return NOTIFY_BAD;
+			printk(KERN_ERR "workqueue [%s] for %i failed\n",
+				wq->name, cpu);
+			ret = NOTIFY_BAD;
+			goto out_unlock;
 
 		case CPU_ONLINE:
 			start_workqueue_thread(cwq, cpu);
@@ -868,7 +867,9 @@ static int __devinit workqueue_cpu_callb
 		}
 	}
 
-	return NOTIFY_OK;
+out_unlock:
+	mutex_unlock(&workqueue_mutex);
+	return ret;
 }
 
 void __init init_workqueues(void)
Index: linux-2.6.23/kernel/cpu.c
===================================================================
--- linux-2.6.23.orig/kernel/cpu.c
+++ linux-2.6.23/kernel/cpu.c
@@ -185,7 +185,6 @@ static int _cpu_down(unsigned int cpu, i
 		return -EINVAL;
 
 	cpu_hotplug_begin();
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
 	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
 					hcpu, -1, &nr_calls);
 	if (err == NOTIFY_BAD) {
@@ -238,7 +237,6 @@ out_thread:
 out_allowed:
 	set_cpus_allowed(current, old_allowed);
 out_release:
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_RELEASE, hcpu);
 	cpu_hotplug_done();
 	return err;
 }
@@ -269,7 +267,6 @@ static int __cpuinit _cpu_up(unsigned in
 		return -EINVAL;
 
 	cpu_hotplug_begin();
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_ACQUIRE, hcpu);
 	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
 							-1, &nr_calls);
 	if (ret == NOTIFY_BAD) {
@@ -293,7 +290,6 @@ out_notify:
 	if (ret != 0)
 		__raw_notifier_call_chain(&cpu_chain,
 				CPU_UP_CANCELED | mod, hcpu, nr_calls, NULL);
-	raw_notifier_call_chain(&cpu_chain, CPU_LOCK_RELEASE, hcpu);
 	cpu_hotplug_done();
 
 	return ret;
Index: linux-2.6.23/include/linux/notifier.h
===================================================================
--- linux-2.6.23.orig/include/linux/notifier.h
+++ linux-2.6.23/include/linux/notifier.h
@@ -207,9 +207,7 @@ static inline int notifier_to_errno(int 
 #define CPU_DOWN_PREPARE	0x0005 /* CPU (unsigned)v going down */
 #define CPU_DOWN_FAILED		0x0006 /* CPU (unsigned)v NOT going down */
 #define CPU_DEAD		0x0007 /* CPU (unsigned)v dead */
-#define CPU_LOCK_ACQUIRE	0x0008 /* Acquire all hotcpu locks */
-#define CPU_LOCK_RELEASE	0x0009 /* Release all hotcpu locks */
-#define CPU_DYING		0x000A /* CPU (unsigned)v not running any task,
+#define CPU_DYING		0x0008 /* CPU (unsigned)v not running any task,
 				        * not handling interrupts, soon dead */
 
 /* Used for CPU hotplug events occuring while tasks are frozen due to a suspend
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 4/4] Remove CPU_DEAD/CPU_UP_CANCELLED handling from workqueue.c
  2007-10-16 10:33 [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Gautham R Shenoy
                   ` (2 preceding siblings ...)
  2007-10-16 10:36 ` [RFC PATCH 3/4] Replace per-subsystem mutexes with get_online_cpus Gautham R Shenoy
@ 2007-10-16 10:37 ` Gautham R Shenoy
  2007-10-17 11:57   ` Oleg Nesterov
  2007-10-16 17:20 ` [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Linus Torvalds
  4 siblings, 1 reply; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-16 10:37 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: linux-kernel, Srivatsa Vaddagiri, Rusty Russel, Dipankar Sarma,
	Oleg Nesterov, Ingo Molnar, Paul E McKenney

cleanup_workqueue_thread() in the CPU_DEAD and CPU_UP_CANCELLED path
will cause a deadlock if the worker thread is executing a work item
which is blocked on get_online_cpus(). This will lead to an
irrecoverable hang.

Solution is not to cleanup the worker thread. Instead let it remain
even after the cpu goes offline. Since no one can queue any work
on an offlined cpu, this thread will be forever sleeping, until
someone onlines the cpu.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
---
 kernel/workqueue.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

Index: linux-2.6.23/kernel/workqueue.c
===================================================================
--- linux-2.6.23.orig/kernel/workqueue.c
+++ linux-2.6.23/kernel/workqueue.c
@@ -30,6 +30,7 @@
 #include <linux/hardirq.h>
 #include <linux/mempolicy.h>
 #include <linux/freezer.h>
+#include <linux/cpumask.h>
 #include <linux/kallsyms.h>
 #include <linux/debug_locks.h>
 #include <linux/lockdep.h>
@@ -679,6 +680,7 @@ init_cpu_workqueue(struct workqueue_stru
 	spin_lock_init(&cwq->lock);
 	INIT_LIST_HEAD(&cwq->worklist);
 	init_waitqueue_head(&cwq->more_work);
+	cwq->thread = NULL;
 
 	return cwq;
 }
@@ -712,7 +714,7 @@ static void start_workqueue_thread(struc
 
 	if (p != NULL) {
 		if (cpu >= 0)
-			kthread_bind(p, cpu);
+			set_cpus_allowed(p, cpumask_of_cpu(cpu));
 		wake_up_process(p);
 	}
 }
@@ -848,6 +850,9 @@ static int __devinit workqueue_cpu_callb
 
 		switch (action) {
 		case CPU_UP_PREPARE:
+			if (likely(cwq->thread != NULL &&
+					!IS_ERR(cwq->thread)))
+				break;
 			if (!create_workqueue_thread(cwq, cpu))
 				break;
 			printk(KERN_ERR "workqueue [%s] for %i failed\n",
@@ -858,12 +863,6 @@ static int __devinit workqueue_cpu_callb
 		case CPU_ONLINE:
 			start_workqueue_thread(cwq, cpu);
 			break;
-
-		case CPU_UP_CANCELED:
-			start_workqueue_thread(cwq, -1);
-		case CPU_DEAD:
-			cleanup_workqueue_thread(cwq, cpu);
-			break;
 		}
 	}
 
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit.
  2007-10-16 10:33 [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Gautham R Shenoy
                   ` (3 preceding siblings ...)
  2007-10-16 10:37 ` [RFC PATCH 4/4] Remove CPU_DEAD/CPU_UP_CANCELLED handling from workqueue.c Gautham R Shenoy
@ 2007-10-16 17:20 ` Linus Torvalds
  2007-10-17  2:11   ` Dipankar Sarma
  4 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-10-16 17:20 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Andrew Morton, linux-kernel, Srivatsa Vaddagiri, Rusty Russel,
	Dipankar Sarma, Oleg Nesterov, Ingo Molnar, Paul E McKenney



On Tue, 16 Oct 2007, Gautham R Shenoy wrote:
> 
> Patch 1/4: Implements the core refcount + waitqueue model.
> Patch 2/4: Replaces all the lock_cpu_hotplug/unlock_cpu_hotplug instances
> 	   with get_online_cpus()/put_online_cpus()
> Patch 3/4: Replaces the subsystem mutexes (we do have three of them now, 
>            in sched.c, slab.c and workqueue.c) with get_online_cpus,
> 	   put_online_cpus.
> Patch 4/4: Eliminates the CPU_DEAD and CPU_UP_CANCELLED event handling
>  	   from workqueue.c
> 
> The patch series has survived an overnight test with kernbench on i386.
> and has been tested with Paul Mckenney's latest preemptible rcu code.
> 
> Awaiting thy feedback!

Well, afaik, the patch series is fairly clean, and I'm obviously perfectly 
happy with the approach, so I have no objections. 

But it looks buggy. This:

	+static void cpu_hotplug_begin(void)
	+{
	+       mutex_lock(&cpu_hotplug.lock);
	+       cpu_hotplug.active_writer = current;
	+       while (cpu_hotplug.refcount) {
	+               mutex_unlock(&cpu_hotplug.lock);
	+               wait_for_completion(&cpu_hotplug.readers_done);
	+               mutex_lock(&cpu_hotplug.lock);
	+       }
	+
	+}

drops the cpu_hotplug.lock, which - as far as I can see - means that 
another process can come in and do the same, and mess up the 
"active_writer" thing. The person that actually *gets* the lock may not be 
the same one that has "active_writer" set to itself. No? Am I missing 
something?

So I think this needs (a) more people looking at it (I think I found a 
bug, who knows if there are more subtle ones lurking) and (b) lots of 
testing. 

			Linus


* Re: [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-16 10:34 ` [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation Gautham R Shenoy
@ 2007-10-17  0:47   ` Rusty Russell
  2007-10-17  5:37     ` Gautham R Shenoy
  2007-10-17 10:53   ` Paul Jackson
  1 sibling, 1 reply; 29+ messages in thread
From: Rusty Russell @ 2007-10-17  0:47 UTC (permalink / raw)
  To: ego
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Dipankar Sarma, Oleg Nesterov, Ingo Molnar, Paul E McKenney

On Tuesday 16 October 2007 20:34:17 Gautham R Shenoy wrote:
> This patch implements a Refcount + Waitqueue based model for
> cpu-hotplug.

Hi Gautham,

	I can't see where you re-initialize the completion.  

> +static void cpu_hotplug_begin(void)
> +{
> +	mutex_lock(&cpu_hotplug.lock);
> +	cpu_hotplug.active_writer = current;
> +	while (cpu_hotplug.refcount) {
> +		mutex_unlock(&cpu_hotplug.lock);
> +		wait_for_completion(&cpu_hotplug.readers_done);
> +		mutex_lock(&cpu_hotplug.lock);
> +	}

AFAICT this will busy-wait on the second CPU hotplug.

Cheers,
Rusty.


* Re: [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit.
  2007-10-16 17:20 ` [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Linus Torvalds
@ 2007-10-17  2:11   ` Dipankar Sarma
  2007-10-17  2:23     ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Dipankar Sarma @ 2007-10-17  2:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Gautham R Shenoy, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Oleg Nesterov, Ingo Molnar, Paul E McKenney

On Tue, Oct 16, 2007 at 10:20:37AM -0700, Linus Torvalds wrote:
> On Tue, 16 Oct 2007, Gautham R Shenoy wrote:
> 
> Well, afaik, the patch series is fairly clean, and I'm obviously perfectly 
> happy with the approach, so I have no objections. 
> 
> But it looks buggy. This:
> 
> 	+static void cpu_hotplug_begin(void)
> 	+{
> 	+       mutex_lock(&cpu_hotplug.lock);
> 	+       cpu_hotplug.active_writer = current;
> 	+       while (cpu_hotplug.refcount) {
> 	+               mutex_unlock(&cpu_hotplug.lock);
> 	+               wait_for_completion(&cpu_hotplug.readers_done);
> 	+               mutex_lock(&cpu_hotplug.lock);
> 	+       }
> 	+
> 	+}
> 
> drops the cpu_hotplug.lock, which - as far as I can see - means that 
> another process can come in and do the same, and mess up the 
> "active_writer" thing. The person that actually *gets* the lock may not be 
> the same one that has "active_writer" set to itself. No? Am I missing 
> something?

Unless I am reading the patch wrongly, it seems cpu_hotplug_begin() is called 
while holding the cpu_add_remove_lock mutex. So, another CPU cannot come in
and do the same until _cpu_down() is over.

Thanks
Dipankar


* Re: [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit.
  2007-10-17  2:11   ` Dipankar Sarma
@ 2007-10-17  2:23     ` Linus Torvalds
  2007-10-17  4:17       ` Gautham R Shenoy
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2007-10-17  2:23 UTC (permalink / raw)
  To: Dipankar Sarma
  Cc: Gautham R Shenoy, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Oleg Nesterov, Ingo Molnar, Paul E McKenney



On Wed, 17 Oct 2007, Dipankar Sarma wrote:
> 
> Unless I am reading the patch wrongly, it seems cpu_hotplug_begin() is called 
> while holding the cpu_add_remove_lock mutex. So, another CPU cannot come in
> and do the same until _cpu_down() is over.

Ahh, in that case I take back that objection, although maybe a comment 
about the required nesting within the other mutex is in order? Or maybe 
it's there and I just missed it (blush).

			Linus


* Re: [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit.
  2007-10-17  2:23     ` Linus Torvalds
@ 2007-10-17  4:17       ` Gautham R Shenoy
  0 siblings, 0 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-17  4:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dipankar Sarma, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Oleg Nesterov, Ingo Molnar, Paul E McKenney

On Tue, Oct 16, 2007 at 07:23:38PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 17 Oct 2007, Dipankar Sarma wrote:
> > 
> > Unless I am reading the patch wrongly, it seems cpu_hotplug_begin() is called 
> > while holding the cpu_add_remove_lock mutex. So, another CPU cannot come in
> > and do the same until _cpu_down() is over.
> 
> Ahh, in that case I take back that objection, although maybe a comment 
> about the required nesting within the other mutex is in order? Or maybe 
> it's there and I just missed it (blush).

Good point! I'll add the comment before cpu_hotplug_begin().


> 
> 			Linus

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-17  0:47   ` Rusty Russell
@ 2007-10-17  5:37     ` Gautham R Shenoy
  2007-10-17  6:29       ` Rusty Russell
  2007-10-21 12:47       ` Oleg Nesterov
  0 siblings, 2 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-17  5:37 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Dipankar Sarma, Oleg Nesterov, Ingo Molnar, Paul E McKenney

On Wed, Oct 17, 2007 at 10:47:41AM +1000, Rusty Russell wrote:
> On Tuesday 16 October 2007 20:34:17 Gautham R Shenoy wrote:
> > This patch implements a Refcount + Waitqueue based model for
> > cpu-hotplug.
> 
> Hi Gautham,

Hi Rusty, 

> 
> 	I can't see where you re-initialize the completion.  

The cpu_hotplug.readers_done is a global variable which has been
initialized in cpu_hotplug_init. 

So I am wondering: is the re-initialization required? 

> 
> > +static void cpu_hotplug_begin(void)
> > +{
> > +	mutex_lock(&cpu_hotplug.lock);
> > +	cpu_hotplug.active_writer = current;
> > +	while (cpu_hotplug.refcount) {
> > +		mutex_unlock(&cpu_hotplug.lock);
> > +		wait_for_completion(&cpu_hotplug.readers_done);
> > +		mutex_lock(&cpu_hotplug.lock);
> > +	}
> 
> AFAICT this will busy-wait on the second CPU hotplug.
> 

Well, when the first cpu_hotplug comes out of wait_for_completion, it
would have decremented the ->done count, so it's as good as new 
for the second CPU hotplug, no?

> Cheers,
> Rusty.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-17  5:37     ` Gautham R Shenoy
@ 2007-10-17  6:29       ` Rusty Russell
  2007-10-18  6:29         ` Gautham R Shenoy
  2007-10-21 12:47       ` Oleg Nesterov
  1 sibling, 1 reply; 29+ messages in thread
From: Rusty Russell @ 2007-10-17  6:29 UTC (permalink / raw)
  To: ego
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Dipankar Sarma, Oleg Nesterov, Ingo Molnar, Paul E McKenney

On Wednesday 17 October 2007 15:37:54 Gautham R Shenoy wrote:
> On Wed, Oct 17, 2007 at 10:47:41AM +1000, Rusty Russell wrote:
> > On Tuesday 16 October 2007 20:34:17 Gautham R Shenoy wrote:
> > > This patch implements a Refcount + Waitqueue based model for
> > > cpu-hotplug.
> >
> > Hi Gautham,
>
> Hi Rusty,
>
> > 	I can't see where you re-initialize the completion.
>
> The cpu_hotplug.readers_done is a global variable which has been
> initialized in cpu_hotplug_init.
>
> So I am wondering: is the re-initialization required?

Yes.  AFAICT you use this completion on every hotplug.  Yet once a completion 
is completed, it needs to be re-initialized to be reused: it's "complete" and 
wait_for_completion will return immediately thereafter.

Perhaps you want a waitqueue instead?

Rusty.


* Re: [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-16 10:34 ` [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation Gautham R Shenoy
  2007-10-17  0:47   ` Rusty Russell
@ 2007-10-17 10:53   ` Paul Jackson
  2007-10-17 11:27     ` Paul Jackson
  1 sibling, 1 reply; 29+ messages in thread
From: Paul Jackson @ 2007-10-17 10:53 UTC (permalink / raw)
  To: ego
  Cc: torvalds, akpm, linux-kernel, vatsa, rusty, dipankar, oleg, mingo,
	paulmck

This patch dies on boot for me, using sn2_defconfig.  It dies on
the kernel/cpu.c line:

  mutex_lock(&cpu_hotplug.lock);

This is in cpu_hotplug_begin(), called from _cpu_up(), called from
cpu_up().

I haven't finished verifying this yet, but it looks like the
initialization of the mutex cpu_hotplug.lock in cpu_hotplug_init()
only happens if CONFIG_HOTPLUG_CPU is enabled, but the use of that
mutex in the above mutex_lock() call happens on all configs.

In the sn2_defconfig I'm using, as well as in several other ia64,
mips and powerpc configs, CONFIG_HOTPLUG_CPU is, at present anyway,
disabled.

For the record, the kernel death messages are:

===

Unable to handle kernel NULL pointer dereference (address 0000000000000000)
swapper[1]: Oops 8804682956800 [1]
Modules linked in:

Pid: 1, CPU 0, comm:              swapper
psr : 00001010085a2010 ifs : 800000000000038b ip  : [<a00000010081e080>]    Not tainted
ip is at __mutex_lock_slowpath+0x280/0x660
unat: 0000000000000000 pfs : 000000000000038b rsc : 0000000000000003
rnat: 1fcc5c7d8b12a4b4 bsps: 000000000001003e pr  : 0000000000001041
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a00000010081e040 b6  : e0000030025bee60 b7  : a0000001005315c0
f6  : 1003e1111111111111111 f7  : 1003e20c49ba5e353f7cf
f8  : 1003e00000000000000c8 f9  : 10006c7fffffffd73ea5c
f10 : 100019ffffffff8a77fd5 f11 : 1003e0000000000000000
r1  : a000000101064760 r2  : a00000010118ef80 r3  : e00000b020377b20
r8  : e00000b020377d00 r9  : 0000000000004000 r10 : e00000b020370b58
r11 : e00000b020370000 r12 : e00000b020377cf0 r13 : e00000b020370000
r14 : 0000000000000001 r15 : e00000b020377d18 r16 : e00000b020377768
r17 : 00000000dead4ead r18 : a000000100d01380 r19 : a000000100e68300
r20 : 0000000000004000 r21 : 0000000000000010 r22 : 0000000000000000
r23 : 0000000000000000 r24 : 0000000000004000 r25 : ffffffffffffffff
r26 : e00000b020377d10 r27 : 0000000000000000 r28 : e00000b020377d00
r29 : e00000b020377d08 r30 : a00000010118efa0 r31 : a00000010118ef98

Call Trace:
 [<a000000100013a60>] show_stack+0x40/0xa0
                                sp=e00000b0203778c0 bsp=e00000b020370f80
 [<a000000100014360>] show_regs+0x840/0x880
                                sp=e00000b020377a90 bsp=e00000b020370f28
 [<a000000100036460>] die+0x220/0x300
                                sp=e00000b020377a90 bsp=e00000b020370ee0
 [<a000000100059cd0>] ia64_do_page_fault+0x910/0xa80
                                sp=e00000b020377a90 bsp=e00000b020370e90
 [<a00000010000b120>] ia64_leave_kernel+0x0/0x280
                                sp=e00000b020377b20 bsp=e00000b020370e90
 [<a00000010081e080>] __mutex_lock_slowpath+0x280/0x660
                                sp=e00000b020377cf0 bsp=e00000b020370e38
 [<a00000010081e4a0>] mutex_lock+0x40/0x60
                                sp=e00000b020377d20 bsp=e00000b020370e18
 [<a000000100a2c7e0>] cpu_up+0x220/0x6a0
                                sp=e00000b020377d20 bsp=e00000b020370dd0
 [<a000000100a044d0>] kernel_init+0x250/0x7e0
                                sp=e00000b020377d30 bsp=e00000b020370d88
 [<a000000100011fd0>] kernel_thread_helper+0xd0/0x100
                                sp=e00000b020377e30 bsp=e00000b020370d60
 [<a000000100008dc0>] start_kernel_thread+0x20/0x40
                                sp=e00000b020377e30 bsp=e00000b020370d60
Kernel panic - not syncing: Attempted to kill init!


-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-17 10:53   ` Paul Jackson
@ 2007-10-17 11:27     ` Paul Jackson
  2007-10-17 11:50       ` Gautham R Shenoy
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Jackson @ 2007-10-17 11:27 UTC (permalink / raw)
  To: Paul Jackson
  Cc: ego, torvalds, akpm, linux-kernel, vatsa, rusty, dipankar, oleg,
	mingo, paulmck

With the following patch, the four cpu hotplug patches by
Gautham R Shenoy boot successfully on my sn2_defconfig ia64
SN2 Altix system.

This patch moves the cpu_hotplug_init() code outside of the
#ifdef CONFIG_HOTPLUG_CPU code.

I will confess to being confused however as to the best way
to handle this.  Having "cpu_hotplug_init" be a piece of code
that has to execute whether or not CONFIG_HOTPLUG_CPU is
enabled doesn't seem right.

Signed-off-by: Paul Jackson <pj@sgi.com>

---

 include/linux/cpu.h |    3 +--
 kernel/cpu.c        |    4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

--- 2.6.23-mm1.orig/include/linux/cpu.h	2007-10-17 03:59:59.529623639 -0700
+++ 2.6.23-mm1/include/linux/cpu.h	2007-10-17 04:03:56.257223754 -0700
@@ -83,6 +83,7 @@ static inline void unregister_cpu_notifi
 
 #endif /* CONFIG_SMP */
 extern struct sysdev_class cpu_sysdev_class;
+extern void cpu_hotplug_init(void);
 
 #ifdef CONFIG_HOTPLUG_CPU
 /* Stop CPUs going up and down. */
@@ -97,7 +98,6 @@ static inline void cpuhotplug_mutex_unlo
 	mutex_unlock(cpu_hp_mutex);
 }
 
-extern void cpu_hotplug_init(void);
 extern void get_online_cpus(void);
 extern void put_online_cpus(void);
 #define hotcpu_notifier(fn, pri) {				\
@@ -117,7 +117,6 @@ static inline void cpuhotplug_mutex_lock
 static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
 { }
 
-#define cpu_hotplug_init()		do { } while (0)
 #define get_online_cpus()		do { } while (0)
 #define put_online_cpus()		do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
--- 2.6.23-mm1.orig/kernel/cpu.c	2007-10-17 03:59:59.873628872 -0700
+++ 2.6.23-mm1/kernel/cpu.c	2007-10-17 04:04:52.522079165 -0700
@@ -38,8 +38,6 @@ static struct {
 
 #define writer_exists() (cpu_hotplug.active_writer != NULL)
 
-#ifdef CONFIG_HOTPLUG_CPU
-
 void __init cpu_hotplug_init(void)
 {
 	cpu_hotplug.active_writer = NULL;
@@ -48,6 +46,8 @@ void __init cpu_hotplug_init(void)
 	init_completion(&cpu_hotplug.readers_done);
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+
 void get_online_cpus(void)
 {
 	might_sleep();


-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-17 11:27     ` Paul Jackson
@ 2007-10-17 11:50       ` Gautham R Shenoy
  2007-10-17 12:04         ` Paul Jackson
  0 siblings, 1 reply; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-17 11:50 UTC (permalink / raw)
  To: Paul Jackson
  Cc: torvalds, akpm, linux-kernel, vatsa, rusty, dipankar, oleg, mingo,
	paulmck

Hi Paul,

On Wed, Oct 17, 2007 at 04:27:09AM -0700, Paul Jackson wrote:
> With the following patch, the four cpu hotplug patches by
> Gautham R Shenoy boot successfully on my sn2_defconfig ia64
> SN2 Altix system.
> 
> This patch moves the cpu_hotplug_init() code outside of the
> #ifdef CONFIG_HOTPLUG_CPU code.
> 
> I will confess to being confused however as to the best way
> to handle this.  Having "cpu_hotplug_init" be a piece of code
> that has to execute whether or not CONFIG_HOTPLUG_CPU is
> enabled doesn't seem right.

Thanks for the patch! 

I had not tested the patch series with !CONFIG_HOTPLUG_CPU,
so no wonder I missed it.

> 
> Signed-off-by: Paul Jackson <pj@sgi.com>
> 

Acked-by: Gautham R Shenoy <ego@in.ibm.com>

> ---
> 
>  include/linux/cpu.h |    3 +--
>  kernel/cpu.c        |    4 ++--
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> --- 2.6.23-mm1.orig/include/linux/cpu.h	2007-10-17 03:59:59.529623639 -0700
> +++ 2.6.23-mm1/include/linux/cpu.h	2007-10-17 04:03:56.257223754 -0700
> @@ -83,6 +83,7 @@ static inline void unregister_cpu_notifi
> 
>  #endif /* CONFIG_SMP */
>  extern struct sysdev_class cpu_sysdev_class;
> +extern void cpu_hotplug_init(void);
> 
>  #ifdef CONFIG_HOTPLUG_CPU
>  /* Stop CPUs going up and down. */
> @@ -97,7 +98,6 @@ static inline void cpuhotplug_mutex_unlo
>  	mutex_unlock(cpu_hp_mutex);
>  }
> 
> -extern void cpu_hotplug_init(void);
>  extern void get_online_cpus(void);
>  extern void put_online_cpus(void);
>  #define hotcpu_notifier(fn, pri) {				\
> @@ -117,7 +117,6 @@ static inline void cpuhotplug_mutex_lock
>  static inline void cpuhotplug_mutex_unlock(struct mutex *cpu_hp_mutex)
>  { }
> 
> -#define cpu_hotplug_init()		do { } while (0)
>  #define get_online_cpus()		do { } while (0)
>  #define put_online_cpus()		do { } while (0)
>  #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
> --- 2.6.23-mm1.orig/kernel/cpu.c	2007-10-17 03:59:59.873628872 -0700
> +++ 2.6.23-mm1/kernel/cpu.c	2007-10-17 04:04:52.522079165 -0700
> @@ -38,8 +38,6 @@ static struct {
> 
>  #define writer_exists() (cpu_hotplug.active_writer != NULL)
> 
> -#ifdef CONFIG_HOTPLUG_CPU
> -
>  void __init cpu_hotplug_init(void)
>  {
>  	cpu_hotplug.active_writer = NULL;
> @@ -48,6 +46,8 @@ void __init cpu_hotplug_init(void)
>  	init_completion(&cpu_hotplug.readers_done);
>  }
> 
> +#ifdef CONFIG_HOTPLUG_CPU
> +
>  void get_online_cpus(void)
>  {
>  	might_sleep();
> 
> 
> -- 
>                   I won't rest till it's the best ...
>                   Programmer, Linux Scalability
>                   Paul Jackson <pj@sgi.com> 1.925.600.0401

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [RFC PATCH 4/4] Remove CPU_DEAD/CPU_UP_CANCELLED handling from workqueue.c
  2007-10-16 10:37 ` [RFC PATCH 4/4] Remove CPU_DEAD/CPU_UP_CANCELLED handling from workqueue.c Gautham R Shenoy
@ 2007-10-17 11:57   ` Oleg Nesterov
  0 siblings, 0 replies; 29+ messages in thread
From: Oleg Nesterov @ 2007-10-17 11:57 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Ingo Molnar, Paul E McKenney

On 10/16, Gautham R Shenoy wrote:
>
> cleanup_workqueue_thread() in the CPU_DEAD and CPU_UP_CANCELLED path
> will cause a deadlock if the worker thread is executing a work item
> which is blocked on get_online_cpus(). This will lead to an irrecoverable
> hang.
>
> The solution is not to clean up the worker thread. Instead, let it remain
> even after the cpu goes offline. Since no one can queue any work
> on an offlined cpu, this thread will sleep forever, until
> someone onlines the cpu.

Imho: not perfect, but acceptable. We can re-introduce cwq->should_stop
flag, so that cwq->thread eventually exits after it flushed ->worklist.
But see below.

> @@ -679,6 +680,7 @@ init_cpu_workqueue(struct workqueue_stru
>  	spin_lock_init(&cwq->lock);
>  	INIT_LIST_HEAD(&cwq->worklist);
>  	init_waitqueue_head(&cwq->more_work);
> +	cwq->thread = NULL;

Not needed, cwq->thread == NULL because alloc_percpu() uses GFP_ZERO.

>  	return cwq;
>  }
> @@ -712,7 +714,7 @@ static void start_workqueue_thread(struc
>
>  	if (p != NULL) {
>  		if (cpu >= 0)
> -			kthread_bind(p, cpu);
> +			set_cpus_allowed(p, cpumask_of_cpu(cpu));

OK, this is understandable, but see below...

>  		wake_up_process(p);
>  	}
>  }
> @@ -848,6 +850,9 @@ static int __devinit workqueue_cpu_callb
>
>  		switch (action) {
>  		case CPU_UP_PREPARE:
> +			if (likely(cwq->thread != NULL &&
> +					!IS_ERR(cwq->thread)))

IS_ERR(cwq->thread) is not possible. cwq->thread is either NULL, or
a valid task_struct.

> +				break;
>  			if (!create_workqueue_thread(cwq, cpu))
>  				break;
>  			printk(KERN_ERR "workqueue [%s] for %i failed\n",
> @@ -858,12 +863,6 @@ static int __devinit workqueue_cpu_callb
>  		case CPU_ONLINE:
>  			start_workqueue_thread(cwq, cpu);
>  			break;
> -
> -		case CPU_UP_CANCELED:
> -			start_workqueue_thread(cwq, -1);
> -		case CPU_DEAD:
> -			cleanup_workqueue_thread(cwq, cpu);
> -			break;
>  		}
>  	}

Unfortunately, this approach seems to have a subtle problem.

Suppose that CPU 0 goes down. Let's denote the corresponding cwq->thread as T.
When the CPU goes offline, T is no longer bound to CPU 0. So far this is OK.

Now suppose that CPU 0 goes online again. There is a window before CPU_ONLINE
(note that CPU 0 is already online at this time) binds T back to CPU 0. It is
possible that another thread running on CPU 0 does schedule_work() in that window,
or that a thread running on any CPU calls schedule_delayed_work_on(0).

In this case the work_struct may run on the wrong CPU, because T is ready to
execute the work_struct from any CPU; this is a bug. work->func() has every
right to expect that cwq->thread is bound to the right CPU.

Let's take cache_reap() as an example. It is critical that this function is
really "per cpu", otherwise it can use the wrong per-cpu data, and even the
last schedule_delayed_work() is not reliable.
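
A rough timeline of the window described above (a sketch for illustration,
not actual kernel code; CPU numbers are arbitrary):

```
/* CPU 0 went offline earlier; its cwq->thread T is now unbound */

cpu_up(0):
    CPU_UP_PREPARE, ...                      /* CPU 0 becomes online */

    /* window: CPU 0 is online, but T may still run anywhere */
    task on CPU 0:  schedule_work(&w)        /* w queued on CPU 0's cwq */
    T, on CPU 1:    runs w->func()           /* per-cpu work on wrong CPU */

    CPU_ONLINE:
        set_cpus_allowed(T, cpumask_of_cpu(0))   /* rebind happens too late */
```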

(Sorry, have no time to read the other patches now, will do on weekend).

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-17 11:50       ` Gautham R Shenoy
@ 2007-10-17 12:04         ` Paul Jackson
  0 siblings, 0 replies; 29+ messages in thread
From: Paul Jackson @ 2007-10-17 12:04 UTC (permalink / raw)
  To: ego
  Cc: torvalds, akpm, linux-kernel, vatsa, rusty, dipankar, oleg, mingo,
	paulmck

Gautham wrote:
> Thanks for the patch! 

You're welcome.

I wonder what the chances are of this patch set making it into 2.6.24.

The cgroup (aka container) folks Paul Menage and David Rientjes are
working with me on the (hopefully) last fix for getting cpusets to work
with the cgroup changes.  This involves how cpusets adapts to CPUs
coming and going from the system due to hotplug/hotunplug, or coming and
going from cpusets due to user changes in the cpuset configuration.

Not surprisingly, the code we are struggling with depends on the CPU
hotplug support, and this present patch set of yours looks to be a
fine improvement to that.

However whatever we do for this last fix has to make it into 2.6.24,
because cgroups themselves are targeted for 2.6.24, and I can't let
that lead to any serious regressions in existing cpuset capability.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus
  2007-10-16 10:35 ` [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus Gautham R Shenoy
@ 2007-10-17 16:13   ` Nathan Lynch
  2007-10-18  7:57     ` Gautham R Shenoy
  0 siblings, 1 reply; 29+ messages in thread
From: Nathan Lynch @ 2007-10-17 16:13 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Oleg Nesterov, Ingo Molnar,
	Paul E McKenney

Hi Gautham-

Gautham R Shenoy wrote:
> Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use 
> get_online_cpus and put_online_cpus instead as it highlights
> the refcount semantics in these operations.

Something other than "get_online_cpus", please?  lock_cpu_hotplug()
protects cpu_present_map as well as cpu_online_map.  For example, some
of the powerpc code modified in this patch is made a bit less clear
because it is manipulating cpu_present_map, not cpu_online_map.


> Index: linux-2.6.23/arch/powerpc/platforms/pseries/hotplug-cpu.c
> ===================================================================
> --- linux-2.6.23.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ linux-2.6.23/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -151,7 +151,7 @@ static int pseries_add_processor(struct 
>  	for (i = 0; i < nthreads; i++)
>  		cpu_set(i, tmp);
>  
> -	lock_cpu_hotplug();
> +	get_online_cpus();
>  
>  	BUG_ON(!cpus_subset(cpu_present_map, cpu_possible_map));
>  
> @@ -188,7 +188,7 @@ static int pseries_add_processor(struct 
>  	}
>  	err = 0;
>  out_unlock:
> -	unlock_cpu_hotplug();
> +	put_online_cpus();
>  	return err;
>  }
>  
> @@ -209,7 +209,7 @@ static void pseries_remove_processor(str
>  
>  	nthreads = len / sizeof(u32);
>  
> -	lock_cpu_hotplug();
> +	get_online_cpus();
>  	for (i = 0; i < nthreads; i++) {
>  		for_each_present_cpu(cpu) {
>  			if (get_hard_smp_processor_id(cpu) != intserv[i])
> @@ -223,7 +223,7 @@ static void pseries_remove_processor(str
>  			printk(KERN_WARNING "Could not find cpu to remove "
>  			       "with physical id 0x%x\n", intserv[i]);
>  	}
> -	unlock_cpu_hotplug();
> +	put_online_cpus();
>  }

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-17  6:29       ` Rusty Russell
@ 2007-10-18  6:29         ` Gautham R Shenoy
  0 siblings, 0 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-18  6:29 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Dipankar Sarma, Oleg Nesterov, Ingo Molnar, Paul E McKenney

On Wed, Oct 17, 2007 at 04:29:12PM +1000, Rusty Russell wrote:
> On Wednesday 17 October 2007 15:37:54 Gautham R Shenoy wrote:
> > On Wed, Oct 17, 2007 at 10:47:41AM +1000, Rusty Russell wrote:
> > > On Tuesday 16 October 2007 20:34:17 Gautham R Shenoy wrote:
> > > > This patch implements a Refcount + Waitqueue based model for
> > > > cpu-hotplug.
> > >
> > > Hi Gautham,
> >
> > Hi Rusty,
> >
> > > 	I can't see where you re-initialize the completion.
> >
> > The cpu_hotplug.readers_done is a global variable which has been
> > initialized in cpu_hotplug_init.
> >
> > So I am wondering is the re-initialization required ?
> 
> Yes.  AFAICT you use this completion on every hotplug.  Yet once a completion 
> is completed, it needs to be re-initialized to be reused: it's "complete" and 
> wait_for_completion will return immediately thereafter.
>

Okay, I thought completions followed spinlock semantics, and hence 
didn't require re-initialization. Thanks for that information!

> Perhaps you want a waitqueue instead?

Yes, I had considered it. But completion looked appealing since it 
already had the wait-queuing code. I'll give it a try and repost 
the series with other changes.

> 
> Rusty.

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus
  2007-10-17 16:13   ` Nathan Lynch
@ 2007-10-18  7:57     ` Gautham R Shenoy
  2007-10-18  8:22       ` Nathan Lynch
  0 siblings, 1 reply; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-18  7:57 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Oleg Nesterov, Ingo Molnar,
	Paul E McKenney

Hi Nathan, 
> Hi Gautham-
> 
> Gautham R Shenoy wrote:
> > Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use 
> > get_online_cpus and put_online_cpus instead as it highlights
> > the refcount semantics in these operations.
> 
> Something other than "get_online_cpus", please?  lock_cpu_hotplug()
> protects cpu_present_map as well as cpu_online_map.  For example, some
> of the powerpc code modified in this patch is made a bit less clear
> because it is manipulating cpu_present_map, not cpu_online_map.

A quick look at the code, and I am wondering why lock_cpu_hotplug() is
used there in the first place. It doesn't look like we require any 
protection against cpus coming up/going down in the code below, 
since the cpu-hotplug operation doesn't affect the cpu_present_map. 

Can't we use another mutex instead of the cpu_hotplug mutex here?


> 
> 
> > Index: linux-2.6.23/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > ===================================================================
> > --- linux-2.6.23.orig/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > +++ linux-2.6.23/arch/powerpc/platforms/pseries/hotplug-cpu.c
> > @@ -151,7 +151,7 @@ static int pseries_add_processor(struct 
> >  	for (i = 0; i < nthreads; i++)
> >  		cpu_set(i, tmp);
> >  
> > -	lock_cpu_hotplug();
> > +	get_online_cpus();
> >  
> >  	BUG_ON(!cpus_subset(cpu_present_map, cpu_possible_map));
> >  
> > @@ -188,7 +188,7 @@ static int pseries_add_processor(struct 
> >  	}
> >  	err = 0;
> >  out_unlock:
> > -	unlock_cpu_hotplug();
> > +	put_online_cpus();
> >  	return err;
> >  }
> >  
> > @@ -209,7 +209,7 @@ static void pseries_remove_processor(str
> >  
> >  	nthreads = len / sizeof(u32);
> >  
> > -	lock_cpu_hotplug();
> > +	get_online_cpus();
> >  	for (i = 0; i < nthreads; i++) {
> >  		for_each_present_cpu(cpu) {
> >  			if (get_hard_smp_processor_id(cpu) != intserv[i])
> > @@ -223,7 +223,7 @@ static void pseries_remove_processor(str
> >  			printk(KERN_WARNING "Could not find cpu to remove "
> >  			       "with physical id 0x%x\n", intserv[i]);
> >  	}
> > -	unlock_cpu_hotplug();
> > +	put_online_cpus();
> >  }


-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus
  2007-10-18  7:57     ` Gautham R Shenoy
@ 2007-10-18  8:22       ` Nathan Lynch
  2007-10-18  8:59         ` Gautham R Shenoy
  0 siblings, 1 reply; 29+ messages in thread
From: Nathan Lynch @ 2007-10-18  8:22 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Oleg Nesterov, Ingo Molnar,
	Paul E McKenney

Gautham R Shenoy wrote:
> Hi Nathan, 
> > Hi Gautham-
> > 
> > Gautham R Shenoy wrote:
> > > Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use 
> > > get_online_cpus and put_online_cpus instead as it highlights
> > > the refcount semantics in these operations.
> > 
> > Something other than "get_online_cpus", please?  lock_cpu_hotplug()
> > protects cpu_present_map as well as cpu_online_map.  For example, some
> > of the powerpc code modified in this patch is made a bit less clear
> > because it is manipulating cpu_present_map, not cpu_online_map.
> 
> A quick look at the code, and I am wondering why is lock_cpu_hotplug()
> used there in the first place. It doesn't look like we require any 
> protection against cpus coming up/ going down in the code below, 
> since the cpu-hotplug operation doesn't affect the cpu_present_map. 

The locking is necessary.  Changes to cpu_online_map and
cpu_present_map must be serialized; otherwise you could end up trying
to online a cpu as it is being removed (i.e. cleared from
cpu_present_map).  Online operations must check that a cpu is present
before bringing it up (kernel/cpu.c):

/* Requires cpu_add_remove_lock to be held */
static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
{
	int ret, nr_calls = 0;
	void *hcpu = (void *)(long)cpu;
	unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;

	if (cpu_online(cpu) || !cpu_present(cpu))
		return -EINVAL;
	....

> Can't we use another mutex here instead of the cpu_hotplug mutex here ?

I guess so, but I don't really see the need...


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus
  2007-10-18  8:22       ` Nathan Lynch
@ 2007-10-18  8:59         ` Gautham R Shenoy
  2007-10-18 17:30           ` Nathan Lynch
  0 siblings, 1 reply; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-18  8:59 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Oleg Nesterov, Ingo Molnar,
	Paul E McKenney

On Thu, Oct 18, 2007 at 03:22:21AM -0500, Nathan Lynch wrote:
> Gautham R Shenoy wrote:
> > Hi Nathan, 
> > > Hi Gautham-
> > > 
> > > Gautham R Shenoy wrote:
> > > > Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use 
> > > > get_online_cpus and put_online_cpus instead as it highlights
> > > > the refcount semantics in these operations.
> > > 
> > > Something other than "get_online_cpus", please?  lock_cpu_hotplug()
> > > protects cpu_present_map as well as cpu_online_map.  For example, some
> > > of the powerpc code modified in this patch is made a bit less clear
> > > because it is manipulating cpu_present_map, not cpu_online_map.
> > 
> > A quick look at the code, and I am wondering why is lock_cpu_hotplug()
> > used there in the first place. It doesn't look like we require any 
> > protection against cpus coming up/ going down in the code below, 
> > since the cpu-hotplug operation doesn't affect the cpu_present_map. 
> 
> The locking is necessary.  Changes to cpu_online_map and
> cpu_present_map must be serialized; otherwise you could end up trying
> to online a cpu as it is being removed (i.e. cleared from
> cpu_present_map).  Online operations must check that a cpu is present
> before bringing it up (kernel/cpu.c):

Fair enough! 

But we are not protecting the cpu_present_map here using
lock_cpu_hotplug(), now are we?

The lock_cpu_hotplug() here ensures that no cpu-hotplug operations
occur in parallel with a processor add or a processor remove. 
IOW, we're still ensuring that the cpu_online_map doesn't change 
while we're changing the cpu_present_map. So I don't see why the name
get_online_cpus() should be a problem here. Maybe we could add a
comment as to why we don't want a cpu-hotplug operation to happen while
we're adding/removing a processor.

Unless, of course, lock_cpu_hotplug() is also used to serialize 
the add_processor and remove_processor operations. Is that the case
here?

Thanks and Regards
gautham.
> 
> /* Requires cpu_add_remove_lock to be held */
> static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
> {
> 	int ret, nr_calls = 0;
> 	void *hcpu = (void *)(long)cpu;
> 	unsigned long mod = tasks_frozen ? CPU_TASKS_FROZEN : 0;
> 
> 	if (cpu_online(cpu) || !cpu_present(cpu))
> 		return -EINVAL;
> 	....
> 
> > Can't we use another mutex here instead of the cpu_hotplug mutex here ?
> 
> I guess so, but I don't really see the need...
> 

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus
  2007-10-18  8:59         ` Gautham R Shenoy
@ 2007-10-18 17:30           ` Nathan Lynch
  2007-10-19  5:04             ` Gautham R Shenoy
  0 siblings, 1 reply; 29+ messages in thread
From: Nathan Lynch @ 2007-10-18 17:30 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Oleg Nesterov, Ingo Molnar,
	Paul E McKenney

Gautham R Shenoy wrote:
> On Thu, Oct 18, 2007 at 03:22:21AM -0500, Nathan Lynch wrote:
> > Gautham R Shenoy wrote:
> > > Hi Nathan, 
> > > > Hi Gautham-
> > > > 
> > > > Gautham R Shenoy wrote:
> > > > > Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use 
> > > > > get_online_cpus and put_online_cpus instead as it highlights
> > > > > the refcount semantics in these operations.
> > > > 
> > > > Something other than "get_online_cpus", please?  lock_cpu_hotplug()
> > > > protects cpu_present_map as well as cpu_online_map.  For example, some
> > > > of the powerpc code modified in this patch is made a bit less clear
> > > > because it is manipulating cpu_present_map, not cpu_online_map.
> > > 
> > > A quick look at the code, and I am wondering why is lock_cpu_hotplug()
> > > used there in the first place. It doesn't look like we require any 
> > > protection against cpus coming up/ going down in the code below, 
> > > since the cpu-hotplug operation doesn't affect the cpu_present_map. 
> > 
> > The locking is necessary.  Changes to cpu_online_map and
> > cpu_present_map must be serialized; otherwise you could end up trying
> > to online a cpu as it is being removed (i.e. cleared from
> > cpu_present_map).  Online operations must check that a cpu is present
> > before bringing it up (kernel/cpu.c):
> 
> Fair enough! 
> 
> But we are not protecting the cpu_present_map here using
> lock_cpu_hotplug(), now are we?

Yes, we are.  In addition to the above, updates to cpu_present_map
have to be serialized.  pseries_add_processor can be summed up as
"find the first N unset bits in cpu_present_map and set them".  That's
not an atomic operation, so some kind of mutual exclusion is needed.


> The lock_cpu_hotplug() here, ensures that no cpu-hotplug operations
> occur in parallel with a processor add or a processor remove. 

That's one important effect, but not the only one (see above).


> IOW, we're still ensuring that the cpu_online_map doesn't change 
> while we're changing the cpu_present_map. So I don't see why the name
> get_online_cpus() should be a problem here.

The naming is a problem IMO for two reasons:

- lock_cpu_hotplug() protects cpu_present_map as well as
  cpu_online_map (sigh, I see that Documentation/cpu-hotplug.txt
  disagrees with me, but my statement holds for powerpc, at least).

- get_online_cpus() implies reference count semantics (as stated in
  the changelog) but AFAICT it really has a reference count
  implementation with read-write locking semantics.

Hmm, I think there's another problem here.  With your changes, code
which relies on the mutual exclusion behavior of lock_cpu_hotplug()
(such as pseries_add/remove_processor) will now be able to run
concurrently.  Probably those functions should use
cpu_hotplug_begin/end instead.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus
  2007-10-18 17:30           ` Nathan Lynch
@ 2007-10-19  5:04             ` Gautham R Shenoy
  2007-10-22  0:43               ` Nathan Lynch
  0 siblings, 1 reply; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-19  5:04 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Oleg Nesterov, Ingo Molnar,
	Paul E McKenney

Hi Nathan, 

> Gautham R Shenoy wrote:
> > On Thu, Oct 18, 2007 at 03:22:21AM -0500, Nathan Lynch wrote:
> > > Gautham R Shenoy wrote:
> > > > Hi Nathan, 
> > > > > Hi Gautham-
> > > > > 
> > > > > Gautham R Shenoy wrote:
> > > > > > Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use 
> > > > > > get_online_cpus and put_online_cpus instead as it highlights
> > > > > > the refcount semantics in these operations.
> > > > > 
> > > > > Something other than "get_online_cpus", please?  lock_cpu_hotplug()
> > > > > protects cpu_present_map as well as cpu_online_map.  For example, some
> > > > > of the powerpc code modified in this patch is made a bit less clear
> > > > > because it is manipulating cpu_present_map, not cpu_online_map.
> > > > 
> > > > A quick look at the code, and I am wondering why is lock_cpu_hotplug()
> > > > used there in the first place. It doesn't look like we require any 
> > > > protection against cpus coming up/ going down in the code below, 
> > > > since the cpu-hotplug operation doesn't affect the cpu_present_map. 
> > > 
> > > The locking is necessary.  Changes to cpu_online_map and
> > > cpu_present_map must be serialized; otherwise you could end up trying
> > > to online a cpu as it is being removed (i.e. cleared from
> > > cpu_present_map).  Online operations must check that a cpu is present
> > > before bringing it up (kernel/cpu.c):
> > 
> > Fair enough! 
> > 
> > But we are not protecting the cpu_present_map here using
> > lock_cpu_hotplug(), now are we?
> 
> Yes, we are.  In addition to the above, updates to cpu_present_map
> have to be serialized.  pseries_add_processor can be summed up as
> "find the first N unset bits in cpu_present_map and set them".  That's
> not an atomic operation, so some kind of mutual exclusion is needed.
> 

Okay. But other than pseries_add_processor and pseries_remove_processor,
are there any other places where we _change_ the cpu_present_map?

I agree that we need some kind of synchronization for threads which read
the cpu_present_map. But probably we can use a separate mutex for that.



> 
> > The lock_cpu_hotplug() here, ensures that no cpu-hotplug operations
> > occur in parallel with a processor add or a processor remove. 
> 
> That's one important effect, but not the only one (see above).
> 
> 
> > IOW, we're still ensuring that the cpu_online_map doesn't change 
> > while we're changing the cpu_present_map. So I don't see why the name
> > get_online_cpus() should be a problem here.
> 
> The naming is a problem IMO for two reasons:
> 
> - lock_cpu_hotplug() protects cpu_present_map as well as
>   cpu_online_map (sigh, I see that Documentation/cpu-hotplug.txt
>   disagrees with me, but my statement holds for powerpc, at least).
> 
> - get_online_cpus() implies reference count semantics (as stated in
>   the changelog) but AFAICT it really has a reference count
>   implementation with read-write locking semantics.
> 
> Hmm, I think there's another problem here.  With your changes, code
> which relies on the mutual exclusion behavior of lock_cpu_hotplug()
> (such as pseries_add/remove_processor) will now be able to run
> concurrently.  Probably those functions should use
> cpu_hotplug_begin/end instead.

One of the primary reasons to move away from the mutex semantics for
cpu-hotplug protection was that there were a lot of places where
lock_cpu_hotplug() was used for protecting local data structures too,
when they had nothing to do with cpu-hotplug as such, and it resulted 
in a whole mess. It took people quite some time to sort things out 
with the cpufreq subsystem.

Probably it would be a lot cleaner if we use get_online_cpus() for
protection against cpu-hotplug and use specific mutexes for serializing
accesses to local data structures. Thoughts?

Thanks and Regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 3/4] Replace per-subsystem mutexes with get_online_cpus
  2007-10-16 10:36 ` [RFC PATCH 3/4] Replace per-subsystem mutexes with get_online_cpus Gautham R Shenoy
@ 2007-10-21 11:39   ` Oleg Nesterov
  2007-10-22  4:58     ` Gautham R Shenoy
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2007-10-21 11:39 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Ingo Molnar, Paul E McKenney

On 10/16, Gautham R Shenoy wrote:
>
> This patch converts the known per-subsystem cpu_hotplug mutexes to
> get_online_cpus put_online_cpus.
> It also eliminates the CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug
> notification events.

Personally, I like the changes in workqueue.c very much, a couple of
minor nits below.

> --- linux-2.6.23.orig/kernel/workqueue.c
> +++ linux-2.6.23/kernel/workqueue.c
> @@ -592,8 +592,6 @@ EXPORT_SYMBOL(schedule_delayed_work_on);
>   * Returns zero on success.
>   * Returns -ve errno on failure.
>   *
> - * Appears to be racy against CPU hotplug.
> - *

see below,

>   * schedule_on_each_cpu() is very slow.
>   */
>  int schedule_on_each_cpu(work_func_t func)
> @@ -605,7 +603,7 @@ int schedule_on_each_cpu(work_func_t fun
>  	if (!works)
>  		return -ENOMEM;
>
> -	preempt_disable();		/* CPU hotplug */
> +	get_online_cpus();		/* CPU hotplug */
>  	for_each_online_cpu(cpu) {
>  		struct work_struct *work = per_cpu_ptr(works, cpu);
>
> @@ -613,7 +611,7 @@ int schedule_on_each_cpu(work_func_t fun
>  		set_bit(WORK_STRUCT_PENDING, work_data_bits(work));
>  		__queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu), work);
>  	}
> -	preempt_enable();
> +	put_online_cpus();
>  	flush_workqueue(keventd_wq);

Still racy. To kill the race, please move flush_workqueue() up, before
put_online_cpus().
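
For reference, a sketch of schedule_on_each_cpu() with the ordering Oleg
suggests (reconstructed from the hunks quoted above; 2.6.23-era workqueue
internals assumed, so treat the surrounding details as approximate):

```c
int schedule_on_each_cpu(work_func_t func)
{
	int cpu;
	struct work_struct *works = alloc_percpu(struct work_struct);

	if (!works)
		return -ENOMEM;

	get_online_cpus();		/* CPU hotplug */
	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, func);
		set_bit(WORK_STRUCT_PENDING, work_data_bits(work));
		__queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu), work);
	}
	/* flush while we still hold the reference, so no cpu can
	 * go away while its work item is still queued */
	flush_workqueue(keventd_wq);
	put_online_cpus();
	free_percpu(works);
	return 0;
}
```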

> @@ -809,6 +809,7 @@ void destroy_workqueue(struct workqueue_
>  	struct cpu_workqueue_struct *cwq;
>  	int cpu;
>
> +	get_online_cpus();
>  	mutex_lock(&workqueue_mutex);
>  	list_del(&wq->list);
>  	mutex_unlock(&workqueue_mutex);
> @@ -817,6 +818,7 @@ void destroy_workqueue(struct workqueue_
>  		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
>  		cleanup_workqueue_thread(cwq, cpu);
>  	}
> +	put_online_cpus();

Correct, but I'd suggest doing put_online_cpus() earlier, right after
mutex_unlock(&workqueue_mutex).

> @@ -830,22 +832,17 @@ static int __devinit workqueue_cpu_callb
>  	unsigned int cpu = (unsigned long)hcpu;
>  	struct cpu_workqueue_struct *cwq;
>  	struct workqueue_struct *wq;
> +	int ret = NOTIFY_OK;
>
>  	action &= ~CPU_TASKS_FROZEN;
>
>  	switch (action) {
> -	case CPU_LOCK_ACQUIRE:
> -		mutex_lock(&workqueue_mutex);
> -		return NOTIFY_OK;
> -
> -	case CPU_LOCK_RELEASE:
> -		mutex_unlock(&workqueue_mutex);
> -		return NOTIFY_OK;
>

please remove this empty line

>  	case CPU_UP_PREPARE:
>  		cpu_set(cpu, cpu_populated_map);
>  	}
>
> +	mutex_lock(&workqueue_mutex);

We don't need workqueue_mutex here. With your patch, workqueue_mutex
protects the list_head, nothing more. But all other callers (create/destroy)
should take get_online_cpus() anyway. This means that we can convert
workqueue_mutex to a spinlock_t.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation
  2007-10-17  5:37     ` Gautham R Shenoy
  2007-10-17  6:29       ` Rusty Russell
@ 2007-10-21 12:47       ` Oleg Nesterov
  1 sibling, 0 replies; 29+ messages in thread
From: Oleg Nesterov @ 2007-10-21 12:47 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Rusty Russell, Linus Torvalds, Andrew Morton, linux-kernel,
	Srivatsa Vaddagiri, Dipankar Sarma, Ingo Molnar, Paul E McKenney

On 10/17, Gautham R Shenoy wrote:
>
> On Wed, Oct 17, 2007 at 10:47:41AM +1000, Rusty Russell wrote:
> > 
> > 	I can't see where you re-initialize the completion.  
> 
> The cpu_hotplug.readers_done is a global variable which has been
> initialized in cpu_hotplug_init. 
> 
> So I am wondering is the re-initialization required ? 

I don't understand why we should re-initialize the completion either,
but see below.

> > > +static void cpu_hotplug_begin(void)
> > > +{
> > > +	mutex_lock(&cpu_hotplug.lock);
> > > +	cpu_hotplug.active_writer = current;
> > > +	while (cpu_hotplug.refcount) {
> > > +		mutex_unlock(&cpu_hotplug.lock);
> > > +		wait_for_completion(&cpu_hotplug.readers_done);
> > > +		mutex_lock(&cpu_hotplug.lock);
> > > +	}
> > 
> > AFAICT this will busy-wait on the second CPU hotplug.

Why?

> Well when the first cpu_hotplug comes out of wait_for_completion, it
> would have decremented the ->done count, so it's as good as new 
> for the second CPU hotplug, no?

No, because we decrement the ->done count only once, but there is no
guarantee that ->done == 1 when we get the CPU after wakeup. Another reader
can do lock_cpu_hotplug/unlock_cpu_hotplug in between, so we have a race.

But I disagree with "Yet once a completion is completed, it needs to be
re-initialized to be reused: it's "complete" and wait_for_completion
will return immediately thereafter".

Rusty, could you please clarify?

Side note, we don't block the new readers while cpu_hotplug_begin() waits
for the completion. I don't think this is a problem, but perhaps it makes
sense to document the possible livelock.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus
  2007-10-19  5:04             ` Gautham R Shenoy
@ 2007-10-22  0:43               ` Nathan Lynch
  2007-10-22  4:51                 ` Gautham R Shenoy
  0 siblings, 1 reply; 29+ messages in thread
From: Nathan Lynch @ 2007-10-22  0:43 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Oleg Nesterov, Ingo Molnar,
	Paul E McKenney

Gautham R Shenoy wrote:
> > Gautham R Shenoy wrote:
> > > On Thu, Oct 18, 2007 at 03:22:21AM -0500, Nathan Lynch wrote:
> > > > Gautham R Shenoy wrote:
> > > > > > Gautham R Shenoy wrote:
> > > > > > > Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use 
> > > > > > > get_online_cpus and put_online_cpus instead as it highlights
> > > > > > > the refcount semantics in these operations.
> > > > > > 
> > > > > > Something other than "get_online_cpus", please?  lock_cpu_hotplug()
> > > > > > protects cpu_present_map as well as cpu_online_map.  For example, some
> > > > > > of the powerpc code modified in this patch is made a bit less clear
> > > > > > because it is manipulating cpu_present_map, not cpu_online_map.
> > > > > 
> > > > > A quick look at the code, and I am wondering why is lock_cpu_hotplug()
> > > > > used there in the first place. It doesn't look like we require any 
> > > > > protection against cpus coming up/ going down in the code below, 
> > > > > since the cpu-hotplug operation doesn't affect the cpu_present_map. 
> > > > 
> > > > The locking is necessary.  Changes to cpu_online_map and
> > > > cpu_present_map must be serialized; otherwise you could end up trying
> > > > to online a cpu as it is being removed (i.e. cleared from
> > > > cpu_present_map).  Online operations must check that a cpu is present
> > > > before bringing it up (kernel/cpu.c):
> > > 
> > > Fair enough! 
> > > 
> > > But we are not protecting the cpu_present_map here using
> > > lock_cpu_hotplug(), now are we?
> > 
> > Yes, we are.  In addition to the above, updates to cpu_present_map
> > have to be serialized.  pseries_add_processor can be summed up as
> > "find the first N unset bits in cpu_present_map and set them".  That's
> > not an atomic operation, so some kind of mutual exclusion is needed.
> > 
> 
> Okay. But other than pseries_add_processor and pseries_remove_processor,
> are there any other places where we _change_ the cpu_present_map ?

Other arch code e.g. ia64 changes it for add/remove also.  But I fail
to see how it matters.


> I agree that we need some kind of synchronization for threads which
> read the cpu_present_map. But probably we can use a separate mutex
> for that.

That would be needless complexity.


> > The naming is a problem IMO for two reasons:
> > 
> > - lock_cpu_hotplug() protects cpu_present_map as well as
> >   cpu_online_map (sigh, I see that Documentation/cpu-hotplug.txt
> >   disagrees with me, but my statement holds for powerpc, at least).
> > 
> > - get_online_cpus() implies reference count semantics (as stated in
> >   the changelog) but AFAICT it really has a reference count
> >   implementation with read-write locking semantics.
> > 
> > Hmm, I think there's another problem here.  With your changes, code
> > which relies on the mutual exclusion behavior of lock_cpu_hotplug()
> > (such as pseries_add/remove_processor) will now be able to run
> > concurrently.  Probably those functions should use
> > cpu_hotplug_begin/end instead.
> 
> One of the primary reasons to move away from the mutex semantics for
> cpu-hotplug protection, was that there were a lot of places where
> lock_cpu_hotplug() was used for protecting local data structures too,
> when they had nothing to do with cpu-hotplug as such, and it resulted 
> in a whole mess. It took people quite sometime to sort things out 
> with the cpufreq subsystem.

cpu_present_map isn't a "local data structure" any more than
cpu_online_map, and it is quite relevant to cpu hotplug.  We have to
maintain the invariant that the set of cpus online is a subset of cpus
present.

> Probably it would be a lot cleaner if we use get_online_cpus() for
> protection against cpu-hotplug and use specific mutexes for serializing
> accesses to local data structures. Thoughts?

I don't feel like I'm getting through here.  Let me restate.

If I'm reading them correctly, these patches are changing the behavior
of lock_cpu_hotplug() from mutex-style locking to a kind of read-write
locking.  I think that's fine, but the naming of the new API poorly
reflects its real behavior.  Conversion of lock_cpu_hotplug() users
should be done with care.  Most of them - those that need one of the
cpu maps to remain unchanged during a critical section - can be
considered readers.  But a few (such as pseries_add_processor() and
pseries_remove_processor()) are writers, because they modify one of
the maps.

So, why not:

get_cpus_online   -> cpumaps_read_lock
put_cpus_online   -> cpumaps_read_unlock
cpu_hotplug_begin -> cpumaps_write_lock
cpu_hotplug_end   -> cpumaps_write_unlock

Or something similar?


* Re: [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus
  2007-10-22  0:43               ` Nathan Lynch
@ 2007-10-22  4:51                 ` Gautham R Shenoy
  0 siblings, 0 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-22  4:51 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Oleg Nesterov, Ingo Molnar,
	Paul E McKenney

> > 
> > Okay. But other than pseries_add_processor and pseries_remove_processor,
> > are there any other places where we _change_ the cpu_present_map ?
> 
> Other arch code e.g. ia64 changes it for add/remove also.  But I fail
> to see how it matters.
> 
> 
> > I agree that we need some kind of synchronization for threads which
> > read the cpu_present_map. But probably we can use a separate mutex
> > for that.
> 
> That would be needless complexity.
> 
> 
> > > The naming is a problem IMO for two reasons:
> > > 
> > > - lock_cpu_hotplug() protects cpu_present_map as well as
> > >   cpu_online_map (sigh, I see that Documentation/cpu-hotplug.txt
> > >   disagrees with me, but my statement holds for powerpc, at least).
> > > 
> > > - get_online_cpus() implies reference count semantics (as stated in
> > >   the changelog) but AFAICT it really has a reference count
> > >   implementation with read-write locking semantics.
> > > 
> > > Hmm, I think there's another problem here.  With your changes, code
> > > which relies on the mutual exclusion behavior of lock_cpu_hotplug()
> > > (such as pseries_add/remove_processor) will now be able to run
> > > concurrently.  Probably those functions should use
> > > cpu_hotplug_begin/end instead.
> > 
> > One of the primary reasons to move away from the mutex semantics for
> > cpu-hotplug protection, was that there were a lot of places where
> > lock_cpu_hotplug() was used for protecting local data structures too,
> > when they had nothing to do with cpu-hotplug as such, and it resulted 
> > in a whole mess. It took people quite sometime to sort things out 
> > with the cpufreq subsystem.
> 
> cpu_present_map isn't a "local data structure" any more than
> cpu_online_map, and it is quite relevant to cpu hotplug.  We have to
> maintain the invariant that the set of cpus online is a subset of cpus
> present.
> 
> > Probably it would be a lot cleaner if we use get_online_cpus() for
> > protection against cpu-hotplug and use specific mutexes for serializing
> > accesses to local data structures. Thoughts?
> 
> I don't feel like I'm getting through here.  Let me restate.
> 
> If I'm reading them correctly, these patches are changing the behavior
> of lock_cpu_hotplug() from mutex-style locking to a kind of read-write
> locking.  I think that's fine, but the naming of the new API poorly
> reflects its real behavior.  Conversion of lock_cpu_hotplug() users
> should be done with care.  Most of them - those that need one of the
> cpu maps to remain unchanged during a critical section - can be
> considered readers.  But a few (such as pseries_add_processor() and
> pseries_remove_processor()) are writers, because they modify one of
> the maps.
> 
> So, why not:
> 
> get_cpus_online   -> cpumaps_read_lock
> put_cpus_online   -> cpumaps_read_unlock
> cpu_hotplug_begin -> cpumaps_write_lock
> cpu_hotplug_end   -> cpumaps_write_unlock
> 
> Or something similar?

Hi Nathan, 
I totally see your point and agree with it. 

However, though the new implementation appears to have
read-write-semaphore behaviour, it differs in the
following aspects:

a) This implementation allows natural recursion of reads, just
as in the case of preempt_count. 
This is not possible with a normal read-write
lock: if you have something like the following on
a timeline, 
R1 --> R2 --> W3 --> R1, where 
Ri is a read by thread i and
Wi is a write by thread i, 
then thread 1 will deadlock, because its second read is blocked
behind W3 while it still holds the semaphore from its first read.

b) Regular reader-writer semaphores are fair: if a new reader
arrives while a writer is already waiting, the new reader waits until
that writer finishes. In the new implementation, however,
a new reader proceeds as long as the refcount is non-zero, and the
writer gets to run only when the number of readers drops to zero. The
readers which arrive while the writer is executing do block until the
writer is done.

Because of these properties, the implementation is closer to
refcounting, except that it allows the new bunch of readers to
sleep during an ongoing write.
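The two properties above can be condensed into a small userspace model (a sketch only: the real kernel code uses an active-writer task pointer and waitqueues rather than pthreads, and the variable names here are invented for illustration):

```c
#include <pthread.h>
#include <assert.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int refcount;		/* readers in their critical sections */
static int writer_active;	/* a hotplug operation is in progress */

void get_online_cpus(void)
{
	pthread_mutex_lock(&m);
	/* Property (b): readers block only while a write is executing.
	 * A reader arriving while a writer merely *waits* still gets in. */
	while (writer_active)
		pthread_cond_wait(&cv, &m);
	refcount++;
	pthread_mutex_unlock(&m);
}

void put_online_cpus(void)
{
	pthread_mutex_lock(&m);
	if (--refcount == 0)
		pthread_cond_broadcast(&cv);	/* wake a waiting writer */
	pthread_mutex_unlock(&m);
}

void cpu_hotplug_begin(void)
{
	pthread_mutex_lock(&m);
	while (refcount > 0)			/* wait for readers to drain */
		pthread_cond_wait(&cv, &m);
	writer_active = 1;
	pthread_mutex_unlock(&m);
}

void cpu_hotplug_end(void)
{
	pthread_mutex_lock(&m);
	writer_active = 0;
	pthread_cond_broadcast(&cv);	/* release readers blocked in get */
	pthread_mutex_unlock(&m);
}
```

Property (a) then follows as well: a thread that already holds a reference can never observe writer_active set, since the writer waits for the refcount to reach zero first, so a nested get_online_cpus() cannot block -- unlike a second down_read() queued behind a waiting down_write().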

Like you pointed out, pseries_add_processor and the other places
where the cpu_present_map is changed at runtime require write
protection. So how about making the
cpu_hotplug_begin/cpu_hotplug_end primitives
available outside cpu.c, appropriately named?

Personally I would propose the following names to reflect the behaviour, 
but if the community is okay with the word "lock" in the APIs, I don't
have any problem renaming them to what you've suggested.

/* 
 * APIs which ensure that the cpu_present_map and the cpu_online_map
 * do not change while operating in a critical section.
 */
 
 get_cpu_maps();
 put_cpu_maps();

/* 
 * APIs to serialize the updates to the cpu_present_map/cpu_online_map.
 */

 cpu_map_update_begin();
 cpu_map_update_done();

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


* Re: [RFC PATCH 3/4] Replace per-subsystem mutexes with get_online_cpus
  2007-10-21 11:39   ` Oleg Nesterov
@ 2007-10-22  4:58     ` Gautham R Shenoy
  0 siblings, 0 replies; 29+ messages in thread
From: Gautham R Shenoy @ 2007-10-22  4:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Andrew Morton, linux-kernel, Srivatsa Vaddagiri,
	Rusty Russel, Dipankar Sarma, Ingo Molnar, Paul E McKenney

On Sun, Oct 21, 2007 at 03:39:17PM +0400, Oleg Nesterov wrote:
> On 10/16, Gautham R Shenoy wrote:
> >
> > This patch converts the known per-subsystem cpu_hotplug mutexes to
> > get_online_cpus put_online_cpus.
> > It also eliminates the CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug
> > notification events.
> 
> Personally, I like the changes in workqueue.c very much, a couple of
> minor nits below.
> 
> > --- linux-2.6.23.orig/kernel/workqueue.c
> > +++ linux-2.6.23/kernel/workqueue.c
> > @@ -592,8 +592,6 @@ EXPORT_SYMBOL(schedule_delayed_work_on);
> >   * Returns zero on success.
> >   * Returns -ve errno on failure.
> >   *
> > - * Appears to be racy against CPU hotplug.
> > - *
> 
> see below,
> 
> >   * schedule_on_each_cpu() is very slow.
> >   */
> >  int schedule_on_each_cpu(work_func_t func)
> > @@ -605,7 +603,7 @@ int schedule_on_each_cpu(work_func_t fun
> >  	if (!works)
> >  		return -ENOMEM;
> >
> > -	preempt_disable();		/* CPU hotplug */
> > +	get_online_cpus();		/* CPU hotplug */
> >  	for_each_online_cpu(cpu) {
> >  		struct work_struct *work = per_cpu_ptr(works, cpu);
> >
> > @@ -613,7 +611,7 @@ int schedule_on_each_cpu(work_func_t fun
> >  		set_bit(WORK_STRUCT_PENDING, work_data_bits(work));
> >  		__queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu), work);
> >  	}
> > -	preempt_enable();
> > +	put_online_cpus();
> >  	flush_workqueue(keventd_wq);
> 
> Still racy. To kill the race, please move flush_workqueue() up, before
> put_online_cpus().
> 
> > @@ -809,6 +809,7 @@ void destroy_workqueue(struct workqueue_
> >  	struct cpu_workqueue_struct *cwq;
> >  	int cpu;
> >
> > +	get_online_cpus();
> >  	mutex_lock(&workqueue_mutex);
> >  	list_del(&wq->list);
> >  	mutex_unlock(&workqueue_mutex);
> > @@ -817,6 +818,7 @@ void destroy_workqueue(struct workqueue_
> >  		cwq = per_cpu_ptr(wq->cpu_wq, cpu);
> >  		cleanup_workqueue_thread(cwq, cpu);
> >  	}
> > +	put_online_cpus();
> 
> Correct, but I'd suggest to do put_online_cpus() earlier, right after
> mutex_unlock(&workqueue_mutex).
> 
> > @@ -830,22 +832,17 @@ static int __devinit workqueue_cpu_callb
> >  	unsigned int cpu = (unsigned long)hcpu;
> >  	struct cpu_workqueue_struct *cwq;
> >  	struct workqueue_struct *wq;
> > +	int ret = NOTIFY_OK;
> >
> >  	action &= ~CPU_TASKS_FROZEN;
> >
> >  	switch (action) {
> > -	case CPU_LOCK_ACQUIRE:
> > -		mutex_lock(&workqueue_mutex);
> > -		return NOTIFY_OK;
> > -
> > -	case CPU_LOCK_RELEASE:
> > -		mutex_unlock(&workqueue_mutex);
> > -		return NOTIFY_OK;
> >
> 
> please remove this empty line
> 
> >  	case CPU_UP_PREPARE:
> >  		cpu_set(cpu, cpu_populated_map);
> >  	}
> >
> > +	mutex_lock(&workqueue_mutex);
> 
> We don't need workqueue_mutex here. With your patch workqueue_mutex protects
> list_head, nothing more. But all other callers (create/destroy) should take
> get_online_cpus() anyway. This means that we can convert workqueue_mutex to
> spinlock_t.

Thanks for the review! 
Will code these changes up in the next version and post them 
sometime soon.

> 
> Oleg.
> 

Thanks and Regards
gautham.
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


end of thread [~2007-10-22  4:58 UTC | newest]

Thread overview: 29+ messages
2007-10-16 10:33 [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Gautham R Shenoy
2007-10-16 10:34 ` [RFC PATCH 1/4] Refcount Based Cpu-Hotplug Implementation Gautham R Shenoy
2007-10-17  0:47   ` Rusty Russell
2007-10-17  5:37     ` Gautham R Shenoy
2007-10-17  6:29       ` Rusty Russell
2007-10-18  6:29         ` Gautham R Shenoy
2007-10-21 12:47       ` Oleg Nesterov
2007-10-17 10:53   ` Paul Jackson
2007-10-17 11:27     ` Paul Jackson
2007-10-17 11:50       ` Gautham R Shenoy
2007-10-17 12:04         ` Paul Jackson
2007-10-16 10:35 ` [RFC PATCH 2/4] Rename lock_cpu_hotplug to get_online_cpus Gautham R Shenoy
2007-10-17 16:13   ` Nathan Lynch
2007-10-18  7:57     ` Gautham R Shenoy
2007-10-18  8:22       ` Nathan Lynch
2007-10-18  8:59         ` Gautham R Shenoy
2007-10-18 17:30           ` Nathan Lynch
2007-10-19  5:04             ` Gautham R Shenoy
2007-10-22  0:43               ` Nathan Lynch
2007-10-22  4:51                 ` Gautham R Shenoy
2007-10-16 10:36 ` [RFC PATCH 3/4] Replace per-subsystem mutexes with get_online_cpus Gautham R Shenoy
2007-10-21 11:39   ` Oleg Nesterov
2007-10-22  4:58     ` Gautham R Shenoy
2007-10-16 10:37 ` [RFC PATCH 4/4] Remove CPU_DEAD/CPU_UP_CANCELLED handling from workqueue.c Gautham R Shenoy
2007-10-17 11:57   ` Oleg Nesterov
2007-10-16 17:20 ` [RFC PATCH 0/4] Refcount Based Cpu-Hotplug Revisit Linus Torvalds
2007-10-17  2:11   ` Dipankar Sarma
2007-10-17  2:23     ` Linus Torvalds
2007-10-17  4:17       ` Gautham R Shenoy
