* [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3
@ 2010-05-06 16:52 Tejun Heo
2010-05-06 16:52 ` [PATCH 1/5] cpu_stop: implement stop_cpu[s]() Tejun Heo
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Tejun Heo @ 2010-05-06 16:52 UTC (permalink / raw)
To: mingo, peterz, linux-kernel, paulmck
Hello, Ingo.
Please pull from the following branch to receive cpu_stop patches.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cpu_stop
Adding mb()'s in synchronize_sched_expedited() is the only change
since the last take[2]. Both Peter and Paul have acked the patchset.
I'll post the patchset as replies to this message.
This patchset is on top of the current sched/core (99bd5e2f) and
contains the following changes.
Paul E. McKenney (1):
scheduler: correctly place paranoia memory barriers in synchronize_sched_expedited()
Tejun Heo (4):
cpu_stop: implement stop_cpu[s]()
stop_machine: reimplement using cpu_stop
scheduler: replace migration_thread with cpu_stop
scheduler: kill paranoia check in synchronize_sched_expedited()
Documentation/RCU/torture.txt | 10 -
arch/s390/kernel/time.c | 1 -
drivers/xen/manage.c | 14 +-
include/linux/rcutiny.h | 2 -
include/linux/rcutree.h | 1 -
include/linux/stop_machine.h | 59 +++--
kernel/cpu.c | 8 -
kernel/module.c | 14 +-
kernel/rcutorture.c | 2 +-
kernel/sched.c | 284 +++++-----------------
kernel/sched_fair.c | 48 +++-
kernel/stop_machine.c | 530 +++++++++++++++++++++++++++++++----------
12 files changed, 538 insertions(+), 435 deletions(-)
Thanks.
--
tejun
[2] http://thread.gmane.org/gmane.linux.kernel/980893
* [PATCH 1/5] cpu_stop: implement stop_cpu[s]()
2010-05-06 16:52 [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Tejun Heo
@ 2010-05-06 16:52 ` Tejun Heo
2010-05-06 16:52 ` [PATCH 2/5] stop_machine: reimplement using cpu_stop Tejun Heo
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2010-05-06 16:52 UTC (permalink / raw)
To: mingo, peterz, linux-kernel, paulmck
Cc: Tejun Heo, Oleg Nesterov, Dimitri Sivanich
Implement a simplistic per-cpu maximum priority cpu monopolization
mechanism.  A non-sleeping callback can be scheduled to run on one or
multiple cpus with maximum priority, monopolizing those cpus.  This is
primarily to replace and unify RT workqueue usage in stop_machine and
the scheduler migration_thread, which currently serves multiple
purposes.
Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
stop_cpus() and try_stop_cpus().
This allows clean sharing of resources between cpu_stop and all the
migration thread users.  One stopper thread per cpu is created; it is
currently named "stopper/CPU".  It will eventually replace the
migration thread and take over its name.
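As a rough illustration (not part of the patch), a user with a
non-sleeping callback would call the new interface along these lines;
the callback name and the target cpu below are made up for the
example:

	static int poke_cpu(void *arg)
	{
		/* runs on the target cpu with preemption disabled,
		 * having preempted everything else on that cpu */
		pr_info("cpu_stop callback on cpu %d\n", smp_processor_id());
		return 0;
	}

	/* run poke_cpu() on cpu 1 and wait for it to complete */
	ret = stop_one_cpu(1, poke_cpu, NULL);

	/* run it on all online cpus; stop_cpus() calls are serialized */
	ret = stop_cpus(cpu_online_mask, poke_cpu, NULL);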
* This facility was originally named cpuhog and lived in separate
files, but Peter Zijlstra nacked the name, so it was renamed to
cpu_stop and moved into stop_machine.c.
* Better reporting of preemption leak as per Peter's suggestion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
---
include/linux/stop_machine.h | 39 ++++-
kernel/stop_machine.c | 372 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 402 insertions(+), 9 deletions(-)
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index baba3a2..efcbd6c 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -1,15 +1,46 @@
#ifndef _LINUX_STOP_MACHINE
#define _LINUX_STOP_MACHINE
-/* "Bogolock": stop the entire machine, disable interrupts. This is a
- very heavy lock, which is equivalent to grabbing every spinlock
- (and more). So the "read" side to such a lock is anything which
- disables preeempt. */
+
#include <linux/cpu.h>
#include <linux/cpumask.h>
+#include <linux/list.h>
#include <asm/system.h>
#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+/*
+ * stop_cpu[s]() is simplistic per-cpu maximum priority cpu
+ * monopolization mechanism. The caller can specify a non-sleeping
+ * function to be executed on a single or multiple cpus preempting all
+ * other processes and monopolizing those cpus until it finishes.
+ *
+ * Resources for this mechanism are preallocated when a cpu is brought
+ * up and requests are guaranteed to be served as long as the target
+ * cpus are online.
+ */
+
+typedef int (*cpu_stop_fn_t)(void *arg);
+
+struct cpu_stop_work {
+ struct list_head list; /* cpu_stopper->works */
+ cpu_stop_fn_t fn;
+ void *arg;
+ struct cpu_stop_done *done;
+};
+
+int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
+void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
+ struct cpu_stop_work *work_buf);
+int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
+int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
+
+/*
+ * stop_machine "Bogolock": stop the entire machine, disable
+ * interrupts. This is a very heavy lock, which is equivalent to
+ * grabbing every spinlock (and more). So the "read" side to such a
+ * lock is anything which disables preeempt.
+ */
+
/**
* stop_machine: freeze the machine on all CPUs and run this function
* @fn: the function to run
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 9bb9fb1..7e3f918 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -1,17 +1,379 @@
-/* Copyright 2008, 2005 Rusty Russell rusty@rustcorp.com.au IBM Corporation.
- * GPL v2 and any later version.
+/*
+ * kernel/stop_machine.c
+ *
+ * Copyright (C) 2008, 2005 IBM Corporation.
+ * Copyright (C) 2008, 2005 Rusty Russell rusty@rustcorp.com.au
+ * Copyright (C) 2010 SUSE Linux Products GmbH
+ * Copyright (C) 2010 Tejun Heo <tj@kernel.org>
+ *
+ * This file is released under the GPLv2 and any later version.
*/
+#include <linux/completion.h>
#include <linux/cpu.h>
-#include <linux/err.h>
+#include <linux/init.h>
#include <linux/kthread.h>
#include <linux/module.h>
+#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/stop_machine.h>
-#include <linux/syscalls.h>
#include <linux/interrupt.h>
+#include <linux/kallsyms.h>
#include <asm/atomic.h>
-#include <asm/uaccess.h>
+
+/*
+ * Structure to determine completion condition and record errors. May
+ * be shared by works on different cpus.
+ */
+struct cpu_stop_done {
+ atomic_t nr_todo; /* nr left to execute */
+ bool executed; /* actually executed? */
+ int ret; /* collected return value */
+ struct completion completion; /* fired if nr_todo reaches 0 */
+};
+
+/* the actual stopper, one per every possible cpu, enabled on online cpus */
+struct cpu_stopper {
+ spinlock_t lock;
+ struct list_head works; /* list of pending works */
+ struct task_struct *thread; /* stopper thread */
+ bool enabled; /* is this stopper enabled? */
+};
+
+static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
+
+static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
+{
+ memset(done, 0, sizeof(*done));
+ atomic_set(&done->nr_todo, nr_todo);
+ init_completion(&done->completion);
+}
+
+/* signal completion unless @done is NULL */
+static void cpu_stop_signal_done(struct cpu_stop_done *done, bool executed)
+{
+ if (done) {
+ if (executed)
+ done->executed = true;
+ if (atomic_dec_and_test(&done->nr_todo))
+ complete(&done->completion);
+ }
+}
+
+/* queue @work to @stopper. if offline, @work is completed immediately */
+static void cpu_stop_queue_work(struct cpu_stopper *stopper,
+ struct cpu_stop_work *work)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&stopper->lock, flags);
+
+ if (stopper->enabled) {
+ list_add_tail(&work->list, &stopper->works);
+ wake_up_process(stopper->thread);
+ } else
+ cpu_stop_signal_done(work->done, false);
+
+ spin_unlock_irqrestore(&stopper->lock, flags);
+}
+
+/**
+ * stop_one_cpu - stop a cpu
+ * @cpu: cpu to stop
+ * @fn: function to execute
+ * @arg: argument to @fn
+ *
+ * Execute @fn(@arg) on @cpu. @fn is run in a process context with
+ * the highest priority preempting any task on the cpu and
+ * monopolizing it. This function returns after the execution is
+ * complete.
+ *
+ * This function doesn't guarantee @cpu stays online till @fn
+ * completes. If @cpu goes down in the middle, execution may happen
+ * partially or fully on different cpus. @fn should either be ready
+ * for that or the caller should ensure that @cpu stays online until
+ * this function completes.
+ *
+ * CONTEXT:
+ * Might sleep.
+ *
+ * RETURNS:
+ * -ENOENT if @fn(@arg) was not executed because @cpu was offline;
+ * otherwise, the return value of @fn.
+ */
+int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
+{
+ struct cpu_stop_done done;
+ struct cpu_stop_work work = { .fn = fn, .arg = arg, .done = &done };
+
+ cpu_stop_init_done(&done, 1);
+ cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu), &work);
+ wait_for_completion(&done.completion);
+ return done.executed ? done.ret : -ENOENT;
+}
+
+/**
+ * stop_one_cpu_nowait - stop a cpu but don't wait for completion
+ * @cpu: cpu to stop
+ * @fn: function to execute
+ * @arg: argument to @fn
+ *
+ * Similar to stop_one_cpu() but doesn't wait for completion. The
+ * caller is responsible for ensuring @work_buf is currently unused
+ * and will remain untouched until stopper starts executing @fn.
+ *
+ * CONTEXT:
+ * Don't care.
+ */
+void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
+ struct cpu_stop_work *work_buf)
+{
+ *work_buf = (struct cpu_stop_work){ .fn = fn, .arg = arg, };
+ cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu), work_buf);
+}
+
+/* static data for stop_cpus */
+static DEFINE_MUTEX(stop_cpus_mutex);
+static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
+
+int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
+{
+ struct cpu_stop_work *work;
+ struct cpu_stop_done done;
+ unsigned int cpu;
+
+ /* initialize works and done */
+ for_each_cpu(cpu, cpumask) {
+ work = &per_cpu(stop_cpus_work, cpu);
+ work->fn = fn;
+ work->arg = arg;
+ work->done = &done;
+ }
+ cpu_stop_init_done(&done, cpumask_weight(cpumask));
+
+ /*
+ * Disable preemption while queueing to avoid getting
+ * preempted by a stopper which might wait for other stoppers
+ * to enter @fn which can lead to deadlock.
+ */
+ preempt_disable();
+ for_each_cpu(cpu, cpumask)
+ cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu),
+ &per_cpu(stop_cpus_work, cpu));
+ preempt_enable();
+
+ wait_for_completion(&done.completion);
+ return done.executed ? done.ret : -ENOENT;
+}
+
+/**
+ * stop_cpus - stop multiple cpus
+ * @cpumask: cpus to stop
+ * @fn: function to execute
+ * @arg: argument to @fn
+ *
+ * Execute @fn(@arg) on online cpus in @cpumask. On each target cpu,
+ * @fn is run in a process context with the highest priority
+ * preempting any task on the cpu and monopolizing it. This function
+ * returns after all executions are complete.
+ *
+ * This function doesn't guarantee the cpus in @cpumask stay online
+ * till @fn completes. If some cpus go down in the middle, execution
+ * on the cpu may happen partially or fully on different cpus. @fn
+ * should either be ready for that or the caller should ensure that
+ * the cpus stay online until this function completes.
+ *
+ * All stop_cpus() calls are serialized making it safe for @fn to wait
+ * for all cpus to start executing it.
+ *
+ * CONTEXT:
+ * Might sleep.
+ *
+ * RETURNS:
+ * -ENOENT if @fn(@arg) was not executed at all because all cpus in
+ * @cpumask were offline; otherwise, 0 if all executions of @fn
+ * returned 0, any non zero return value if any returned non zero.
+ */
+int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
+{
+ int ret;
+
+ /* static works are used, process one request at a time */
+ mutex_lock(&stop_cpus_mutex);
+ ret = __stop_cpus(cpumask, fn, arg);
+ mutex_unlock(&stop_cpus_mutex);
+ return ret;
+}
+
+/**
+ * try_stop_cpus - try to stop multiple cpus
+ * @cpumask: cpus to stop
+ * @fn: function to execute
+ * @arg: argument to @fn
+ *
+ * Identical to stop_cpus() except that it fails with -EAGAIN if
+ * someone else is already using the facility.
+ *
+ * CONTEXT:
+ * Might sleep.
+ *
+ * RETURNS:
+ * -EAGAIN if someone else is already stopping cpus, -ENOENT if
+ * @fn(@arg) was not executed at all because all cpus in @cpumask were
+ * offline; otherwise, 0 if all executions of @fn returned 0, any non
+ * zero return value if any returned non zero.
+ */
+int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
+{
+ int ret;
+
+ /* static works are used, process one request at a time */
+ if (!mutex_trylock(&stop_cpus_mutex))
+ return -EAGAIN;
+ ret = __stop_cpus(cpumask, fn, arg);
+ mutex_unlock(&stop_cpus_mutex);
+ return ret;
+}
+
+static int cpu_stopper_thread(void *data)
+{
+ struct cpu_stopper *stopper = data;
+ struct cpu_stop_work *work;
+ int ret;
+
+repeat:
+ set_current_state(TASK_INTERRUPTIBLE); /* mb paired w/ kthread_stop */
+
+ if (kthread_should_stop()) {
+ __set_current_state(TASK_RUNNING);
+ return 0;
+ }
+
+ work = NULL;
+ spin_lock_irq(&stopper->lock);
+ if (!list_empty(&stopper->works)) {
+ work = list_first_entry(&stopper->works,
+ struct cpu_stop_work, list);
+ list_del_init(&work->list);
+ }
+ spin_unlock_irq(&stopper->lock);
+
+ if (work) {
+ cpu_stop_fn_t fn = work->fn;
+ void *arg = work->arg;
+ struct cpu_stop_done *done = work->done;
+ char ksym_buf[KSYM_NAME_LEN];
+
+ __set_current_state(TASK_RUNNING);
+
+ /* cpu stop callbacks are not allowed to sleep */
+ preempt_disable();
+
+ ret = fn(arg);
+ if (ret)
+ done->ret = ret;
+
+ /* restore preemption and check it's still balanced */
+ preempt_enable();
+ WARN_ONCE(preempt_count(),
+ "cpu_stop: %s(%p) leaked preempt count\n",
+ kallsyms_lookup((unsigned long)fn, NULL, NULL, NULL,
+ ksym_buf), arg);
+
+ cpu_stop_signal_done(done, true);
+ } else
+ schedule();
+
+ goto repeat;
+}
+
+/* manage stopper for a cpu, mostly lifted from sched migration thread mgmt */
+static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+{
+ struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
+ unsigned int cpu = (unsigned long)hcpu;
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+ struct cpu_stop_work *work;
+ struct task_struct *p;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_UP_PREPARE:
+ BUG_ON(stopper->thread || stopper->enabled ||
+ !list_empty(&stopper->works));
+ p = kthread_create(cpu_stopper_thread, stopper, "stopper/%d",
+ cpu);
+ if (IS_ERR(p))
+ return NOTIFY_BAD;
+ sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
+ get_task_struct(p);
+ stopper->thread = p;
+ break;
+
+ case CPU_ONLINE:
+ kthread_bind(stopper->thread, cpu);
+ /* strictly unnecessary, as first user will wake it */
+ wake_up_process(stopper->thread);
+ /* mark enabled */
+ spin_lock_irq(&stopper->lock);
+ stopper->enabled = true;
+ spin_unlock_irq(&stopper->lock);
+ break;
+
+#ifdef CONFIG_HOTPLUG_CPU
+ case CPU_UP_CANCELED:
+ case CPU_DEAD:
+ /* kill the stopper */
+ kthread_stop(stopper->thread);
+ /* drain remaining works */
+ spin_lock_irq(&stopper->lock);
+ list_for_each_entry(work, &stopper->works, list)
+ cpu_stop_signal_done(work->done, false);
+ stopper->enabled = false;
+ spin_unlock_irq(&stopper->lock);
+ /* release the stopper */
+ put_task_struct(stopper->thread);
+ stopper->thread = NULL;
+ break;
+#endif
+ }
+
+ return NOTIFY_OK;
+}
+
+/*
+ * Give it a higher priority so that cpu stopper is available to other
+ * cpu notifiers. It currently shares the same priority as sched
+ * migration_notifier.
+ */
+static struct notifier_block __cpuinitdata cpu_stop_cpu_notifier = {
+ .notifier_call = cpu_stop_cpu_callback,
+ .priority = 10,
+};
+
+static int __init cpu_stop_init(void)
+{
+ void *bcpu = (void *)(long)smp_processor_id();
+ unsigned int cpu;
+ int err;
+
+ for_each_possible_cpu(cpu) {
+ struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
+
+ spin_lock_init(&stopper->lock);
+ INIT_LIST_HEAD(&stopper->works);
+ }
+
+ /* start one for the boot cpu */
+ err = cpu_stop_cpu_callback(&cpu_stop_cpu_notifier, CPU_UP_PREPARE,
+ bcpu);
+ BUG_ON(err == NOTIFY_BAD);
+ cpu_stop_cpu_callback(&cpu_stop_cpu_notifier, CPU_ONLINE, bcpu);
+ register_cpu_notifier(&cpu_stop_cpu_notifier);
+
+ return 0;
+}
+early_initcall(cpu_stop_init);
/* This controls the threads on each CPU. */
enum stopmachine_state {
--
1.6.4.2
* [PATCH 2/5] stop_machine: reimplement using cpu_stop
2010-05-06 16:52 [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Tejun Heo
2010-05-06 16:52 ` [PATCH 1/5] cpu_stop: implement stop_cpu[s]() Tejun Heo
@ 2010-05-06 16:52 ` Tejun Heo
2010-05-11 21:50 ` Andrew Morton
2010-05-06 16:52 ` [PATCH 3/5] sched: replace migration_thread with cpu_stop Tejun Heo
` (3 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2010-05-06 16:52 UTC (permalink / raw)
To: mingo, peterz, linux-kernel, paulmck
Cc: Tejun Heo, Oleg Nesterov, Dimitri Sivanich
Reimplement stop_machine using cpu_stop.  As cpu stoppers are
guaranteed to be available for all online cpus,
stop_machine_create/destroy() are no longer necessary and are removed.
With resource management and synchronization handled by cpu_stop, the
new implementation is much simpler.  Asking cpu_stop to execute the
stop_cpu() state machine on all online cpus with cpu hotplug disabled
is enough.
stop_machine itself doesn't need to manage any global resources
anymore, so all per-instance information is rolled into struct
stop_machine_data and the mutex and all static data variables are
removed.
The previous implementation created and destroyed RT workqueues as
necessary, which made stop_machine() calls highly expensive on very
large machines.  According to Dimitri Sivanich, avoiding the dynamic
creation/destruction makes booting more than twice as fast on very
large machines.  cpu_stop resources are preallocated for all online
cpus and should have the same effect.
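For illustration (not part of the patch), a stop_machine() caller now
looks like the sketch below; the callback and data are made up, and
the previously required stop_machine_create()/destroy() calls are
simply gone:

	static int set_flag(void *data)
	{
		/* runs with irqs disabled while all other cpus spin */
		*(int *)data = 1;
		return 0;
	}

	int flag = 0;
	/* NULL cpumask: run the callback on the first online cpu */
	int err = stop_machine(set_flag, &flag, NULL);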
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
---
arch/s390/kernel/time.c | 1 -
drivers/xen/manage.c | 14 +---
include/linux/stop_machine.h | 20 -----
kernel/cpu.c | 8 --
kernel/module.c | 14 +---
kernel/stop_machine.c | 158 ++++++++++--------------------------------
6 files changed, 42 insertions(+), 173 deletions(-)
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index fba6dec..03d9656 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -390,7 +390,6 @@ static void __init time_init_wq(void)
if (time_sync_wq)
return;
time_sync_wq = create_singlethread_workqueue("timesync");
- stop_machine_create();
}
/*
diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 2ac4440..8943b8c 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -80,12 +80,6 @@ static void do_suspend(void)
shutting_down = SHUTDOWN_SUSPEND;
- err = stop_machine_create();
- if (err) {
- printk(KERN_ERR "xen suspend: failed to setup stop_machine %d\n", err);
- goto out;
- }
-
#ifdef CONFIG_PREEMPT
/* If the kernel is preemptible, we need to freeze all the processes
to prevent them from being in the middle of a pagetable update
@@ -93,7 +87,7 @@ static void do_suspend(void)
err = freeze_processes();
if (err) {
printk(KERN_ERR "xen suspend: freeze failed %d\n", err);
- goto out_destroy_sm;
+ goto out;
}
#endif
@@ -136,12 +130,8 @@ out_resume:
out_thaw:
#ifdef CONFIG_PREEMPT
thaw_processes();
-
-out_destroy_sm:
-#endif
- stop_machine_destroy();
-
out:
+#endif
shutting_down = SHUTDOWN_INVALID;
}
#endif /* CONFIG_PM_SLEEP */
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index efcbd6c..0e552e7 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -67,23 +67,6 @@ int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
*/
int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
-/**
- * stop_machine_create: create all stop_machine threads
- *
- * Description: This causes all stop_machine threads to be created before
- * stop_machine actually gets called. This can be used by subsystems that
- * need a non failing stop_machine infrastructure.
- */
-int stop_machine_create(void);
-
-/**
- * stop_machine_destroy: destroy all stop_machine threads
- *
- * Description: This causes all stop_machine threads which were created with
- * stop_machine_create to be destroyed again.
- */
-void stop_machine_destroy(void);
-
#else
static inline int stop_machine(int (*fn)(void *), void *data,
@@ -96,8 +79,5 @@ static inline int stop_machine(int (*fn)(void *), void *data,
return ret;
}
-static inline int stop_machine_create(void) { return 0; }
-static inline void stop_machine_destroy(void) { }
-
#endif /* CONFIG_SMP */
#endif /* _LINUX_STOP_MACHINE */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 914aedc..5457775 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -266,9 +266,6 @@ int __ref cpu_down(unsigned int cpu)
{
int err;
- err = stop_machine_create();
- if (err)
- return err;
cpu_maps_update_begin();
if (cpu_hotplug_disabled) {
@@ -280,7 +277,6 @@ int __ref cpu_down(unsigned int cpu)
out:
cpu_maps_update_done();
- stop_machine_destroy();
return err;
}
EXPORT_SYMBOL(cpu_down);
@@ -361,9 +357,6 @@ int disable_nonboot_cpus(void)
{
int cpu, first_cpu, error;
- error = stop_machine_create();
- if (error)
- return error;
cpu_maps_update_begin();
first_cpu = cpumask_first(cpu_online_mask);
/*
@@ -394,7 +387,6 @@ int disable_nonboot_cpus(void)
printk(KERN_ERR "Non-boot CPUs are not disabled\n");
}
cpu_maps_update_done();
- stop_machine_destroy();
return error;
}
diff --git a/kernel/module.c b/kernel/module.c
index 1016b75..0838246 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -723,16 +723,8 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
return -EFAULT;
name[MODULE_NAME_LEN-1] = '\0';
- /* Create stop_machine threads since free_module relies on
- * a non-failing stop_machine call. */
- ret = stop_machine_create();
- if (ret)
- return ret;
-
- if (mutex_lock_interruptible(&module_mutex) != 0) {
- ret = -EINTR;
- goto out_stop;
- }
+ if (mutex_lock_interruptible(&module_mutex) != 0)
+ return -EINTR;
mod = find_module(name);
if (!mod) {
@@ -792,8 +784,6 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
out:
mutex_unlock(&module_mutex);
-out_stop:
- stop_machine_destroy();
return ret;
}
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 7e3f918..884c7a1 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -388,174 +388,92 @@ enum stopmachine_state {
/* Exit */
STOPMACHINE_EXIT,
};
-static enum stopmachine_state state;
struct stop_machine_data {
- int (*fn)(void *);
- void *data;
- int fnret;
+ int (*fn)(void *);
+ void *data;
+ /* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
+ unsigned int num_threads;
+ const struct cpumask *active_cpus;
+
+ enum stopmachine_state state;
+ atomic_t thread_ack;
};
-/* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
-static unsigned int num_threads;
-static atomic_t thread_ack;
-static DEFINE_MUTEX(lock);
-/* setup_lock protects refcount, stop_machine_wq and stop_machine_work. */
-static DEFINE_MUTEX(setup_lock);
-/* Users of stop_machine. */
-static int refcount;
-static struct workqueue_struct *stop_machine_wq;
-static struct stop_machine_data active, idle;
-static const struct cpumask *active_cpus;
-static void __percpu *stop_machine_work;
-
-static void set_state(enum stopmachine_state newstate)
+static void set_state(struct stop_machine_data *smdata,
+ enum stopmachine_state newstate)
{
/* Reset ack counter. */
- atomic_set(&thread_ack, num_threads);
+ atomic_set(&smdata->thread_ack, smdata->num_threads);
smp_wmb();
- state = newstate;
+ smdata->state = newstate;
}
/* Last one to ack a state moves to the next state. */
-static void ack_state(void)
+static void ack_state(struct stop_machine_data *smdata)
{
- if (atomic_dec_and_test(&thread_ack))
- set_state(state + 1);
+ if (atomic_dec_and_test(&smdata->thread_ack))
+ set_state(smdata, smdata->state + 1);
}
-/* This is the actual function which stops the CPU. It runs
- * in the context of a dedicated stopmachine workqueue. */
-static void stop_cpu(struct work_struct *unused)
+/* This is the cpu_stop function which stops the CPU. */
+static int stop_machine_cpu_stop(void *data)
{
+ struct stop_machine_data *smdata = data;
enum stopmachine_state curstate = STOPMACHINE_NONE;
- struct stop_machine_data *smdata = &idle;
- int cpu = smp_processor_id();
- int err;
+ int cpu = smp_processor_id(), err = 0;
+ bool is_active;
+
+ if (!smdata->active_cpus)
+ is_active = cpu == cpumask_first(cpu_online_mask);
+ else
+ is_active = cpumask_test_cpu(cpu, smdata->active_cpus);
- if (!active_cpus) {
- if (cpu == cpumask_first(cpu_online_mask))
- smdata = &active;
- } else {
- if (cpumask_test_cpu(cpu, active_cpus))
- smdata = &active;
- }
/* Simple state machine */
do {
/* Chill out and ensure we re-read stopmachine_state. */
cpu_relax();
- if (state != curstate) {
- curstate = state;
+ if (smdata->state != curstate) {
+ curstate = smdata->state;
switch (curstate) {
case STOPMACHINE_DISABLE_IRQ:
local_irq_disable();
hard_irq_disable();
break;
case STOPMACHINE_RUN:
- /* On multiple CPUs only a single error code
- * is needed to tell that something failed. */
- err = smdata->fn(smdata->data);
- if (err)
- smdata->fnret = err;
+ if (is_active)
+ err = smdata->fn(smdata->data);
break;
default:
break;
}
- ack_state();
+ ack_state(smdata);
}
} while (curstate != STOPMACHINE_EXIT);
local_irq_enable();
+ return err;
}
-/* Callback for CPUs which aren't supposed to do anything. */
-static int chill(void *unused)
-{
- return 0;
-}
-
-int stop_machine_create(void)
-{
- mutex_lock(&setup_lock);
- if (refcount)
- goto done;
- stop_machine_wq = create_rt_workqueue("kstop");
- if (!stop_machine_wq)
- goto err_out;
- stop_machine_work = alloc_percpu(struct work_struct);
- if (!stop_machine_work)
- goto err_out;
-done:
- refcount++;
- mutex_unlock(&setup_lock);
- return 0;
-
-err_out:
- if (stop_machine_wq)
- destroy_workqueue(stop_machine_wq);
- mutex_unlock(&setup_lock);
- return -ENOMEM;
-}
-EXPORT_SYMBOL_GPL(stop_machine_create);
-
-void stop_machine_destroy(void)
-{
- mutex_lock(&setup_lock);
- refcount--;
- if (refcount)
- goto done;
- destroy_workqueue(stop_machine_wq);
- free_percpu(stop_machine_work);
-done:
- mutex_unlock(&setup_lock);
-}
-EXPORT_SYMBOL_GPL(stop_machine_destroy);
-
int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
{
- struct work_struct *sm_work;
- int i, ret;
-
- /* Set up initial state. */
- mutex_lock(&lock);
- num_threads = num_online_cpus();
- active_cpus = cpus;
- active.fn = fn;
- active.data = data;
- active.fnret = 0;
- idle.fn = chill;
- idle.data = NULL;
-
- set_state(STOPMACHINE_PREPARE);
-
- /* Schedule the stop_cpu work on all cpus: hold this CPU so one
- * doesn't hit this CPU until we're ready. */
- get_cpu();
- for_each_online_cpu(i) {
- sm_work = per_cpu_ptr(stop_machine_work, i);
- INIT_WORK(sm_work, stop_cpu);
- queue_work_on(i, stop_machine_wq, sm_work);
- }
- /* This will release the thread on our CPU. */
- put_cpu();
- flush_workqueue(stop_machine_wq);
- ret = active.fnret;
- mutex_unlock(&lock);
- return ret;
+ struct stop_machine_data smdata = { .fn = fn, .data = data,
+ .num_threads = num_online_cpus(),
+ .active_cpus = cpus };
+
+ /* Set the initial state and stop all online cpus. */
+ set_state(&smdata, STOPMACHINE_PREPARE);
+ return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
}
int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
{
int ret;
- ret = stop_machine_create();
- if (ret)
- return ret;
/* No CPUs can come up or down during this. */
get_online_cpus();
ret = __stop_machine(fn, data, cpus);
put_online_cpus();
- stop_machine_destroy();
return ret;
}
EXPORT_SYMBOL_GPL(stop_machine);
--
1.6.4.2
* [PATCH 3/5] sched: replace migration_thread with cpu_stop
2010-05-06 16:52 [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Tejun Heo
2010-05-06 16:52 ` [PATCH 1/5] cpu_stop: implement stop_cpu[s]() Tejun Heo
2010-05-06 16:52 ` [PATCH 2/5] stop_machine: reimplement using cpu_stop Tejun Heo
@ 2010-05-06 16:52 ` Tejun Heo
2010-05-06 19:29 ` Paul E. McKenney
2010-05-06 16:52 ` [PATCH 4/5] sched: kill paranoia check in synchronize_sched_expedited() Tejun Heo
` (2 subsequent siblings)
5 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2010-05-06 16:52 UTC (permalink / raw)
To: mingo, peterz, linux-kernel, paulmck
Cc: Tejun Heo, Dipankar Sarma, Josh Triplett, Oleg Nesterov,
Dimitri Sivanich
Currently migration_thread is serving three purposes - migration
pusher, context to execute active_load_balance() and forced context
switcher for expedited RCU synchronize_sched. All three roles are
hardcoded into migration_thread() and determining which job is
scheduled is slightly messy.
This patch kills migration_thread and replaces all three uses with
cpu_stop.  The three different roles of migration_thread() are split
into three separate cpu_stop callbacks -
migration_cpu_stop(), active_load_balance_cpu_stop() and
synchronize_sched_expedited_cpu_stop() - and each use case now simply
asks cpu_stop to execute the callback as necessary.
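Condensed from the diff below, the resulting call sites boil down to
plain cpu_stop invocations, e.g.:

	/* task migration: synchronous */
	struct migration_arg arg = { p, dest_cpu };
	stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);

	/* active load balance: fire and forget */
	stop_one_cpu_nowait(cpu_of(busiest), active_load_balance_cpu_stop,
			    busiest, &busiest->active_balance_work);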
synchronize_sched_expedited() was implemented with private
preallocated resources and custom multi-cpu queueing and waiting
logic, both of which are provided by cpu_stop.
synchronize_sched_expedited_count is made atomic and all other shared
resources along with the mutex are dropped.
synchronize_sched_expedited() also implemented a check to detect cases
where not all the callbacks were executed on their assigned cpus,
falling back to synchronize_sched().  If called with cpu hotplug blocked,
cpu_stop already guarantees that, and the condition cannot happen;
otherwise, stop_machine() would break.  However, this patch preserves
the paranoid check, using a cpumask to record on which cpus the stopper
ran, so that it can serve as a bisection point if something actually
goes wrong there.
Because the internal execution state is no longer visible,
rcu_expedited_torture_stats() is removed.
This patch also renames the cpu_stop threads from "stopper/%d" to
"migration/%d".  The names of these threads ultimately don't matter
and there's no reason to make unnecessary userland visible changes.
With this patch applied, stop_machine() and sched now share the same
resources. stop_machine() is faster without wasting any resources and
sched migration users are much cleaner.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Josh Triplett <josh@freedesktop.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
---
Documentation/RCU/torture.txt | 10 --
include/linux/rcutiny.h | 2 -
include/linux/rcutree.h | 1 -
kernel/rcutorture.c | 2 +-
kernel/sched.c | 315 ++++++++++++-----------------------------
kernel/sched_fair.c | 48 +++++--
kernel/stop_machine.c | 2 +-
7 files changed, 127 insertions(+), 253 deletions(-)
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index 0e50bc2..5d90167 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -182,16 +182,6 @@ Similarly, sched_expedited RCU provides the following:
sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0
sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0
sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0
- state: -1 / 0:0 3:0 4:0
-
-As before, the first four lines are similar to those for RCU.
-The last line shows the task-migration state. The first number is
--1 if synchronize_sched_expedited() is idle, -2 if in the process of
-posting wakeups to the migration kthreads, and N when waiting on CPU N.
-Each of the colon-separated fields following the "/" is a CPU:state pair.
-Valid states are "0" for idle, "1" for waiting for quiescent state,
-"2" for passed through quiescent state, and "3" when a race with a
-CPU-hotplug event forces use of the synchronize_sched() primitive.
USAGE
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index a519587..0006b2d 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -60,8 +60,6 @@ static inline long rcu_batches_completed_bh(void)
return 0;
}
-extern int rcu_expedited_torture_stats(char *page);
-
static inline void rcu_force_quiescent_state(void)
{
}
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 42cc3a0..24e467e 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -35,7 +35,6 @@ struct notifier_block;
extern void rcu_sched_qs(int cpu);
extern void rcu_bh_qs(int cpu);
extern int rcu_needs_cpu(int cpu);
-extern int rcu_expedited_torture_stats(char *page);
#ifdef CONFIG_TREE_PREEMPT_RCU
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 58df55b..2b676f3 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -669,7 +669,7 @@ static struct rcu_torture_ops sched_expedited_ops = {
.sync = synchronize_sched_expedited,
.cb_barrier = NULL,
.fqs = rcu_sched_force_quiescent_state,
- .stats = rcu_expedited_torture_stats,
+ .stats = NULL,
.irq_capable = 1,
.name = "sched_expedited"
};
diff --git a/kernel/sched.c b/kernel/sched.c
index 4956ed0..f1d577a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -55,9 +55,9 @@
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/percpu.h>
-#include <linux/kthread.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
+#include <linux/stop_machine.h>
#include <linux/sysctl.h>
#include <linux/syscalls.h>
#include <linux/times.h>
@@ -539,15 +539,13 @@ struct rq {
int post_schedule;
int active_balance;
int push_cpu;
+ struct cpu_stop_work active_balance_work;
/* cpu of this runqueue: */
int cpu;
int online;
unsigned long avg_load_per_task;
- struct task_struct *migration_thread;
- struct list_head migration_queue;
-
u64 rt_avg;
u64 age_stamp;
u64 idle_stamp;
@@ -2037,21 +2035,18 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
__set_task_cpu(p, new_cpu);
}
-struct migration_req {
- struct list_head list;
-
+struct migration_arg {
struct task_struct *task;
int dest_cpu;
-
- struct completion done;
};
+static int migration_cpu_stop(void *data);
+
/*
* The task's runqueue lock must be held.
* Returns true if you have to wait for migration thread.
*/
-static int
-migrate_task(struct task_struct *p, int dest_cpu, struct migration_req *req)
+static bool migrate_task(struct task_struct *p, int dest_cpu)
{
struct rq *rq = task_rq(p);
@@ -2059,15 +2054,7 @@ migrate_task(struct task_struct *p, int dest_cpu, struct migration_req *req)
* If the task is not on a runqueue (and not running), then
* the next wake-up will properly place the task.
*/
- if (!p->se.on_rq && !task_running(rq, p))
- return 0;
-
- init_completion(&req->done);
- req->task = p;
- req->dest_cpu = dest_cpu;
- list_add(&req->list, &rq->migration_queue);
-
- return 1;
+ return p->se.on_rq || task_running(rq, p);
}
/*
@@ -3110,7 +3097,6 @@ static void update_cpu_load(struct rq *this_rq)
void sched_exec(void)
{
struct task_struct *p = current;
- struct migration_req req;
unsigned long flags;
struct rq *rq;
int dest_cpu;
@@ -3124,17 +3110,11 @@ void sched_exec(void)
* select_task_rq() can race against ->cpus_allowed
*/
if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed) &&
- likely(cpu_active(dest_cpu)) &&
- migrate_task(p, dest_cpu, &req)) {
- /* Need to wait for migration thread (might exit: take ref). */
- struct task_struct *mt = rq->migration_thread;
+ likely(cpu_active(dest_cpu)) && migrate_task(p, dest_cpu)) {
+ struct migration_arg arg = { p, dest_cpu };
- get_task_struct(mt);
task_rq_unlock(rq, &flags);
- wake_up_process(mt);
- put_task_struct(mt);
- wait_for_completion(&req.done);
-
+ stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
return;
}
unlock:
@@ -5290,17 +5270,15 @@ static inline void sched_init_granularity(void)
/*
* This is how migration works:
*
- * 1) we queue a struct migration_req structure in the source CPU's
- * runqueue and wake up that CPU's migration thread.
- * 2) we down() the locked semaphore => thread blocks.
- * 3) migration thread wakes up (implicitly it forces the migrated
- * thread off the CPU)
- * 4) it gets the migration request and checks whether the migrated
- * task is still in the wrong runqueue.
- * 5) if it's in the wrong runqueue then the migration thread removes
+ * 1) we invoke migration_cpu_stop() on the target CPU using
+ * stop_one_cpu().
+ * 2) stopper starts to run (implicitly forcing the migrated thread
+ * off the CPU)
+ * 3) it checks whether the migrated task is still in the wrong runqueue.
+ * 4) if it's in the wrong runqueue then the migration thread removes
* it and puts it into the right queue.
- * 6) migration thread up()s the semaphore.
- * 7) we wake up and the migration is done.
+ * 5) stopper completes and stop_one_cpu() returns and the migration
+ * is done.
*/
/*
@@ -5314,9 +5292,9 @@ static inline void sched_init_granularity(void)
*/
int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
{
- struct migration_req req;
unsigned long flags;
struct rq *rq;
+ unsigned int dest_cpu;
int ret = 0;
/*
@@ -5354,15 +5332,12 @@ again:
if (cpumask_test_cpu(task_cpu(p), new_mask))
goto out;
- if (migrate_task(p, cpumask_any_and(cpu_active_mask, new_mask), &req)) {
+ dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
+ if (migrate_task(p, dest_cpu)) {
+ struct migration_arg arg = { p, dest_cpu };
/* Need help from migration thread: drop lock and wait. */
- struct task_struct *mt = rq->migration_thread;
-
- get_task_struct(mt);
task_rq_unlock(rq, &flags);
- wake_up_process(mt);
- put_task_struct(mt);
- wait_for_completion(&req.done);
+ stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
tlb_migrate_finish(p->mm);
return 0;
}
@@ -5420,70 +5395,22 @@ fail:
return ret;
}
-#define RCU_MIGRATION_IDLE 0
-#define RCU_MIGRATION_NEED_QS 1
-#define RCU_MIGRATION_GOT_QS 2
-#define RCU_MIGRATION_MUST_SYNC 3
-
/*
- * migration_thread - this is a highprio system thread that performs
- * thread migration by bumping thread off CPU then 'pushing' onto
- * another runqueue.
+ * migration_cpu_stop - this will be executed by a highprio stopper thread
+ * and performs thread migration by bumping thread off CPU then
+ * 'pushing' onto another runqueue.
*/
-static int migration_thread(void *data)
+static int migration_cpu_stop(void *data)
{
- int badcpu;
- int cpu = (long)data;
- struct rq *rq;
-
- rq = cpu_rq(cpu);
- BUG_ON(rq->migration_thread != current);
-
- set_current_state(TASK_INTERRUPTIBLE);
- while (!kthread_should_stop()) {
- struct migration_req *req;
- struct list_head *head;
-
- raw_spin_lock_irq(&rq->lock);
-
- if (cpu_is_offline(cpu)) {
- raw_spin_unlock_irq(&rq->lock);
- break;
- }
-
- if (rq->active_balance) {
- active_load_balance(rq, cpu);
- rq->active_balance = 0;
- }
-
- head = &rq->migration_queue;
-
- if (list_empty(head)) {
- raw_spin_unlock_irq(&rq->lock);
- schedule();
- set_current_state(TASK_INTERRUPTIBLE);
- continue;
- }
- req = list_entry(head->next, struct migration_req, list);
- list_del_init(head->next);
-
- if (req->task != NULL) {
- raw_spin_unlock(&rq->lock);
- __migrate_task(req->task, cpu, req->dest_cpu);
- } else if (likely(cpu == (badcpu = smp_processor_id()))) {
- req->dest_cpu = RCU_MIGRATION_GOT_QS;
- raw_spin_unlock(&rq->lock);
- } else {
- req->dest_cpu = RCU_MIGRATION_MUST_SYNC;
- raw_spin_unlock(&rq->lock);
- WARN_ONCE(1, "migration_thread() on CPU %d, expected %d\n", badcpu, cpu);
- }
- local_irq_enable();
-
- complete(&req->done);
- }
- __set_current_state(TASK_RUNNING);
+ struct migration_arg *arg = data;
+ /*
+ * The original target cpu might have gone down and we might
+ * be on another cpu but it doesn't matter.
+ */
+ local_irq_disable();
+ __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
+ local_irq_enable();
return 0;
}
@@ -5850,35 +5777,20 @@ static void set_rq_offline(struct rq *rq)
static int __cpuinit
migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
{
- struct task_struct *p;
int cpu = (long)hcpu;
unsigned long flags;
- struct rq *rq;
+ struct rq *rq = cpu_rq(cpu);
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
- p = kthread_create(migration_thread, hcpu, "migration/%d", cpu);
- if (IS_ERR(p))
- return NOTIFY_BAD;
- kthread_bind(p, cpu);
- /* Must be high prio: stop_machine expects to yield to it. */
- rq = task_rq_lock(p, &flags);
- __setscheduler(rq, p, SCHED_FIFO, MAX_RT_PRIO-1);
- task_rq_unlock(rq, &flags);
- get_task_struct(p);
- cpu_rq(cpu)->migration_thread = p;
rq->calc_load_update = calc_load_update;
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
- /* Strictly unnecessary, as first user will wake it. */
- wake_up_process(cpu_rq(cpu)->migration_thread);
-
/* Update our root-domain */
- rq = cpu_rq(cpu);
raw_spin_lock_irqsave(&rq->lock, flags);
if (rq->rd) {
BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
@@ -5889,25 +5801,9 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
break;
#ifdef CONFIG_HOTPLUG_CPU
- case CPU_UP_CANCELED:
- case CPU_UP_CANCELED_FROZEN:
- if (!cpu_rq(cpu)->migration_thread)
- break;
- /* Unbind it from offline cpu so it can run. Fall thru. */
- kthread_bind(cpu_rq(cpu)->migration_thread,
- cpumask_any(cpu_online_mask));
- kthread_stop(cpu_rq(cpu)->migration_thread);
- put_task_struct(cpu_rq(cpu)->migration_thread);
- cpu_rq(cpu)->migration_thread = NULL;
- break;
-
case CPU_DEAD:
case CPU_DEAD_FROZEN:
migrate_live_tasks(cpu);
- rq = cpu_rq(cpu);
- kthread_stop(rq->migration_thread);
- put_task_struct(rq->migration_thread);
- rq->migration_thread = NULL;
/* Idle task back to normal (off runqueue, low prio) */
raw_spin_lock_irq(&rq->lock);
deactivate_task(rq, rq->idle, 0);
@@ -5918,29 +5814,11 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
migrate_nr_uninterruptible(rq);
BUG_ON(rq->nr_running != 0);
calc_global_load_remove(rq);
- /*
- * No need to migrate the tasks: it was best-effort if
- * they didn't take sched_hotcpu_mutex. Just wake up
- * the requestors.
- */
- raw_spin_lock_irq(&rq->lock);
- while (!list_empty(&rq->migration_queue)) {
- struct migration_req *req;
-
- req = list_entry(rq->migration_queue.next,
- struct migration_req, list);
- list_del_init(&req->list);
- raw_spin_unlock_irq(&rq->lock);
- complete(&req->done);
- raw_spin_lock_irq(&rq->lock);
- }
- raw_spin_unlock_irq(&rq->lock);
break;
case CPU_DYING:
case CPU_DYING_FROZEN:
/* Update our root-domain */
- rq = cpu_rq(cpu);
raw_spin_lock_irqsave(&rq->lock, flags);
if (rq->rd) {
BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
@@ -7757,10 +7635,8 @@ void __init sched_init(void)
rq->push_cpu = 0;
rq->cpu = i;
rq->online = 0;
- rq->migration_thread = NULL;
rq->idle_stamp = 0;
rq->avg_idle = 2*sysctl_sched_migration_cost;
- INIT_LIST_HEAD(&rq->migration_queue);
rq_attach_root(rq, &def_root_domain);
#endif
init_rq_hrtick(rq);
@@ -9054,43 +8930,39 @@ struct cgroup_subsys cpuacct_subsys = {
#ifndef CONFIG_SMP
-int rcu_expedited_torture_stats(char *page)
-{
- return 0;
-}
-EXPORT_SYMBOL_GPL(rcu_expedited_torture_stats);
-
void synchronize_sched_expedited(void)
{
+ /*
+ * There must be a full memory barrier on each affected CPU
+ * between the time that try_stop_cpus() is called and the
+ * time that it returns.
+ *
+ * In the current initial implementation of cpu_stop, the
+ * above condition is already met when the control reaches
+ * this point and the following smp_mb() is not strictly
+ * necessary. Do smp_mb() anyway for documentation and
+ * robustness against future implementation changes.
+ */
+ smp_mb();
}
EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
#else /* #ifndef CONFIG_SMP */
-static DEFINE_PER_CPU(struct migration_req, rcu_migration_req);
-static DEFINE_MUTEX(rcu_sched_expedited_mutex);
-
-#define RCU_EXPEDITED_STATE_POST -2
-#define RCU_EXPEDITED_STATE_IDLE -1
+static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
-static int rcu_expedited_state = RCU_EXPEDITED_STATE_IDLE;
-
-int rcu_expedited_torture_stats(char *page)
+static int synchronize_sched_expedited_cpu_stop(void *data)
{
- int cnt = 0;
- int cpu;
+ static DEFINE_SPINLOCK(done_mask_lock);
+ struct cpumask *done_mask = data;
- cnt += sprintf(&page[cnt], "state: %d /", rcu_expedited_state);
- for_each_online_cpu(cpu) {
- cnt += sprintf(&page[cnt], " %d:%d",
- cpu, per_cpu(rcu_migration_req, cpu).dest_cpu);
+ if (done_mask) {
+ spin_lock(&done_mask_lock);
+ cpumask_set_cpu(smp_processor_id(), done_mask);
+ spin_unlock(&done_mask_lock);
}
- cnt += sprintf(&page[cnt], "\n");
- return cnt;
+ return 0;
}
-EXPORT_SYMBOL_GPL(rcu_expedited_torture_stats);
-
-static long synchronize_sched_expedited_count;
/*
* Wait for an rcu-sched grace period to elapse, but use "big hammer"
@@ -9104,60 +8976,55 @@ static long synchronize_sched_expedited_count;
*/
void synchronize_sched_expedited(void)
{
- int cpu;
- unsigned long flags;
- bool need_full_sync = 0;
- struct rq *rq;
- struct migration_req *req;
- long snap;
- int trycount = 0;
+ cpumask_var_t done_mask_var;
+ struct cpumask *done_mask = NULL;
+ int snap, trycount = 0;
+
+ /*
+ * done_mask is used to check that all cpus actually have
+ * finished running the stopper, which is guaranteed by
+ * stop_cpus() if it's called with cpu hotplug blocked. Keep
+ * the paranoia for now but it's best effort if cpumask is off
+ * stack.
+ */
+ if (zalloc_cpumask_var(&done_mask_var, GFP_ATOMIC))
+ done_mask = done_mask_var;
smp_mb(); /* ensure prior mod happens before capturing snap. */
- snap = ACCESS_ONCE(synchronize_sched_expedited_count) + 1;
+ snap = atomic_read(&synchronize_sched_expedited_count) + 1;
get_online_cpus();
- while (!mutex_trylock(&rcu_sched_expedited_mutex)) {
+ while (try_stop_cpus(cpu_online_mask,
+ synchronize_sched_expedited_cpu_stop,
+ done_mask) == -EAGAIN) {
put_online_cpus();
if (trycount++ < 10)
udelay(trycount * num_online_cpus());
else {
synchronize_sched();
- return;
+ goto free_out;
}
- if (ACCESS_ONCE(synchronize_sched_expedited_count) - snap > 0) {
+ if (atomic_read(&synchronize_sched_expedited_count) - snap > 0) {
smp_mb(); /* ensure test happens before caller kfree */
- return;
+ goto free_out;
}
get_online_cpus();
}
- rcu_expedited_state = RCU_EXPEDITED_STATE_POST;
- for_each_online_cpu(cpu) {
- rq = cpu_rq(cpu);
- req = &per_cpu(rcu_migration_req, cpu);
- init_completion(&req->done);
- req->task = NULL;
- req->dest_cpu = RCU_MIGRATION_NEED_QS;
- raw_spin_lock_irqsave(&rq->lock, flags);
- list_add(&req->list, &rq->migration_queue);
- raw_spin_unlock_irqrestore(&rq->lock, flags);
- wake_up_process(rq->migration_thread);
- }
- for_each_online_cpu(cpu) {
- rcu_expedited_state = cpu;
- req = &per_cpu(rcu_migration_req, cpu);
- rq = cpu_rq(cpu);
- wait_for_completion(&req->done);
- raw_spin_lock_irqsave(&rq->lock, flags);
- if (unlikely(req->dest_cpu == RCU_MIGRATION_MUST_SYNC))
- need_full_sync = 1;
- req->dest_cpu = RCU_MIGRATION_IDLE;
- raw_spin_unlock_irqrestore(&rq->lock, flags);
- }
- rcu_expedited_state = RCU_EXPEDITED_STATE_IDLE;
- synchronize_sched_expedited_count++;
- mutex_unlock(&rcu_sched_expedited_mutex);
+ atomic_inc(&synchronize_sched_expedited_count);
+ if (done_mask)
+ cpumask_xor(done_mask, done_mask, cpu_online_mask);
put_online_cpus();
- if (need_full_sync)
+
+ /* paranoia - this can't happen */
+ if (done_mask && cpumask_weight(done_mask)) {
+ char buf[80];
+
+ cpulist_scnprintf(buf, sizeof(buf), done_mask);
+ WARN_ONCE(1, "synchronize_sched_expedited: cpu online and done masks disagree on %d cpus: %s\n",
+ cpumask_weight(done_mask), buf);
synchronize_sched();
+ }
+free_out:
+ free_cpumask_var(done_mask_var);
}
EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index cbd8b8a..217e4a9 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2798,6 +2798,8 @@ static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle)
return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
}
+static int active_load_balance_cpu_stop(void *data);
+
/*
* Check this_cpu to ensure it is balanced within domain. Attempt to move
* tasks if there is an imbalance.
@@ -2887,8 +2889,9 @@ redo:
if (need_active_balance(sd, sd_idle, idle)) {
raw_spin_lock_irqsave(&busiest->lock, flags);
- /* don't kick the migration_thread, if the curr
- * task on busiest cpu can't be moved to this_cpu
+ /* don't kick the active_load_balance_cpu_stop,
+ * if the curr task on busiest cpu can't be
+ * moved to this_cpu
*/
if (!cpumask_test_cpu(this_cpu,
&busiest->curr->cpus_allowed)) {
@@ -2898,14 +2901,22 @@ redo:
goto out_one_pinned;
}
+ /*
+ * ->active_balance synchronizes accesses to
+ * ->active_balance_work. Once set, it's cleared
+ * only after active load balance is finished.
+ */
if (!busiest->active_balance) {
busiest->active_balance = 1;
busiest->push_cpu = this_cpu;
active_balance = 1;
}
raw_spin_unlock_irqrestore(&busiest->lock, flags);
+
if (active_balance)
- wake_up_process(busiest->migration_thread);
+ stop_one_cpu_nowait(cpu_of(busiest),
+ active_load_balance_cpu_stop, busiest,
+ &busiest->active_balance_work);
/*
* We've kicked active balancing, reset the failure
@@ -3012,24 +3023,29 @@ static void idle_balance(int this_cpu, struct rq *this_rq)
}
/*
- * active_load_balance is run by migration threads. It pushes running tasks
- * off the busiest CPU onto idle CPUs. It requires at least 1 task to be
- * running on each physical CPU where possible, and avoids physical /
- * logical imbalances.
- *
- * Called with busiest_rq locked.
+ * active_load_balance_cpu_stop is run by cpu stopper. It pushes
+ * running tasks off the busiest CPU onto idle CPUs. It requires at
+ * least 1 task to be running on each physical CPU where possible, and
+ * avoids physical / logical imbalances.
*/
-static void active_load_balance(struct rq *busiest_rq, int busiest_cpu)
+static int active_load_balance_cpu_stop(void *data)
{
+ struct rq *busiest_rq = data;
+ int busiest_cpu = cpu_of(busiest_rq);
int target_cpu = busiest_rq->push_cpu;
+ struct rq *target_rq = cpu_rq(target_cpu);
struct sched_domain *sd;
- struct rq *target_rq;
+
+ raw_spin_lock_irq(&busiest_rq->lock);
+
+ /* make sure the requested cpu hasn't gone down in the meantime */
+ if (unlikely(busiest_cpu != smp_processor_id() ||
+ !busiest_rq->active_balance))
+ goto out_unlock;
/* Is there any task to move? */
if (busiest_rq->nr_running <= 1)
- return;
-
- target_rq = cpu_rq(target_cpu);
+ goto out_unlock;
/*
* This condition is "impossible", if it occurs
@@ -3058,6 +3074,10 @@ static void active_load_balance(struct rq *busiest_rq, int busiest_cpu)
schedstat_inc(sd, alb_failed);
}
double_unlock_balance(busiest_rq, target_rq);
+out_unlock:
+ busiest_rq->active_balance = 0;
+ raw_spin_unlock_irq(&busiest_rq->lock);
+ return 0;
}
#ifdef CONFIG_NO_HZ
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 884c7a1..5b20141 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -301,7 +301,7 @@ static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
case CPU_UP_PREPARE:
BUG_ON(stopper->thread || stopper->enabled ||
!list_empty(&stopper->works));
- p = kthread_create(cpu_stopper_thread, stopper, "stopper/%d",
+ p = kthread_create(cpu_stopper_thread, stopper, "migration/%d",
cpu);
if (IS_ERR(p))
return NOTIFY_BAD;
--
1.6.4.2
* [PATCH 4/5] sched: kill paranoia check in synchronize_sched_expedited()
2010-05-06 16:52 [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Tejun Heo
` (2 preceding siblings ...)
2010-05-06 16:52 ` [PATCH 3/5] sched: replace migration_thread with cpu_stop Tejun Heo
@ 2010-05-06 16:52 ` Tejun Heo
2010-05-06 16:52 ` [PATCH 5/5] sched: correctly place paranoia memory barriers " Tejun Heo
2010-05-08 11:15 ` [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Ingo Molnar
5 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2010-05-06 16:52 UTC (permalink / raw)
To: mingo, peterz, linux-kernel, paulmck
Cc: Tejun Heo, Dipankar Sarma, Josh Triplett, Oleg Nesterov,
Dimitri Sivanich
The paranoid check which verifies that the cpu_stop callback is
actually called on all online cpus is completely superfluous.  It's
guaranteed by the cpu_stop facility, and if it didn't work as
advertised other things would go horribly wrong and trying to recover
using synchronize_sched() wouldn't be very meaningful.
Kill the paranoid check. Removal of this feature is done as a
separate step so that it can serve as a bisection point if something
actually goes wrong.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Josh Triplett <josh@freedesktop.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
---
kernel/sched.c | 40 +++-------------------------------------
1 files changed, 3 insertions(+), 37 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index f1d577a..e9c6d79 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -8953,14 +8953,6 @@ static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
static int synchronize_sched_expedited_cpu_stop(void *data)
{
- static DEFINE_SPINLOCK(done_mask_lock);
- struct cpumask *done_mask = data;
-
- if (done_mask) {
- spin_lock(&done_mask_lock);
- cpumask_set_cpu(smp_processor_id(), done_mask);
- spin_unlock(&done_mask_lock);
- }
return 0;
}
@@ -8976,55 +8968,29 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
*/
void synchronize_sched_expedited(void)
{
- cpumask_var_t done_mask_var;
- struct cpumask *done_mask = NULL;
int snap, trycount = 0;
- /*
- * done_mask is used to check that all cpus actually have
- * finished running the stopper, which is guaranteed by
- * stop_cpus() if it's called with cpu hotplug blocked. Keep
- * the paranoia for now but it's best effort if cpumask is off
- * stack.
- */
- if (zalloc_cpumask_var(&done_mask_var, GFP_ATOMIC))
- done_mask = done_mask_var;
-
smp_mb(); /* ensure prior mod happens before capturing snap. */
snap = atomic_read(&synchronize_sched_expedited_count) + 1;
get_online_cpus();
while (try_stop_cpus(cpu_online_mask,
synchronize_sched_expedited_cpu_stop,
- done_mask) == -EAGAIN) {
+ NULL) == -EAGAIN) {
put_online_cpus();
if (trycount++ < 10)
udelay(trycount * num_online_cpus());
else {
synchronize_sched();
- goto free_out;
+ return;
}
if (atomic_read(&synchronize_sched_expedited_count) - snap > 0) {
smp_mb(); /* ensure test happens before caller kfree */
- goto free_out;
+ return;
}
get_online_cpus();
}
atomic_inc(&synchronize_sched_expedited_count);
- if (done_mask)
- cpumask_xor(done_mask, done_mask, cpu_online_mask);
put_online_cpus();
-
- /* paranoia - this can't happen */
- if (done_mask && cpumask_weight(done_mask)) {
- char buf[80];
-
- cpulist_scnprintf(buf, sizeof(buf), done_mask);
- WARN_ONCE(1, "synchronize_sched_expedited: cpu online and done masks disagree on %d cpus: %s\n",
- cpumask_weight(done_mask), buf);
- synchronize_sched();
- }
-free_out:
- free_cpumask_var(done_mask_var);
}
EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
--
1.6.4.2
* [PATCH 5/5] sched: correctly place paranoia memory barriers in synchronize_sched_expedited()
2010-05-06 16:52 [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Tejun Heo
` (3 preceding siblings ...)
2010-05-06 16:52 ` [PATCH 4/5] sched: kill paranoia check in synchronize_sched_expedited() Tejun Heo
@ 2010-05-06 16:52 ` Tejun Heo
2010-05-08 11:15 ` [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Ingo Molnar
5 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2010-05-06 16:52 UTC (permalink / raw)
To: mingo, peterz, linux-kernel, paulmck; +Cc: Tejun Heo
From: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The memory barriers must be in the SMP case, not in the !SMP case.
Also add a barrier after the atomic_inc() in order to ensure that
other CPUs see post-synchronize_sched_expedited() actions as following
the expedited grace period.
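For orientation, with this change and the preceding patch applied, the SMP-side
path ends up roughly as below. This is a condensed sketch reconstructed from the
diffs, not the verbatim kernel source; the retry/fallback body of the loop is
elided:

static int synchronize_sched_expedited_cpu_stop(void *data)
{
	/* full barrier between the try_stop_cpus() call and its return */
	smp_mb();	/* see the comment block in the diff below */
	return 0;
}

void synchronize_sched_expedited(void)
{
	int snap, trycount = 0;

	smp_mb();	/* prior modifications happen before capturing snap */
	snap = atomic_read(&synchronize_sched_expedited_count) + 1;
	get_online_cpus();
	while (try_stop_cpus(cpu_online_mask,
			     synchronize_sched_expedited_cpu_stop,
			     NULL) == -EAGAIN) {
		/* drop/retake the hotplug lock, back off and retry,
		 * or fall back to synchronize_sched() and return */
	}
	atomic_inc(&synchronize_sched_expedited_count);
	smp_mb__after_atomic_inc();	/* post-GP actions seen after GP */
	put_online_cpus();
}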
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched.c | 21 +++++++++++----------
1 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index e9c6d79..155a16d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -8932,6 +8932,15 @@ struct cgroup_subsys cpuacct_subsys = {
void synchronize_sched_expedited(void)
{
+}
+EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
+
+#else /* #ifndef CONFIG_SMP */
+
+static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
+
+static int synchronize_sched_expedited_cpu_stop(void *data)
+{
/*
* There must be a full memory barrier on each affected CPU
* between the time that try_stop_cpus() is called and the
@@ -8943,16 +8952,7 @@ void synchronize_sched_expedited(void)
* necessary. Do smp_mb() anyway for documentation and
* robustness against future implementation changes.
*/
- smp_mb();
-}
-EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
-
-#else /* #ifndef CONFIG_SMP */
-
-static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
-
-static int synchronize_sched_expedited_cpu_stop(void *data)
-{
+ smp_mb(); /* See above comment block. */
return 0;
}
@@ -8990,6 +8990,7 @@ void synchronize_sched_expedited(void)
get_online_cpus();
}
atomic_inc(&synchronize_sched_expedited_count);
+ smp_mb__after_atomic_inc(); /* ensure post-GP actions seen after GP. */
put_online_cpus();
}
EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
--
1.6.4.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 3/5] sched: replace migration_thread with cpu_stop
2010-05-06 16:52 ` [PATCH 3/5] sched: replace migration_thread with cpu_stop Tejun Heo
@ 2010-05-06 19:29 ` Paul E. McKenney
2010-05-07 5:24 ` Tejun Heo
0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2010-05-06 19:29 UTC (permalink / raw)
To: Tejun Heo
Cc: mingo, peterz, linux-kernel, Dipankar Sarma, Josh Triplett,
Oleg Nesterov, Dimitri Sivanich
On Thu, May 06, 2010 at 06:52:50PM +0200, Tejun Heo wrote:
> Currently migration_thread is serving three purposes - migration
> pusher, context to execute active_load_balance() and forced context
> switcher for expedited RCU synchronize_sched. All three roles are
> hardcoded into migration_thread() and determining which job is
> scheduled is slightly messy.
>
> This patch kills migration_thread and replaces all three uses with
> cpu_stop. The three different roles of migration_thread() are
> split into three separate cpu_stop callbacks -
> migration_cpu_stop(), active_load_balance_cpu_stop() and
> synchronize_sched_expedited_cpu_stop() - and each use case now simply
> asks cpu_stop to execute the callback as necessary.
>
> synchronize_sched_expedited() was implemented with private
> preallocated resources and custom multi-cpu queueing and waiting
> logic, both of which are now provided by cpu_stop.
> synchronize_sched_expedited_count is made atomic and all other shared
> resources along with the mutex are dropped.
>
> synchronize_sched_expedited() also implemented a check to detect cases
> where not all the callbacks got executed on their assigned cpus and
> falls back to synchronize_sched(). If called with cpu hotplug blocked,
> cpu_stop already guarantees that and the condition cannot happen;
> otherwise, stop_machine() would break. However, this patch preserves
> the paranoid check using a cpumask to record on which cpus the stopper
> ran so that it can serve as a bisection point if something actually
> goes wrong there.
>
> Because the internal execution state is no longer visible,
> rcu_expedited_torture_stats() is removed.
>
> This patch also renames cpu_stop threads from "stopper/%d" to
> "migration/%d". The names of these threads ultimately don't matter
> and there's no reason to make unnecessary userland visible changes.
>
> With this patch applied, stop_machine() and sched now share the same
> resources. stop_machine() is faster without wasting any resources and
> sched migration users are much cleaner.
With the two patches submitted:
Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Dipankar Sarma <dipankar@in.ibm.com>
> Cc: Josh Triplett <josh@freedesktop.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Dimitri Sivanich <sivanich@sgi.com>
> ---
> Documentation/RCU/torture.txt | 10 --
> include/linux/rcutiny.h | 2 -
> include/linux/rcutree.h | 1 -
> kernel/rcutorture.c | 2 +-
> kernel/sched.c | 315 ++++++++++++-----------------------------
> kernel/sched_fair.c | 48 +++++--
> kernel/stop_machine.c | 2 +-
> 7 files changed, 127 insertions(+), 253 deletions(-)
>
> diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
> index 0e50bc2..5d90167 100644
> --- a/Documentation/RCU/torture.txt
> +++ b/Documentation/RCU/torture.txt
> @@ -182,16 +182,6 @@ Similarly, sched_expedited RCU provides the following:
> sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0
> sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0
> sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0
> - state: -1 / 0:0 3:0 4:0
> -
> -As before, the first four lines are similar to those for RCU.
> -The last line shows the task-migration state. The first number is
> --1 if synchronize_sched_expedited() is idle, -2 if in the process of
> -posting wakeups to the migration kthreads, and N when waiting on CPU N.
> -Each of the colon-separated fields following the "/" is a CPU:state pair.
> -Valid states are "0" for idle, "1" for waiting for quiescent state,
> -"2" for passed through quiescent state, and "3" when a race with a
> -CPU-hotplug event forces use of the synchronize_sched() primitive.
>
>
> USAGE
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index a519587..0006b2d 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -60,8 +60,6 @@ static inline long rcu_batches_completed_bh(void)
> return 0;
> }
>
> -extern int rcu_expedited_torture_stats(char *page);
> -
> static inline void rcu_force_quiescent_state(void)
> {
> }
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index 42cc3a0..24e467e 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -35,7 +35,6 @@ struct notifier_block;
> extern void rcu_sched_qs(int cpu);
> extern void rcu_bh_qs(int cpu);
> extern int rcu_needs_cpu(int cpu);
> -extern int rcu_expedited_torture_stats(char *page);
>
> #ifdef CONFIG_TREE_PREEMPT_RCU
>
> diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
> index 58df55b..2b676f3 100644
> --- a/kernel/rcutorture.c
> +++ b/kernel/rcutorture.c
> @@ -669,7 +669,7 @@ static struct rcu_torture_ops sched_expedited_ops = {
> .sync = synchronize_sched_expedited,
> .cb_barrier = NULL,
> .fqs = rcu_sched_force_quiescent_state,
> - .stats = rcu_expedited_torture_stats,
> + .stats = NULL,
> .irq_capable = 1,
> .name = "sched_expedited"
> };
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 4956ed0..f1d577a 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -55,9 +55,9 @@
> #include <linux/cpu.h>
> #include <linux/cpuset.h>
> #include <linux/percpu.h>
> -#include <linux/kthread.h>
> #include <linux/proc_fs.h>
> #include <linux/seq_file.h>
> +#include <linux/stop_machine.h>
> #include <linux/sysctl.h>
> #include <linux/syscalls.h>
> #include <linux/times.h>
> @@ -539,15 +539,13 @@ struct rq {
> int post_schedule;
> int active_balance;
> int push_cpu;
> + struct cpu_stop_work active_balance_work;
> /* cpu of this runqueue: */
> int cpu;
> int online;
>
> unsigned long avg_load_per_task;
>
> - struct task_struct *migration_thread;
> - struct list_head migration_queue;
> -
> u64 rt_avg;
> u64 age_stamp;
> u64 idle_stamp;
> @@ -2037,21 +2035,18 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
> __set_task_cpu(p, new_cpu);
> }
>
> -struct migration_req {
> - struct list_head list;
> -
> +struct migration_arg {
> struct task_struct *task;
> int dest_cpu;
> -
> - struct completion done;
> };
>
> +static int migration_cpu_stop(void *data);
> +
> /*
> * The task's runqueue lock must be held.
> * Returns true if you have to wait for migration thread.
> */
> -static int
> -migrate_task(struct task_struct *p, int dest_cpu, struct migration_req *req)
> +static bool migrate_task(struct task_struct *p, int dest_cpu)
> {
> struct rq *rq = task_rq(p);
>
> @@ -2059,15 +2054,7 @@ migrate_task(struct task_struct *p, int dest_cpu, struct migration_req *req)
> * If the task is not on a runqueue (and not running), then
> * the next wake-up will properly place the task.
> */
> - if (!p->se.on_rq && !task_running(rq, p))
> - return 0;
> -
> - init_completion(&req->done);
> - req->task = p;
> - req->dest_cpu = dest_cpu;
> - list_add(&req->list, &rq->migration_queue);
> -
> - return 1;
> + return p->se.on_rq || task_running(rq, p);
> }
>
> /*
> @@ -3110,7 +3097,6 @@ static void update_cpu_load(struct rq *this_rq)
> void sched_exec(void)
> {
> struct task_struct *p = current;
> - struct migration_req req;
> unsigned long flags;
> struct rq *rq;
> int dest_cpu;
> @@ -3124,17 +3110,11 @@ void sched_exec(void)
> * select_task_rq() can race against ->cpus_allowed
> */
> if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed) &&
> - likely(cpu_active(dest_cpu)) &&
> - migrate_task(p, dest_cpu, &req)) {
> - /* Need to wait for migration thread (might exit: take ref). */
> - struct task_struct *mt = rq->migration_thread;
> + likely(cpu_active(dest_cpu)) && migrate_task(p, dest_cpu)) {
> + struct migration_arg arg = { p, dest_cpu };
>
> - get_task_struct(mt);
> task_rq_unlock(rq, &flags);
> - wake_up_process(mt);
> - put_task_struct(mt);
> - wait_for_completion(&req.done);
> -
> + stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> return;
> }
> unlock:
> @@ -5290,17 +5270,15 @@ static inline void sched_init_granularity(void)
> /*
> * This is how migration works:
> *
> - * 1) we queue a struct migration_req structure in the source CPU's
> - * runqueue and wake up that CPU's migration thread.
> - * 2) we down() the locked semaphore => thread blocks.
> - * 3) migration thread wakes up (implicitly it forces the migrated
> - * thread off the CPU)
> - * 4) it gets the migration request and checks whether the migrated
> - * task is still in the wrong runqueue.
> - * 5) if it's in the wrong runqueue then the migration thread removes
> + * 1) we invoke migration_cpu_stop() on the target CPU using
> + * stop_one_cpu().
> + * 2) stopper starts to run (implicitly forcing the migrated thread
> + * off the CPU)
> + * 3) it checks whether the migrated task is still in the wrong runqueue.
> + * 4) if it's in the wrong runqueue then the migration thread removes
> * it and puts it into the right queue.
> - * 6) migration thread up()s the semaphore.
> - * 7) we wake up and the migration is done.
> + * 5) stopper completes and stop_one_cpu() returns and the migration
> + * is done.
> */
>
> /*
> @@ -5314,9 +5292,9 @@ static inline void sched_init_granularity(void)
> */
> int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
> {
> - struct migration_req req;
> unsigned long flags;
> struct rq *rq;
> + unsigned int dest_cpu;
> int ret = 0;
>
> /*
> @@ -5354,15 +5332,12 @@ again:
> if (cpumask_test_cpu(task_cpu(p), new_mask))
> goto out;
>
> - if (migrate_task(p, cpumask_any_and(cpu_active_mask, new_mask), &req)) {
> + dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
> + if (migrate_task(p, dest_cpu)) {
> + struct migration_arg arg = { p, dest_cpu };
> /* Need help from migration thread: drop lock and wait. */
> - struct task_struct *mt = rq->migration_thread;
> -
> - get_task_struct(mt);
> task_rq_unlock(rq, &flags);
> - wake_up_process(mt);
> - put_task_struct(mt);
> - wait_for_completion(&req.done);
> + stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> tlb_migrate_finish(p->mm);
> return 0;
> }
> @@ -5420,70 +5395,22 @@ fail:
> return ret;
> }
>
> -#define RCU_MIGRATION_IDLE 0
> -#define RCU_MIGRATION_NEED_QS 1
> -#define RCU_MIGRATION_GOT_QS 2
> -#define RCU_MIGRATION_MUST_SYNC 3
> -
> /*
> - * migration_thread - this is a highprio system thread that performs
> - * thread migration by bumping thread off CPU then 'pushing' onto
> - * another runqueue.
> + * migration_cpu_stop - this will be executed by a highprio stopper thread
> + * and performs thread migration by bumping thread off CPU then
> + * 'pushing' onto another runqueue.
> */
> -static int migration_thread(void *data)
> +static int migration_cpu_stop(void *data)
> {
> - int badcpu;
> - int cpu = (long)data;
> - struct rq *rq;
> -
> - rq = cpu_rq(cpu);
> - BUG_ON(rq->migration_thread != current);
> -
> - set_current_state(TASK_INTERRUPTIBLE);
> - while (!kthread_should_stop()) {
> - struct migration_req *req;
> - struct list_head *head;
> -
> - raw_spin_lock_irq(&rq->lock);
> -
> - if (cpu_is_offline(cpu)) {
> - raw_spin_unlock_irq(&rq->lock);
> - break;
> - }
> -
> - if (rq->active_balance) {
> - active_load_balance(rq, cpu);
> - rq->active_balance = 0;
> - }
> -
> - head = &rq->migration_queue;
> -
> - if (list_empty(head)) {
> - raw_spin_unlock_irq(&rq->lock);
> - schedule();
> - set_current_state(TASK_INTERRUPTIBLE);
> - continue;
> - }
> - req = list_entry(head->next, struct migration_req, list);
> - list_del_init(head->next);
> -
> - if (req->task != NULL) {
> - raw_spin_unlock(&rq->lock);
> - __migrate_task(req->task, cpu, req->dest_cpu);
> - } else if (likely(cpu == (badcpu = smp_processor_id()))) {
> - req->dest_cpu = RCU_MIGRATION_GOT_QS;
> - raw_spin_unlock(&rq->lock);
> - } else {
> - req->dest_cpu = RCU_MIGRATION_MUST_SYNC;
> - raw_spin_unlock(&rq->lock);
> - WARN_ONCE(1, "migration_thread() on CPU %d, expected %d\n", badcpu, cpu);
> - }
> - local_irq_enable();
> -
> - complete(&req->done);
> - }
> - __set_current_state(TASK_RUNNING);
> + struct migration_arg *arg = data;
>
> + /*
> + * The original target cpu might have gone down and we might
> + * be on another cpu but it doesn't matter.
> + */
> + local_irq_disable();
> + __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
> + local_irq_enable();
> return 0;
> }
>
> @@ -5850,35 +5777,20 @@ static void set_rq_offline(struct rq *rq)
> static int __cpuinit
> migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> {
> - struct task_struct *p;
> int cpu = (long)hcpu;
> unsigned long flags;
> - struct rq *rq;
> + struct rq *rq = cpu_rq(cpu);
>
> switch (action) {
>
> case CPU_UP_PREPARE:
> case CPU_UP_PREPARE_FROZEN:
> - p = kthread_create(migration_thread, hcpu, "migration/%d", cpu);
> - if (IS_ERR(p))
> - return NOTIFY_BAD;
> - kthread_bind(p, cpu);
> - /* Must be high prio: stop_machine expects to yield to it. */
> - rq = task_rq_lock(p, &flags);
> - __setscheduler(rq, p, SCHED_FIFO, MAX_RT_PRIO-1);
> - task_rq_unlock(rq, &flags);
> - get_task_struct(p);
> - cpu_rq(cpu)->migration_thread = p;
> rq->calc_load_update = calc_load_update;
> break;
>
> case CPU_ONLINE:
> case CPU_ONLINE_FROZEN:
> - /* Strictly unnecessary, as first user will wake it. */
> - wake_up_process(cpu_rq(cpu)->migration_thread);
> -
> /* Update our root-domain */
> - rq = cpu_rq(cpu);
> raw_spin_lock_irqsave(&rq->lock, flags);
> if (rq->rd) {
> BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
> @@ -5889,25 +5801,9 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> break;
>
> #ifdef CONFIG_HOTPLUG_CPU
> - case CPU_UP_CANCELED:
> - case CPU_UP_CANCELED_FROZEN:
> - if (!cpu_rq(cpu)->migration_thread)
> - break;
> - /* Unbind it from offline cpu so it can run. Fall thru. */
> - kthread_bind(cpu_rq(cpu)->migration_thread,
> - cpumask_any(cpu_online_mask));
> - kthread_stop(cpu_rq(cpu)->migration_thread);
> - put_task_struct(cpu_rq(cpu)->migration_thread);
> - cpu_rq(cpu)->migration_thread = NULL;
> - break;
> -
> case CPU_DEAD:
> case CPU_DEAD_FROZEN:
> migrate_live_tasks(cpu);
> - rq = cpu_rq(cpu);
> - kthread_stop(rq->migration_thread);
> - put_task_struct(rq->migration_thread);
> - rq->migration_thread = NULL;
> /* Idle task back to normal (off runqueue, low prio) */
> raw_spin_lock_irq(&rq->lock);
> deactivate_task(rq, rq->idle, 0);
> @@ -5918,29 +5814,11 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
> migrate_nr_uninterruptible(rq);
> BUG_ON(rq->nr_running != 0);
> calc_global_load_remove(rq);
> - /*
> - * No need to migrate the tasks: it was best-effort if
> - * they didn't take sched_hotcpu_mutex. Just wake up
> - * the requestors.
> - */
> - raw_spin_lock_irq(&rq->lock);
> - while (!list_empty(&rq->migration_queue)) {
> - struct migration_req *req;
> -
> - req = list_entry(rq->migration_queue.next,
> - struct migration_req, list);
> - list_del_init(&req->list);
> - raw_spin_unlock_irq(&rq->lock);
> - complete(&req->done);
> - raw_spin_lock_irq(&rq->lock);
> - }
> - raw_spin_unlock_irq(&rq->lock);
> break;
>
> case CPU_DYING:
> case CPU_DYING_FROZEN:
> /* Update our root-domain */
> - rq = cpu_rq(cpu);
> raw_spin_lock_irqsave(&rq->lock, flags);
> if (rq->rd) {
> BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
> @@ -7757,10 +7635,8 @@ void __init sched_init(void)
> rq->push_cpu = 0;
> rq->cpu = i;
> rq->online = 0;
> - rq->migration_thread = NULL;
> rq->idle_stamp = 0;
> rq->avg_idle = 2*sysctl_sched_migration_cost;
> - INIT_LIST_HEAD(&rq->migration_queue);
> rq_attach_root(rq, &def_root_domain);
> #endif
> init_rq_hrtick(rq);
> @@ -9054,43 +8930,39 @@ struct cgroup_subsys cpuacct_subsys = {
>
> #ifndef CONFIG_SMP
>
> -int rcu_expedited_torture_stats(char *page)
> -{
> - return 0;
> -}
> -EXPORT_SYMBOL_GPL(rcu_expedited_torture_stats);
> -
> void synchronize_sched_expedited(void)
> {
> + /*
> + * There must be a full memory barrier on each affected CPU
> + * between the time that try_stop_cpus() is called and the
> + * time that it returns.
> + *
> + * In the current initial implementation of cpu_stop, the
> + * above condition is already met when the control reaches
> + * this point and the following smp_mb() is not strictly
> + * necessary. Do smp_mb() anyway for documentation and
> + * robustness against future implementation changes.
> + */
> + smp_mb();
> }
> EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
>
> #else /* #ifndef CONFIG_SMP */
>
> -static DEFINE_PER_CPU(struct migration_req, rcu_migration_req);
> -static DEFINE_MUTEX(rcu_sched_expedited_mutex);
> -
> -#define RCU_EXPEDITED_STATE_POST -2
> -#define RCU_EXPEDITED_STATE_IDLE -1
> +static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
>
> -static int rcu_expedited_state = RCU_EXPEDITED_STATE_IDLE;
> -
> -int rcu_expedited_torture_stats(char *page)
> +static int synchronize_sched_expedited_cpu_stop(void *data)
> {
> - int cnt = 0;
> - int cpu;
> + static DEFINE_SPINLOCK(done_mask_lock);
> + struct cpumask *done_mask = data;
>
> - cnt += sprintf(&page[cnt], "state: %d /", rcu_expedited_state);
> - for_each_online_cpu(cpu) {
> - cnt += sprintf(&page[cnt], " %d:%d",
> - cpu, per_cpu(rcu_migration_req, cpu).dest_cpu);
> + if (done_mask) {
> + spin_lock(&done_mask_lock);
> + cpumask_set_cpu(smp_processor_id(), done_mask);
> + spin_unlock(&done_mask_lock);
> }
> - cnt += sprintf(&page[cnt], "\n");
> - return cnt;
> + return 0;
> }
> -EXPORT_SYMBOL_GPL(rcu_expedited_torture_stats);
> -
> -static long synchronize_sched_expedited_count;
>
> /*
> * Wait for an rcu-sched grace period to elapse, but use "big hammer"
> @@ -9104,60 +8976,55 @@ static long synchronize_sched_expedited_count;
> */
> void synchronize_sched_expedited(void)
> {
> - int cpu;
> - unsigned long flags;
> - bool need_full_sync = 0;
> - struct rq *rq;
> - struct migration_req *req;
> - long snap;
> - int trycount = 0;
> + cpumask_var_t done_mask_var;
> + struct cpumask *done_mask = NULL;
> + int snap, trycount = 0;
> +
> + /*
> + * done_mask is used to check that all cpus actually have
> + * finished running the stopper, which is guaranteed by
> + * stop_cpus() if it's called with cpu hotplug blocked. Keep
> + * the paranoia for now but it's best effort if cpumask is off
> + * stack.
> + */
> + if (zalloc_cpumask_var(&done_mask_var, GFP_ATOMIC))
> + done_mask = done_mask_var;
>
> smp_mb(); /* ensure prior mod happens before capturing snap. */
> - snap = ACCESS_ONCE(synchronize_sched_expedited_count) + 1;
> + snap = atomic_read(&synchronize_sched_expedited_count) + 1;
> get_online_cpus();
> - while (!mutex_trylock(&rcu_sched_expedited_mutex)) {
> + while (try_stop_cpus(cpu_online_mask,
> + synchronize_sched_expedited_cpu_stop,
> + done_mask) == -EAGAIN) {
> put_online_cpus();
> if (trycount++ < 10)
> udelay(trycount * num_online_cpus());
> else {
> synchronize_sched();
> - return;
> + goto free_out;
> }
> - if (ACCESS_ONCE(synchronize_sched_expedited_count) - snap > 0) {
> + if (atomic_read(&synchronize_sched_expedited_count) - snap > 0) {
> smp_mb(); /* ensure test happens before caller kfree */
> - return;
> + goto free_out;
> }
> get_online_cpus();
> }
> - rcu_expedited_state = RCU_EXPEDITED_STATE_POST;
> - for_each_online_cpu(cpu) {
> - rq = cpu_rq(cpu);
> - req = &per_cpu(rcu_migration_req, cpu);
> - init_completion(&req->done);
> - req->task = NULL;
> - req->dest_cpu = RCU_MIGRATION_NEED_QS;
> - raw_spin_lock_irqsave(&rq->lock, flags);
> - list_add(&req->list, &rq->migration_queue);
> - raw_spin_unlock_irqrestore(&rq->lock, flags);
> - wake_up_process(rq->migration_thread);
> - }
> - for_each_online_cpu(cpu) {
> - rcu_expedited_state = cpu;
> - req = &per_cpu(rcu_migration_req, cpu);
> - rq = cpu_rq(cpu);
> - wait_for_completion(&req->done);
> - raw_spin_lock_irqsave(&rq->lock, flags);
> - if (unlikely(req->dest_cpu == RCU_MIGRATION_MUST_SYNC))
> - need_full_sync = 1;
> - req->dest_cpu = RCU_MIGRATION_IDLE;
> - raw_spin_unlock_irqrestore(&rq->lock, flags);
> - }
> - rcu_expedited_state = RCU_EXPEDITED_STATE_IDLE;
> - synchronize_sched_expedited_count++;
> - mutex_unlock(&rcu_sched_expedited_mutex);
> + atomic_inc(&synchronize_sched_expedited_count);
> + if (done_mask)
> + cpumask_xor(done_mask, done_mask, cpu_online_mask);
> put_online_cpus();
> - if (need_full_sync)
> +
> + /* paranoia - this can't happen */
> + if (done_mask && cpumask_weight(done_mask)) {
> + char buf[80];
> +
> + cpulist_scnprintf(buf, sizeof(buf), done_mask);
> + WARN_ONCE(1, "synchronize_sched_expedited: cpu online and done masks disagree on %d cpus: %s\n",
> + cpumask_weight(done_mask), buf);
> synchronize_sched();
> + }
> +free_out:
> + free_cpumask_var(done_mask_var);
> }
> EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index cbd8b8a..217e4a9 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2798,6 +2798,8 @@ static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle)
> return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
> }
>
> +static int active_load_balance_cpu_stop(void *data);
> +
> /*
> * Check this_cpu to ensure it is balanced within domain. Attempt to move
> * tasks if there is an imbalance.
> @@ -2887,8 +2889,9 @@ redo:
> if (need_active_balance(sd, sd_idle, idle)) {
> raw_spin_lock_irqsave(&busiest->lock, flags);
>
> - /* don't kick the migration_thread, if the curr
> - * task on busiest cpu can't be moved to this_cpu
> + /* don't kick the active_load_balance_cpu_stop,
> + * if the curr task on busiest cpu can't be
> + * moved to this_cpu
> */
> if (!cpumask_test_cpu(this_cpu,
> &busiest->curr->cpus_allowed)) {
> @@ -2898,14 +2901,22 @@ redo:
> goto out_one_pinned;
> }
>
> + /*
> + * ->active_balance synchronizes accesses to
> + * ->active_balance_work. Once set, it's cleared
> + * only after active load balance is finished.
> + */
> if (!busiest->active_balance) {
> busiest->active_balance = 1;
> busiest->push_cpu = this_cpu;
> active_balance = 1;
> }
> raw_spin_unlock_irqrestore(&busiest->lock, flags);
> +
> if (active_balance)
> - wake_up_process(busiest->migration_thread);
> + stop_one_cpu_nowait(cpu_of(busiest),
> + active_load_balance_cpu_stop, busiest,
> + &busiest->active_balance_work);
>
> /*
> * We've kicked active balancing, reset the failure
> @@ -3012,24 +3023,29 @@ static void idle_balance(int this_cpu, struct rq *this_rq)
> }
>
> /*
> - * active_load_balance is run by migration threads. It pushes running tasks
> - * off the busiest CPU onto idle CPUs. It requires at least 1 task to be
> - * running on each physical CPU where possible, and avoids physical /
> - * logical imbalances.
> - *
> - * Called with busiest_rq locked.
> + * active_load_balance_cpu_stop is run by cpu stopper. It pushes
> + * running tasks off the busiest CPU onto idle CPUs. It requires at
> + * least 1 task to be running on each physical CPU where possible, and
> + * avoids physical / logical imbalances.
> */
> -static void active_load_balance(struct rq *busiest_rq, int busiest_cpu)
> +static int active_load_balance_cpu_stop(void *data)
> {
> + struct rq *busiest_rq = data;
> + int busiest_cpu = cpu_of(busiest_rq);
> int target_cpu = busiest_rq->push_cpu;
> + struct rq *target_rq = cpu_rq(target_cpu);
> struct sched_domain *sd;
> - struct rq *target_rq;
> +
> + raw_spin_lock_irq(&busiest_rq->lock);
> +
> + /* make sure the requested cpu hasn't gone down in the meantime */
> + if (unlikely(busiest_cpu != smp_processor_id() ||
> + !busiest_rq->active_balance))
> + goto out_unlock;
>
> /* Is there any task to move? */
> if (busiest_rq->nr_running <= 1)
> - return;
> -
> - target_rq = cpu_rq(target_cpu);
> + goto out_unlock;
>
> /*
> * This condition is "impossible", if it occurs
> @@ -3058,6 +3074,10 @@ static void active_load_balance(struct rq *busiest_rq, int busiest_cpu)
> schedstat_inc(sd, alb_failed);
> }
> double_unlock_balance(busiest_rq, target_rq);
> +out_unlock:
> + busiest_rq->active_balance = 0;
> + raw_spin_unlock_irq(&busiest_rq->lock);
> + return 0;
> }
>
> #ifdef CONFIG_NO_HZ
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 884c7a1..5b20141 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -301,7 +301,7 @@ static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
> case CPU_UP_PREPARE:
> BUG_ON(stopper->thread || stopper->enabled ||
> !list_empty(&stopper->works));
> - p = kthread_create(cpu_stopper_thread, stopper, "stopper/%d",
> + p = kthread_create(cpu_stopper_thread, stopper, "migration/%d",
> cpu);
> if (IS_ERR(p))
> return NOTIFY_BAD;
> --
> 1.6.4.2
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/5] sched: replace migration_thread with cpu_stop
2010-05-06 19:29 ` Paul E. McKenney
@ 2010-05-07 5:24 ` Tejun Heo
0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2010-05-07 5:24 UTC (permalink / raw)
To: paulmck
Cc: mingo, peterz, linux-kernel, Dipankar Sarma, Josh Triplett,
Oleg Nesterov, Dimitri Sivanich
On 05/06/2010 09:29 PM, Paul E. McKenney wrote:
> With the two patches submitted:
>
> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Paul's second fix (adding barrier() to the UP variant) has been added to the
branch.
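That presumably leaves the !SMP stub as nothing more than a compiler barrier,
roughly:

void synchronize_sched_expedited(void)
{
	barrier();	/* UP: a compiler barrier suffices */
}
EXPORT_SYMBOL_GPL(synchronize_sched_expedited);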
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3
2010-05-06 16:52 [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Tejun Heo
` (4 preceding siblings ...)
2010-05-06 16:52 ` [PATCH 5/5] sched: correctly place paranoia memory barriers " Tejun Heo
@ 2010-05-08 11:15 ` Ingo Molnar
2010-05-08 12:47 ` Tejun Heo
2010-05-08 15:17 ` [PATCH] cpu_stop: add dummy implementation for UP Tejun Heo
5 siblings, 2 replies; 13+ messages in thread
From: Ingo Molnar @ 2010-05-08 11:15 UTC (permalink / raw)
To: Tejun Heo; +Cc: peterz, linux-kernel, paulmck
* Tejun Heo <tj@kernel.org> wrote:
> Hello, Ingo.
>
> Please pull from the following branch to receive cpu_stop patches.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cpu_stop
>
> Adding mb()'s in synchronize_sched_expedited() is the only change
> since the last take[2]. Both Peter and Paul have acked the patchset.
> I'll post the patchset as replies to this message.
>
> This patchset is on top of the current sched/core (99bd5e2f) and
> contains the following changes.
>
> Paul E. McKenney (1):
> scheduler: correctly place paranoia memory barriers in synchronize_sched_expedited()
>
> Tejun Heo (4):
> cpu_stop: implement stop_cpu[s]()
> stop_machine: reimplement using cpu_stop
> scheduler: replace migration_thread with cpu_stop
> scheduler: kill paranoia check in synchronize_sched_expedited()
>
> Documentation/RCU/torture.txt | 10 -
> arch/s390/kernel/time.c | 1 -
> drivers/xen/manage.c | 14 +-
> include/linux/rcutiny.h | 2 -
> include/linux/rcutree.h | 1 -
> include/linux/stop_machine.h | 59 +++--
> kernel/cpu.c | 8 -
> kernel/module.c | 14 +-
> kernel/rcutorture.c | 2 +-
> kernel/sched.c | 284 +++++-----------------
> kernel/sched_fair.c | 48 +++-
> kernel/stop_machine.c | 530 +++++++++++++++++++++++++++++++----------
> 12 files changed, 538 insertions(+), 435 deletions(-)
FYI, it doesn't build on x86 allnoconfig:
kernel/sched_fair.c: In function 'load_balance':
kernel/sched_fair.c:2917: error: implicit declaration of function 'stop_one_cpu_nowait'
kernel/sched.c: In function 'sched_exec':
kernel/sched.c:3084: error: implicit declaration of function 'stop_one_cpu'
kernel/sched.c: In function 'synchronize_sched_expedited':
Ingo
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3
2010-05-08 11:15 ` [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Ingo Molnar
@ 2010-05-08 12:47 ` Tejun Heo
2010-05-08 15:17 ` [PATCH] cpu_stop: add dummy implementation for UP Tejun Heo
1 sibling, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2010-05-08 12:47 UTC (permalink / raw)
To: Ingo Molnar; +Cc: peterz, linux-kernel, paulmck
Hello, Ingo.
On 05/08/2010 01:15 PM, Ingo Molnar wrote:
> FYI, it doesn't build on x86 allnoconfig:
>
> kernel/sched_fair.c: In function 'load_balance':
> kernel/sched_fair.c:2917: error: implicit declaration of function 'stop_one_cpu_nowait'
> kernel/sched.c: In function 'sched_exec':
> kernel/sched.c:3084: error: implicit declaration of function 'stop_one_cpu'
> kernel/sched.c: In function 'synchronize_sched_expedited':
The commit is fc390cde362309f6892bb719194f242c466a978b, right? Yeah,
it's missing the dummy definitions for UP. The strange thing is that
the commit builds fine w/ allnoconfig on x86 and x86_64. Ah... maybe
other changes in sched/core make use of stop_cpu*() from the UP code
path. Anyway, I will fix it right away.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] cpu_stop: add dummy implementation for UP
2010-05-08 11:15 ` [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Ingo Molnar
2010-05-08 12:47 ` Tejun Heo
@ 2010-05-08 15:17 ` Tejun Heo
1 sibling, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2010-05-08 15:17 UTC (permalink / raw)
To: Ingo Molnar; +Cc: peterz, linux-kernel, paulmck
When !CONFIG_SMP, the cpu_stop functions weren't defined at all, which
could lead to build failures if UP code uses the cpu_stop facility. Add
a dummy cpu_stop implementation for UP. The waiting variants execute
the work function directly with preemption disabled, and
stop_one_cpu_nowait() schedules a workqueue work item.
The Makefile and the ifdefs around the stop_machine implementation are
updated to accommodate the CONFIG_SMP && !CONFIG_STOP_MACHINE case.
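As a usage sketch (the callback and wrapper names below are made up purely for
illustration), a caller built for either configuration uses the same interface;
on UP the callback simply runs locally with preemption disabled, as described
above:

/* hypothetical non-sleeping callback, illustrative names only */
static int drain_foo_cpu_stop(void *arg)
{
	/* runs on the target cpu: from the highprio stopper thread on SMP,
	 * or directly with preemption disabled on UP */
	return 0;
}

static int drain_foo_on(unsigned int cpu)
{
	return stop_one_cpu(cpu, drain_foo_cpu_stop, NULL);
}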
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
---
This patch should fix possible build issues on UP configurations.
I've published the patch to the cpu_stop branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cpu_stop
Ingo, can you please pull from it?
Thank you.
include/linux/stop_machine.h | 69 ++++++++++++++++++++++++++++++++++++++----
kernel/Makefile | 2 +-
kernel/stop_machine.c | 4 ++
3 files changed, 68 insertions(+), 7 deletions(-)
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 0e552e7..6b524a0 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -6,8 +6,6 @@
#include <linux/list.h>
#include <asm/system.h>
-#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
-
/*
* stop_cpu[s]() is simplistic per-cpu maximum priority cpu
* monopolization mechanism. The caller can specify a non-sleeping
@@ -18,9 +16,10 @@
* up and requests are guaranteed to be served as long as the target
* cpus are online.
*/
-
typedef int (*cpu_stop_fn_t)(void *arg);
+#ifdef CONFIG_SMP
+
struct cpu_stop_work {
struct list_head list; /* cpu_stopper->works */
cpu_stop_fn_t fn;
@@ -34,12 +33,70 @@ void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
+#else /* CONFIG_SMP */
+
+#include <linux/workqueue.h>
+
+struct cpu_stop_work {
+ struct work_struct work;
+ cpu_stop_fn_t fn;
+ void *arg;
+};
+
+static inline int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
+{
+ int ret = -ENOENT;
+ preempt_disable();
+ if (cpu == smp_processor_id())
+ ret = fn(arg);
+ preempt_enable();
+ return ret;
+}
+
+static void stop_one_cpu_nowait_workfn(struct work_struct *work)
+{
+ struct cpu_stop_work *stwork =
+ container_of(work, struct cpu_stop_work, work);
+ preempt_disable();
+ stwork->fn(stwork->arg);
+ preempt_enable();
+}
+
+static inline void stop_one_cpu_nowait(unsigned int cpu,
+ cpu_stop_fn_t fn, void *arg,
+ struct cpu_stop_work *work_buf)
+{
+ if (cpu == smp_processor_id()) {
+ INIT_WORK(&work_buf->work, stop_one_cpu_nowait_workfn);
+ work_buf->fn = fn;
+ work_buf->arg = arg;
+ schedule_work(&work_buf->work);
+ }
+}
+
+static inline int stop_cpus(const struct cpumask *cpumask,
+ cpu_stop_fn_t fn, void *arg)
+{
+ if (cpumask_test_cpu(raw_smp_processor_id(), cpumask))
+ return stop_one_cpu(raw_smp_processor_id(), fn, arg);
+ return -ENOENT;
+}
+
+static inline int try_stop_cpus(const struct cpumask *cpumask,
+ cpu_stop_fn_t fn, void *arg)
+{
+ return stop_cpus(cpumask, fn, arg);
+}
+
+#endif /* CONFIG_SMP */
+
/*
* stop_machine "Bogolock": stop the entire machine, disable
* interrupts. This is a very heavy lock, which is equivalent to
* grabbing every spinlock (and more). So the "read" side to such a
* lock is anything which disables preeempt.
*/
+#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
/**
* stop_machine: freeze the machine on all CPUs and run this function
@@ -67,7 +124,7 @@ int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
*/
int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
-#else
+#else /* CONFIG_STOP_MACHINE && CONFIG_SMP */
static inline int stop_machine(int (*fn)(void *), void *data,
const struct cpumask *cpus)
@@ -79,5 +136,5 @@ static inline int stop_machine(int (*fn)(void *), void *data,
return ret;
}
-#endif /* CONFIG_SMP */
-#endif /* _LINUX_STOP_MACHINE */
+#endif /* CONFIG_STOP_MACHINE && CONFIG_SMP */
+#endif /* _LINUX_STOP_MACHINE */
diff --git a/kernel/Makefile b/kernel/Makefile
index a987aa1..149e18e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -68,7 +68,7 @@ obj-$(CONFIG_USER_NS) += user_namespace.o
obj-$(CONFIG_PID_NS) += pid_namespace.o
obj-$(CONFIG_IKCONFIG) += configs.o
obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
-obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
+obj-$(CONFIG_SMP) += stop_machine.o
obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
obj-$(CONFIG_AUDIT) += audit.o auditfilter.o audit_watch.o
obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 5b20141..ef51d1f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -375,6 +375,8 @@ static int __init cpu_stop_init(void)
}
early_initcall(cpu_stop_init);
+#ifdef CONFIG_STOP_MACHINE
+
/* This controls the threads on each CPU. */
enum stopmachine_state {
/* Dummy starting state for thread. */
@@ -477,3 +479,5 @@ int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
return ret;
}
EXPORT_SYMBOL_GPL(stop_machine);
+
+#endif /* CONFIG_STOP_MACHINE */
--
1.6.4.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/5] stop_machine: reimplement using cpu_stop
2010-05-06 16:52 ` [PATCH 2/5] stop_machine: reimplement using cpu_stop Tejun Heo
@ 2010-05-11 21:50 ` Andrew Morton
2010-05-12 7:20 ` Tejun Heo
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2010-05-11 21:50 UTC (permalink / raw)
To: Tejun Heo
Cc: mingo, peterz, linux-kernel, paulmck, Oleg Nesterov,
Dimitri Sivanich
On Thu, 6 May 2010 18:52:49 +0200
Tejun Heo <tj@kernel.org> wrote:
> Reimplement stop_machine using cpu_stop. As cpu stoppers are
> guaranteed to be available for all online cpus,
> stop_machine_create/destroy() are no longer necessary and removed.
>
> With resource management and synchronization handled by cpu_stop, the
> new implementation is much simpler. Asking cpu_stop to execute
> the stop_cpu() state machine on all online cpus with cpu hotplug
> disabled is enough.
>
> stop_machine itself doesn't need to manage any global resources
> anymore, so all per-instance information is rolled into struct
> stop_machine_data and the mutex and all static data variables are
> removed.
>
> The previous implementation created and destroyed RT workqueues as
> necessary which made stop_machine() calls highly expensive on very
> large machines. According to Dimitri Sivanich, preventing the dynamic
> creation/destruction makes booting more than twice as fast on very
> large machines. cpu_stop resources are preallocated for all online
> cpus and should have the same effect.
>
> ...
>
> @@ -361,9 +357,6 @@ int disable_nonboot_cpus(void)
> {
> int cpu, first_cpu, error;
>
> - error = stop_machine_create();
> - if (error)
> - return error;
> cpu_maps_update_begin();
> first_cpu = cpumask_first(cpu_online_mask);
> /*
kernel/cpu.c: In function 'disable_nonboot_cpus':
kernel/cpu.c:397: warning: 'error' may be used uninitialized in this function
Looks like this is a real bug if the cpumask happens to contain only a
single CPU for some reason: the loop over the online cpus then never
assigns 'error', and the uninitialized value is returned.
--- a/kernel/cpu.c~a
+++ a/kernel/cpu.c
@@ -355,7 +355,8 @@ static cpumask_var_t frozen_cpus;
int disable_nonboot_cpus(void)
{
- int cpu, first_cpu, error;
+ int cpu, first_cpu;
+ int error = 0;
cpu_maps_update_begin();
first_cpu = cpumask_first(cpu_online_mask);
_
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/5] stop_machine: reimplement using cpu_stop
2010-05-11 21:50 ` Andrew Morton
@ 2010-05-12 7:20 ` Tejun Heo
0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2010-05-12 7:20 UTC (permalink / raw)
To: Andrew Morton
Cc: mingo, peterz, linux-kernel, paulmck, Oleg Nesterov,
Dimitri Sivanich
Hello,
On 05/11/2010 11:50 PM, Andrew Morton wrote:
> kernel/cpu.c: In function 'disable_nonboot_cpus':
> kernel/cpu.c:397: warning: 'error' may be used uninitialized in this function
>
> Looks like this is a real bug if the cpumask happens to contain only a
> single CPU for some reason.
Indeed. Hmm... I wonder why my builds didn't find it.
Acked-by: Tejun Heo <tj@kernel.org>
Thank you.
--
tejun
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-05-12 7:20 UTC | newest]
Thread overview: 13+ messages -- links below jump to the message on this page --
2010-05-06 16:52 [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Tejun Heo
2010-05-06 16:52 ` [PATCH 1/5] cpu_stop: implement stop_cpu[s]() Tejun Heo
2010-05-06 16:52 ` [PATCH 2/5] stop_machine: reimplement using cpu_stop Tejun Heo
2010-05-11 21:50 ` Andrew Morton
2010-05-12 7:20 ` Tejun Heo
2010-05-06 16:52 ` [PATCH 3/5] sched: replace migration_thread with cpu_stop Tejun Heo
2010-05-06 19:29 ` Paul E. McKenney
2010-05-07 5:24 ` Tejun Heo
2010-05-06 16:52 ` [PATCH 4/5] sched: kill paranoia check in synchronize_sched_expedited() Tejun Heo
2010-05-06 16:52 ` [PATCH 5/5] sched: correctly place paranoia memory barriers " Tejun Heo
2010-05-08 11:15 ` [GIT PULL sched/core] cpu_stop: implement and use cpu_stop, take#3 Ingo Molnar
2010-05-08 12:47 ` Tejun Heo
2010-05-08 15:17 ` [PATCH] cpu_stop: add dummy implementation for UP Tejun Heo