public inbox for linux-kernel@vger.kernel.org
* [patch 0/4] MTRR rendezvous deadlock fix and cleanups using stop_machine()
@ 2011-06-22 22:20 Suresh Siddha
  2011-06-22 22:20 ` [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence Suresh Siddha
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Suresh Siddha @ 2011-06-22 22:20 UTC (permalink / raw)
  To: mingo, tglx, hpa, trenn, prarit, tj, rusty, a.p.zijlstra, akpm,
	torvalds
  Cc: linux-kernel, suresh.b.siddha, youquan.song

Ingo, Peter:

Here is the updated MTRR stop machine patch-set.

The first patch is a quick fix targeting 3.0 and the stable kernels.
It fixes the boot deadlock reported at
https://bugzilla.novell.com/show_bug.cgi?id=672008

Second patch reorganizes the stop_cpus() code.

Third patch introduces stop_machine_from_offline_cpu() so that we can do
stop machine from the cpu hotplug path, where the calling cpu is not
yet online.

Fourth patch uses the stop_machine() and stop_machine_from_offline_cpu()
to implement the x86 MTRR rendezvous sequence and thus remove the duplicate
implementation of stop machine using stop_one_cpu_nowait().

thanks,
suresh



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence
  2011-06-22 22:20 [patch 0/4] MTRR rendezvous deadlock fix and cleanups using stop_machine() Suresh Siddha
@ 2011-06-22 22:20 ` Suresh Siddha
  2011-06-23  9:05   ` Peter Zijlstra
  2011-06-22 22:20 ` [patch 2/4] stop_machine: reorganize stop_cpus() implementation Suresh Siddha
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Suresh Siddha @ 2011-06-22 22:20 UTC (permalink / raw)
  To: mingo, tglx, hpa, trenn, prarit, tj, rusty, a.p.zijlstra, akpm,
	torvalds
  Cc: linux-kernel, suresh.b.siddha, youquan.song, stable

[-- Attachment #1: mtrr_stop_machine_quick_fix.patch --]
[-- Type: text/plain, Size: 4025 bytes --]

From: Suresh Siddha <suresh.b.siddha@intel.com>

An MTRR rendezvous sequence using stop_one_cpu_nowait() can potentially
happen in parallel with another system-wide rendezvous using
stop_machine(). This can lead to deadlock: the order in which the works
are queued can differ between cpus, so some cpus end up running the
first rendezvous handler while others run the second one. Each set then
waits for the other set to join the system-wide rendezvous, and we
deadlock.

The MTRR rendezvous sequence is not implemented using stop_machine()
because it gets called both from process context as well as from the cpu
online path (where the cpu has not yet come online and interrupts are
disabled, etc.), while stop_machine() works only with online cpus.

For now, take the stop_machine mutex in the MTRR rendezvous sequence
when it is called from an online cpu (here we are in process context
and can potentially sleep while taking the mutex). The MTRR rendezvous
that gets triggered during cpu online doesn't need to take this
stop_machine lock, as stop_machine() already ensures that no cpu hotplug
is going on in parallel by doing get_online_cpus().

    TBD: Pursue a cleaner solution of extending the stop_machine()
         infrastructure to handle the case where the calling cpu is
         still not online and use this for MTRR rendezvous sequence.

fixes: https://bugzilla.novell.com/show_bug.cgi?id=672008

(will be forwarded to stable series for inclusion in kernels v2.6.35-v2.6.39
 after some testing in mainline).

Reported-by: Vadim Kotelnikov <vadimuzzz@inbox.ru>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org # 2.6.35+, backport a week or two after this gets more testing in mainline
---
 arch/x86/kernel/cpu/mtrr/main.c |   16 ++++++++++++++++
 include/linux/stop_machine.h    |    2 ++
 kernel/stop_machine.c           |    2 +-
 3 files changed, 19 insertions(+), 1 deletion(-)

Index: linux-2.6-tip/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6-tip/arch/x86/kernel/cpu/mtrr/main.c
@@ -248,6 +248,18 @@ set_mtrr(unsigned int reg, unsigned long
 	unsigned long flags;
 	int cpu;
 
+#ifdef CONFIG_SMP
+	/*
+	 * If we are not yet online, then there can be no stop_machine() in
+	 * parallel. Stop machine ensures this by using get_online_cpus().
+	 *
+	 * If we are online, then we need to prevent a stop_machine() happening
+	 * in parallel by taking the stop cpus mutex.
+	 */
+	if (cpu_online(raw_smp_processor_id()))
+		mutex_lock(&stop_cpus_mutex);
+#endif
+
 	preempt_disable();
 
 	data.smp_reg = reg;
@@ -330,6 +342,10 @@ set_mtrr(unsigned int reg, unsigned long
 
 	local_irq_restore(flags);
 	preempt_enable();
+#ifdef CONFIG_SMP
+	if (cpu_online(raw_smp_processor_id()))
+		mutex_unlock(&stop_cpus_mutex);
+#endif
 }
 
 /**
Index: linux-2.6-tip/include/linux/stop_machine.h
===================================================================
--- linux-2.6-tip.orig/include/linux/stop_machine.h
+++ linux-2.6-tip/include/linux/stop_machine.h
@@ -27,6 +27,8 @@ struct cpu_stop_work {
 	struct cpu_stop_done	*done;
 };
 
+extern struct mutex stop_cpus_mutex;
+
 int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
 void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
 			 struct cpu_stop_work *work_buf);
Index: linux-2.6-tip/kernel/stop_machine.c
===================================================================
--- linux-2.6-tip.orig/kernel/stop_machine.c
+++ linux-2.6-tip/kernel/stop_machine.c
@@ -132,8 +132,8 @@ void stop_one_cpu_nowait(unsigned int cp
 	cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu), work_buf);
 }
 
+DEFINE_MUTEX(stop_cpus_mutex);
 /* static data for stop_cpus */
-static DEFINE_MUTEX(stop_cpus_mutex);
 static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
 
 int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)




* [patch 2/4] stop_machine: reorganize stop_cpus() implementation
  2011-06-22 22:20 [patch 0/4] MTRR rendezvous deadlock fix and cleanups using stop_machine() Suresh Siddha
  2011-06-22 22:20 ` [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence Suresh Siddha
@ 2011-06-22 22:20 ` Suresh Siddha
  2011-06-22 22:20 ` [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu() Suresh Siddha
  2011-06-22 22:20 ` [patch 4/4] x86, mtrr: use stop_machine() for doing MTRR rendezvous Suresh Siddha
  3 siblings, 0 replies; 15+ messages in thread
From: Suresh Siddha @ 2011-06-22 22:20 UTC (permalink / raw)
  To: mingo, tglx, hpa, trenn, prarit, tj, rusty, a.p.zijlstra, akpm,
	torvalds
  Cc: linux-kernel, suresh.b.siddha, youquan.song

[-- Attachment #1: reorganize_stop_cpus.patch --]
[-- Type: text/plain, Size: 2101 bytes --]

From: Tejun Heo <tj@kernel.org>

Refactor the queuing part of the stop cpus work from __stop_cpus() into
queue_stop_cpus_work().

The reorganization is to help future improvements to stop_machine()
and doesn't introduce any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/stop_machine.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Index: linux-2.6-tip/kernel/stop_machine.c
===================================================================
--- linux-2.6-tip.orig/kernel/stop_machine.c
+++ linux-2.6-tip/kernel/stop_machine.c
@@ -136,10 +136,11 @@ DEFINE_MUTEX(stop_cpus_mutex);
 /* static data for stop_cpus */
 static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
 
-int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
+static void queue_stop_cpus_work(const struct cpumask *cpumask,
+				 cpu_stop_fn_t fn, void *arg,
+				 struct cpu_stop_done *done)
 {
 	struct cpu_stop_work *work;
-	struct cpu_stop_done done;
 	unsigned int cpu;
 
 	/* initialize works and done */
@@ -147,9 +148,8 @@ int __stop_cpus(const struct cpumask *cp
 		work = &per_cpu(stop_cpus_work, cpu);
 		work->fn = fn;
 		work->arg = arg;
-		work->done = &done;
+		work->done = done;
 	}
-	cpu_stop_init_done(&done, cpumask_weight(cpumask));
 
 	/*
 	 * Disable preemption while queueing to avoid getting
@@ -161,7 +161,15 @@ int __stop_cpus(const struct cpumask *cp
 		cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu),
 				    &per_cpu(stop_cpus_work, cpu));
 	preempt_enable();
+}
 
+static int __stop_cpus(const struct cpumask *cpumask,
+		       cpu_stop_fn_t fn, void *arg)
+{
+	struct cpu_stop_done done;
+
+	cpu_stop_init_done(&done, cpumask_weight(cpumask));
+	queue_stop_cpus_work(cpumask, fn, arg, &done);
 	wait_for_completion(&done.completion);
 	return done.executed ? done.ret : -ENOENT;
 }




* [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu()
  2011-06-22 22:20 [patch 0/4] MTRR rendezvous deadlock fix and cleanups using stop_machine() Suresh Siddha
  2011-06-22 22:20 ` [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence Suresh Siddha
  2011-06-22 22:20 ` [patch 2/4] stop_machine: reorganize stop_cpus() implementation Suresh Siddha
@ 2011-06-22 22:20 ` Suresh Siddha
  2011-06-23  9:25   ` Peter Zijlstra
  2011-06-22 22:20 ` [patch 4/4] x86, mtrr: use stop_machine() for doing MTRR rendezvous Suresh Siddha
  3 siblings, 1 reply; 15+ messages in thread
From: Suresh Siddha @ 2011-06-22 22:20 UTC (permalink / raw)
  To: mingo, tglx, hpa, trenn, prarit, tj, rusty, a.p.zijlstra, akpm,
	torvalds
  Cc: linux-kernel, suresh.b.siddha, youquan.song

[-- Attachment #1: implement_stop_machine_from_offline_cpu.patch --]
[-- Type: text/plain, Size: 5264 bytes --]

From: Tejun Heo <tj@kernel.org>

Currently, mtrr wants stop_machine functionality while a CPU is being
brought up.  As stop_machine() requires the calling CPU to be online,
mtrr implements its own stop_machine using stop_one_cpu() on each
online CPU.  This not only unnecessarily duplicates complex logic
but also introduces the possibility of deadlock when it races against
the generic stop_machine().

This patch implements stop_machine_from_offline_cpu() to serve such
use cases.  Its functionality is basically the same as stop_machine();
however, it is meant to be called from a CPU which isn't yet online, and
it doesn't depend on working scheduling on the calling CPU.

This is achieved by using busy loops for synchronization and by
open-coding the stop_cpus queuing and waiting, with direct invocation
of fn() for the local CPU in between.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/stop_machine.h |   14 ++++++++-
 kernel/stop_machine.c        |   62 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 73 insertions(+), 3 deletions(-)

Index: linux-2.6-tip/kernel/stop_machine.c
===================================================================
--- linux-2.6-tip.orig/kernel/stop_machine.c
+++ linux-2.6-tip/kernel/stop_machine.c
@@ -439,8 +439,15 @@ static int stop_machine_cpu_stop(void *d
 	struct stop_machine_data *smdata = data;
 	enum stopmachine_state curstate = STOPMACHINE_NONE;
 	int cpu = smp_processor_id(), err = 0;
+	unsigned long flags;
 	bool is_active;
 
+	/*
+	 * When called from stop_machine_from_offline_cpu(), irq might
+	 * already be disabled.  Save the state and restore it on exit.
+	 */
+	local_save_flags(flags);
+
 	if (!smdata->active_cpus)
 		is_active = cpu == cpumask_first(cpu_online_mask);
 	else
@@ -468,7 +475,7 @@ static int stop_machine_cpu_stop(void *d
 		}
 	} while (curstate != STOPMACHINE_EXIT);
 
-	local_irq_enable();
+	local_irq_restore(flags);
 	return err;
 }
 
@@ -495,4 +502,57 @@ int stop_machine(int (*fn)(void *), void
 }
 EXPORT_SYMBOL_GPL(stop_machine);
 
+/**
+ * stop_machine_from_offline_cpu - stop_machine() from offline CPU
+ * @fn: the function to run
+ * @data: the data ptr for the @fn()
+ * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
+ *
+ * This is identical to stop_machine() but can be called from a CPU which
+ * isn't online.  The local CPU is in the process of hotplug (so no other
+ * CPU hotplug can start) and not marked online and doesn't have enough
+ * context to sleep.
+ *
+ * This function provides stop_machine() functionality for such state by
+ * using busy-wait for synchronization and executing @fn directly for local
+ * CPU.
+ *
+ * CONTEXT:
+ * Local CPU is offline.  Temporarily stops all online CPUs.
+ *
+ * RETURNS:
+ * 0 if all executions of @fn returned 0, any non zero return value if any
+ * returned non zero.
+ */
+int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
+				  const struct cpumask *cpus)
+{
+	struct stop_machine_data smdata = { .fn = fn, .data = data,
+					    .active_cpus = cpus };
+	struct cpu_stop_done done;
+	int ret;
+
+	/* Local CPU must be offline and CPU hotplug in progress. */
+	BUG_ON(cpu_online(raw_smp_processor_id()));
+	smdata.num_threads = num_online_cpus() + 1;	/* +1 for local */
+
+	/* No proper task established and can't sleep - busy wait for lock. */
+	while (!mutex_trylock(&stop_cpus_mutex))
+		cpu_relax();
+
+	/* Schedule work on other CPUs and execute directly for local CPU */
+	set_state(&smdata, STOPMACHINE_PREPARE);
+	cpu_stop_init_done(&done, num_online_cpus());
+	queue_stop_cpus_work(cpu_online_mask, stop_machine_cpu_stop, &smdata,
+			     &done);
+	ret = stop_machine_cpu_stop(&smdata);
+
+	/* Busy wait for completion. */
+	while (!completion_done(&done.completion))
+		cpu_relax();
+
+	mutex_unlock(&stop_cpus_mutex);
+	return ret ?: done.ret;
+}
+
 #endif	/* CONFIG_STOP_MACHINE */
Index: linux-2.6-tip/include/linux/stop_machine.h
===================================================================
--- linux-2.6-tip.orig/include/linux/stop_machine.h
+++ linux-2.6-tip/include/linux/stop_machine.h
@@ -126,15 +126,19 @@ int stop_machine(int (*fn)(void *), void
  */
 int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
 
+int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
+				  const struct cpumask *cpus);
+
 #else	 /* CONFIG_STOP_MACHINE && CONFIG_SMP */
 
 static inline int __stop_machine(int (*fn)(void *), void *data,
 				 const struct cpumask *cpus)
 {
+	unsigned long flags;
 	int ret;
-	local_irq_disable();
+	local_irq_save(flags);
 	ret = fn(data);
-	local_irq_enable();
+	local_irq_restore(flags);
 	return ret;
 }
 
@@ -144,5 +148,11 @@ static inline int stop_machine(int (*fn)
 	return __stop_machine(fn, data, cpus);
 }
 
+static inline int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
+						const struct cpumask *cpus)
+{
+	return __stop_machine(fn, data, cpus);
+}
+
 #endif	/* CONFIG_STOP_MACHINE && CONFIG_SMP */
 #endif	/* _LINUX_STOP_MACHINE */




* [patch 4/4] x86, mtrr: use stop_machine() for doing MTRR rendezvous
  2011-06-22 22:20 [patch 0/4] MTRR rendezvous deadlock fix and cleanups using stop_machine() Suresh Siddha
                   ` (2 preceding siblings ...)
  2011-06-22 22:20 ` [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu() Suresh Siddha
@ 2011-06-22 22:20 ` Suresh Siddha
  3 siblings, 0 replies; 15+ messages in thread
From: Suresh Siddha @ 2011-06-22 22:20 UTC (permalink / raw)
  To: mingo, tglx, hpa, trenn, prarit, tj, rusty, a.p.zijlstra, akpm,
	torvalds
  Cc: linux-kernel, suresh.b.siddha, youquan.song

[-- Attachment #1: use_stop_machine_for_mtrr_rendezvous.patch --]
[-- Type: text/plain, Size: 9579 bytes --]

From: Suresh Siddha <suresh.b.siddha@intel.com>

The MTRR rendezvous sequence was not implemented using stop_machine()
before, as it gets called both from process context as well as from the
cpu online path (where the cpu has not yet come online and interrupts
are disabled, etc.).

Now that we have a new stop_machine_from_offline_cpu() API, use it for
rendezvous during mtrr init of a logical processor that is coming online.

For the rest (runtime MTRR modification, system boot, resume paths), use
stop_machine() to implement the rendezvous sequence. This consolidates
and cleans up the code.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/kernel/cpu/mtrr/main.c |  187 ++++++++--------------------------------
 include/linux/stop_machine.h    |    2 
 kernel/stop_machine.c           |    2 
 3 files changed, 43 insertions(+), 148 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/cpu/mtrr/main.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux-2.6-tip/arch/x86/kernel/cpu/mtrr/main.c
@@ -137,55 +137,43 @@ static void __init init_table(void)
 }
 
 struct set_mtrr_data {
-	atomic_t	count;
-	atomic_t	gate;
 	unsigned long	smp_base;
 	unsigned long	smp_size;
 	unsigned int	smp_reg;
 	mtrr_type	smp_type;
 };
 
-static DEFINE_PER_CPU(struct cpu_stop_work, mtrr_work);
-
 /**
- * mtrr_work_handler - Synchronisation handler. Executed by "other" CPUs.
+ * mtrr_rendezvous_handler - Work done in the synchronization handler. Executed
+ * by all the CPUs.
  * @info: pointer to mtrr configuration data
  *
  * Returns nothing.
  */
-static int mtrr_work_handler(void *info)
+static int mtrr_rendezvous_handler(void *info)
 {
 #ifdef CONFIG_SMP
 	struct set_mtrr_data *data = info;
-	unsigned long flags;
-
-	atomic_dec(&data->count);
-	while (!atomic_read(&data->gate))
-		cpu_relax();
-
-	local_irq_save(flags);
-
-	atomic_dec(&data->count);
-	while (atomic_read(&data->gate))
-		cpu_relax();
 
-	/*  The master has cleared me to execute  */
+	/*
+	 * We use this same function to initialize the mtrrs during boot,
+	 * resume, runtime cpu online and on an explicit request to set a
+	 * specific MTRR.
+	 *
+	 * During boot or suspend, the state of the boot cpu's mtrrs has been
+	 * saved, and we want to replicate that across all the cpus that come
+	 * online (either at the end of boot or resume or during a runtime cpu
+	 * online). If we're doing that, @reg is set to something special and on
+	 * all the cpu's we do mtrr_if->set_all() (On the logical cpu that
+	 * started the boot/resume sequence, this might be a duplicate
+	 * set_all()).
+	 */
 	if (data->smp_reg != ~0U) {
 		mtrr_if->set(data->smp_reg, data->smp_base,
 			     data->smp_size, data->smp_type);
-	} else if (mtrr_aps_delayed_init) {
-		/*
-		 * Initialize the MTRRs inaddition to the synchronisation.
-		 */
+	} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
 		mtrr_if->set_all();
 	}
-
-	atomic_dec(&data->count);
-	while (!atomic_read(&data->gate))
-		cpu_relax();
-
-	atomic_dec(&data->count);
-	local_irq_restore(flags);
 #endif
 	return 0;
 }
@@ -223,20 +211,11 @@ static inline int types_compatible(mtrr_
  * 14. Wait for buddies to catch up
  * 15. Enable interrupts.
  *
- * What does that mean for us? Well, first we set data.count to the number
- * of CPUs. As each CPU announces that it started the rendezvous handler by
- * decrementing the count, We reset data.count and set the data.gate flag
- * allowing all the cpu's to proceed with the work. As each cpu disables
- * interrupts, it'll decrement data.count once. We wait until it hits 0 and
- * proceed. We clear the data.gate flag and reset data.count. Meanwhile, they
- * are waiting for that flag to be cleared. Once it's cleared, each
- * CPU goes through the transition of updating MTRRs.
- * The CPU vendors may each do it differently,
- * so we call mtrr_if->set() callback and let them take care of it.
- * When they're done, they again decrement data->count and wait for data.gate
- * to be set.
- * When we finish, we wait for data.count to hit 0 and toggle the data.gate flag
- * Everyone then enables interrupts and we all continue on.
+ * What does that mean for us? Well, stop_machine() will ensure that
+ * the rendezvous handler is started on each CPU. And in lockstep they
+ * do the state transition of disabling interrupts, updating MTRR's
+ * (the CPU vendors may each do it differently, so we call mtrr_if->set()
+ * callback and let them take care of it.) and enabling interrupts.
  *
  * Note that the mechanism is the same for UP systems, too; all the SMP stuff
  * becomes nops.
@@ -244,108 +223,26 @@ static inline int types_compatible(mtrr_
 static void
 set_mtrr(unsigned int reg, unsigned long base, unsigned long size, mtrr_type type)
 {
-	struct set_mtrr_data data;
-	unsigned long flags;
-	int cpu;
-
-#ifdef CONFIG_SMP
-	/*
-	 * If we are not yet online, then there can be no stop_machine() in
-	 * parallel. Stop machine ensures this by using get_online_cpus().
-	 *
-	 * If we are online, then we need to prevent a stop_machine() happening
-	 * in parallel by taking the stop cpus mutex.
-	 */
-	if (cpu_online(raw_smp_processor_id()))
-		mutex_lock(&stop_cpus_mutex);
-#endif
-
-	preempt_disable();
-
-	data.smp_reg = reg;
-	data.smp_base = base;
-	data.smp_size = size;
-	data.smp_type = type;
-	atomic_set(&data.count, num_booting_cpus() - 1);
-
-	/* Make sure data.count is visible before unleashing other CPUs */
-	smp_wmb();
-	atomic_set(&data.gate, 0);
-
-	/* Start the ball rolling on other CPUs */
-	for_each_online_cpu(cpu) {
-		struct cpu_stop_work *work = &per_cpu(mtrr_work, cpu);
+	struct set_mtrr_data data = { .smp_reg = reg,
+				      .smp_base = base,
+				      .smp_size = size,
+				      .smp_type = type
+				    };
+
+	stop_machine(mtrr_rendezvous_handler, &data, cpu_online_mask);
+}
+
+static void set_mtrr_from_offline_cpu(unsigned int reg, unsigned long base,
+				      unsigned long size, mtrr_type type)
+{
+	struct set_mtrr_data data = { .smp_reg = reg,
+				      .smp_base = base,
+				      .smp_size = size,
+				      .smp_type = type
+				    };
 
-		if (cpu == smp_processor_id())
-			continue;
-
-		stop_one_cpu_nowait(cpu, mtrr_work_handler, &data, work);
-	}
-
-
-	while (atomic_read(&data.count))
-		cpu_relax();
-
-	/* Ok, reset count and toggle gate */
-	atomic_set(&data.count, num_booting_cpus() - 1);
-	smp_wmb();
-	atomic_set(&data.gate, 1);
-
-	local_irq_save(flags);
-
-	while (atomic_read(&data.count))
-		cpu_relax();
-
-	/* Ok, reset count and toggle gate */
-	atomic_set(&data.count, num_booting_cpus() - 1);
-	smp_wmb();
-	atomic_set(&data.gate, 0);
-
-	/* Do our MTRR business */
-
-	/*
-	 * HACK!
-	 *
-	 * We use this same function to initialize the mtrrs during boot,
-	 * resume, runtime cpu online and on an explicit request to set a
-	 * specific MTRR.
-	 *
-	 * During boot or suspend, the state of the boot cpu's mtrrs has been
-	 * saved, and we want to replicate that across all the cpus that come
-	 * online (either at the end of boot or resume or during a runtime cpu
-	 * online). If we're doing that, @reg is set to something special and on
-	 * this cpu we still do mtrr_if->set_all(). During boot/resume, this
-	 * is unnecessary if at this point we are still on the cpu that started
-	 * the boot/resume sequence. But there is no guarantee that we are still
-	 * on the same cpu. So we do mtrr_if->set_all() on this cpu aswell to be
-	 * sure that we are in sync with everyone else.
-	 */
-	if (reg != ~0U)
-		mtrr_if->set(reg, base, size, type);
-	else
-		mtrr_if->set_all();
-
-	/* Wait for the others */
-	while (atomic_read(&data.count))
-		cpu_relax();
-
-	atomic_set(&data.count, num_booting_cpus() - 1);
-	smp_wmb();
-	atomic_set(&data.gate, 1);
-
-	/*
-	 * Wait here for everyone to have seen the gate change
-	 * So we're the last ones to touch 'data'
-	 */
-	while (atomic_read(&data.count))
-		cpu_relax();
-
-	local_irq_restore(flags);
-	preempt_enable();
-#ifdef CONFIG_SMP
-	if (cpu_online(raw_smp_processor_id()))
-		mutex_unlock(&stop_cpus_mutex);
-#endif
+	stop_machine_from_offline_cpu(mtrr_rendezvous_handler, &data,
+				      cpu_callout_mask);
 }
 
 /**
@@ -799,7 +696,7 @@ void mtrr_ap_init(void)
 	 *   2. cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug
 	 *      lock to prevent mtrr entry changes
 	 */
-	set_mtrr(~0U, 0, 0, 0);
+	set_mtrr_from_offline_cpu(~0U, 0, 0, 0);
 }
 
 /**
Index: linux-2.6-tip/include/linux/stop_machine.h
===================================================================
--- linux-2.6-tip.orig/include/linux/stop_machine.h
+++ linux-2.6-tip/include/linux/stop_machine.h
@@ -27,8 +27,6 @@ struct cpu_stop_work {
 	struct cpu_stop_done	*done;
 };
 
-extern struct mutex stop_cpus_mutex;
-
 int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
 void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
 			 struct cpu_stop_work *work_buf);
Index: linux-2.6-tip/kernel/stop_machine.c
===================================================================
--- linux-2.6-tip.orig/kernel/stop_machine.c
+++ linux-2.6-tip/kernel/stop_machine.c
@@ -132,8 +132,8 @@ void stop_one_cpu_nowait(unsigned int cp
 	cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu), work_buf);
 }
 
-DEFINE_MUTEX(stop_cpus_mutex);
 /* static data for stop_cpus */
+static DEFINE_MUTEX(stop_cpus_mutex);
 static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);
 
 static void queue_stop_cpus_work(const struct cpumask *cpumask,




* Re: [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence
  2011-06-22 22:20 ` [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence Suresh Siddha
@ 2011-06-23  9:05   ` Peter Zijlstra
  2011-06-23  9:33     ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-23  9:05 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo, tglx, hpa, trenn, prarit, tj, rusty, akpm, torvalds,
	linux-kernel, youquan.song, stable

On Wed, 2011-06-22 at 15:20 -0700, Suresh Siddha wrote:
> +#ifdef CONFIG_SMP
> +       /*
> +        * If we are not yet online, then there can be no stop_machine() in
> +        * parallel. Stop machine ensures this by using get_online_cpus().
> +        *
> +        * If we are online, then we need to prevent a stop_machine() happening
> +        * in parallel by taking the stop cpus mutex.
> +        */
> +       if (cpu_online(raw_smp_processor_id()))
> +               mutex_lock(&stop_cpus_mutex);
> +#endif 

This reads like an optimization; is it really worthwhile to not take
the mutex in the rare offline case?


* Re: [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu()
  2011-06-22 22:20 ` [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu() Suresh Siddha
@ 2011-06-23  9:25   ` Peter Zijlstra
  2011-06-23  9:28     ` Tejun Heo
  2011-06-23 18:19     ` Suresh Siddha
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-23  9:25 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo, tglx, hpa, trenn, prarit, tj, rusty, akpm, torvalds,
	linux-kernel, youquan.song

On Wed, 2011-06-22 at 15:20 -0700, Suresh Siddha wrote:
> +int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
> +                                 const struct cpumask *cpus)
> +{
> +       struct stop_machine_data smdata = { .fn = fn, .data = data,
> +                                           .active_cpus = cpus };
> +       struct cpu_stop_done done;
> +       int ret;
> +
> +       /* Local CPU must be offline and CPU hotplug in progress. */
> +       BUG_ON(cpu_online(raw_smp_processor_id()));
> +       smdata.num_threads = num_online_cpus() + 1;     /* +1 for local */
> +
> +       /* No proper task established and can't sleep - busy wait for lock. */
> +       while (!mutex_trylock(&stop_cpus_mutex))
> +               cpu_relax();
> +
> +       /* Schedule work on other CPUs and execute directly for local CPU */
> +       set_state(&smdata, STOPMACHINE_PREPARE);
> +       cpu_stop_init_done(&done, num_online_cpus());
> +       queue_stop_cpus_work(cpu_online_mask, stop_machine_cpu_stop, &smdata,
> +                            &done);
> +       ret = stop_machine_cpu_stop(&smdata);
> +
> +       /* Busy wait for completion. */
> +       while (!completion_done(&done.completion))
> +               cpu_relax();
> +
> +       mutex_unlock(&stop_cpus_mutex);
> +       return ret ?: done.ret;
> +} 

Damn thats ugly, I sure hope you're going to make those hardware folks
pay for this :-)

In commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 you mention that its
specific to HT, wouldn't it make sense to limit the stop-machine use in
the next patch to the sibling mask instead of the whole machine?


* Re: [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu()
  2011-06-23  9:25   ` Peter Zijlstra
@ 2011-06-23  9:28     ` Tejun Heo
  2011-06-23  9:31       ` Peter Zijlstra
  2011-06-23 18:19     ` Suresh Siddha
  1 sibling, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2011-06-23  9:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Suresh Siddha, mingo, tglx, hpa, trenn, prarit, rusty, akpm,
	torvalds, linux-kernel, youquan.song

Hello, Peter.

On Thu, Jun 23, 2011 at 11:25:19AM +0200, Peter Zijlstra wrote:
> On Wed, 2011-06-22 at 15:20 -0700, Suresh Siddha wrote:
> > +int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
> > +                                 const struct cpumask *cpus)
> > +{
> > +       struct stop_machine_data smdata = { .fn = fn, .data = data,
> > +                                           .active_cpus = cpus };
> > +       struct cpu_stop_done done;
> > +       int ret;
> > +
> > +       /* Local CPU must be offline and CPU hotplug in progress. */
> > +       BUG_ON(cpu_online(raw_smp_processor_id()));
> > +       smdata.num_threads = num_online_cpus() + 1;     /* +1 for local */
> > +
> > +       /* No proper task established and can't sleep - busy wait for lock. */
> > +       while (!mutex_trylock(&stop_cpus_mutex))
> > +               cpu_relax();
> > +
> > +       /* Schedule work on other CPUs and execute directly for local CPU */
> > +       set_state(&smdata, STOPMACHINE_PREPARE);
> > +       cpu_stop_init_done(&done, num_online_cpus());
> > +       queue_stop_cpus_work(cpu_online_mask, stop_machine_cpu_stop, &smdata,
> > +                            &done);
> > +       ret = stop_machine_cpu_stop(&smdata);
> > +
> > +       /* Busy wait for completion. */
> > +       while (!completion_done(&done.completion))
> > +               cpu_relax();
> > +
> > +       mutex_unlock(&stop_cpus_mutex);
> > +       return ret ?: done.ret;
> > +} 
> 
> Damn thats ugly, I sure hope you're going to make those hardware folks
> pay for this :-)

Oh, I agree it's fugly.  It's trying to orchestrate stop_machine from
a CPU which doesn't have proper scheduler/task set up.  At least it's
contained in a single relatively short function.

Thanks.

-- 
tejun


* Re: [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu()
  2011-06-23  9:28     ` Tejun Heo
@ 2011-06-23  9:31       ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-23  9:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Suresh Siddha, mingo, tglx, hpa, trenn, prarit, rusty, akpm,
	torvalds, linux-kernel, youquan.song

On Thu, 2011-06-23 at 11:28 +0200, Tejun Heo wrote:
> Hello, Peter.
> 
> On Thu, Jun 23, 2011 at 11:25:19AM +0200, Peter Zijlstra wrote:
> > On Wed, 2011-06-22 at 15:20 -0700, Suresh Siddha wrote:
> > > +int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
> > > +                                 const struct cpumask *cpus)
> > > +{
> > > +       struct stop_machine_data smdata = { .fn = fn, .data = data,
> > > +                                           .active_cpus = cpus };
> > > +       struct cpu_stop_done done;
> > > +       int ret;
> > > +
> > > +       /* Local CPU must be offline and CPU hotplug in progress. */
> > > +       BUG_ON(cpu_online(raw_smp_processor_id()));
> > > +       smdata.num_threads = num_online_cpus() + 1;     /* +1 for local */
> > > +
> > > +       /* No proper task established and can't sleep - busy wait for lock. */
> > > +       while (!mutex_trylock(&stop_cpus_mutex))
> > > +               cpu_relax();
> > > +
> > > +       /* Schedule work on other CPUs and execute directly for local CPU */
> > > +       set_state(&smdata, STOPMACHINE_PREPARE);
> > > +       cpu_stop_init_done(&done, num_online_cpus());
> > > +       queue_stop_cpus_work(cpu_online_mask, stop_machine_cpu_stop, &smdata,
> > > +                            &done);
> > > +       ret = stop_machine_cpu_stop(&smdata);
> > > +
> > > +       /* Busy wait for completion. */
> > > +       while (!completion_done(&done.completion))
> > > +               cpu_relax();
> > > +
> > > +       mutex_unlock(&stop_cpus_mutex);
> > > +       return ret ?: done.ret;
> > > +} 
> > 
> > Damn thats ugly, I sure hope you're going to make those hardware folks
> > pay for this :-)
> 
> Oh, I agree it's fugly.  It's trying to orchestrate stop_machine from
> a CPU which doesn't have proper scheduler/task set up.  At least it's
> contained in a single relatively short function.

Yeah, no complaints on those grounds; it's just sad we need to actually
do it, but given it's a hardware constraint we're working with, we don't
have much choice.

Just wanted to make sure Suresh (and possibly others) are going to tell
the hardware folks that such constraints are not cool.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence
  2011-06-23  9:05   ` Peter Zijlstra
@ 2011-06-23  9:33     ` Thomas Gleixner
  2011-06-23  9:41       ` Peter Zijlstra
  2011-06-23 18:16       ` Suresh Siddha
  0 siblings, 2 replies; 15+ messages in thread
From: Thomas Gleixner @ 2011-06-23  9:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Suresh Siddha, mingo, hpa, trenn, prarit, tj, rusty, akpm,
	torvalds, linux-kernel, youquan.song, stable

On Thu, 23 Jun 2011, Peter Zijlstra wrote:

> On Wed, 2011-06-22 at 15:20 -0700, Suresh Siddha wrote:
> > +#ifdef CONFIG_SMP
> > +       /*
> > +        * If we are not yet online, then there can be no stop_machine() in
> > +        * parallel. Stop machine ensures this by using get_online_cpus().
> > +        *
> > +        * If we are online, then we need to prevent a stop_machine() happening
> > +        * in parallel by taking the stop cpus mutex.
> > +        */
> > +       if (cpu_online(raw_smp_processor_id()))
> > +               mutex_lock(&stop_cpus_mutex);
> > +#endif 
> 
> This reads like an optimization, is it really worth-while to not take
> the mutex in the rare offline case?
 
You cannot block on a mutex when you are not online, in fact you
cannot block on it when not active, so the check is wrong anyway.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence
  2011-06-23  9:33     ` Thomas Gleixner
@ 2011-06-23  9:41       ` Peter Zijlstra
  2011-06-23 18:16       ` Suresh Siddha
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-23  9:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Suresh Siddha, mingo, hpa, trenn, prarit, tj, rusty, akpm,
	torvalds, linux-kernel, youquan.song, stable

On Thu, 2011-06-23 at 11:33 +0200, Thomas Gleixner wrote:
> On Thu, 23 Jun 2011, Peter Zijlstra wrote:
> 
> > On Wed, 2011-06-22 at 15:20 -0700, Suresh Siddha wrote:
> > > +#ifdef CONFIG_SMP
> > > +       /*
> > > +        * If we are not yet online, then there can be no stop_machine() in
> > > +        * parallel. Stop machine ensures this by using get_online_cpus().
> > > +        *
> > > +        * If we are online, then we need to prevent a stop_machine() happening
> > > +        * in parallel by taking the stop cpus mutex.
> > > +        */
> > > +       if (cpu_online(raw_smp_processor_id()))
> > > +               mutex_lock(&stop_cpus_mutex);
> > > +#endif 
> > 
> > This reads like an optimization, is it really worth-while to not take
> > the mutex in the rare offline case?
>  
> You cannot block on a mutex when you are not online, in fact you
> cannot block on it when not active, so the check is wrong anyway.

Duh, yeah. The comment totally misled me.

On that whole active thing: cpu_active() was brought into life to sort
out a cpu-down problem, where we want the load balancer to stop using a
cpu before we can re-build the sched_domains.

But now we're having trouble because of that on the cpu-up part, where
we update the sched_domains too late (CPU_ONLINE) and hence also set
cpu_active() too late (again CPU_ONLINE).

Couldn't we update the sched_domain tree on CPU_UP_PREPARE to include
the new cpu and then set cpu_active() right along with cpu_online()?

That would also sort out your other wait-for-active-during-bringup
issue..

Note, I'll now go and have my morning juice, so the above might be total
crap.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence
  2011-06-23  9:33     ` Thomas Gleixner
  2011-06-23  9:41       ` Peter Zijlstra
@ 2011-06-23 18:16       ` Suresh Siddha
  1 sibling, 0 replies; 15+ messages in thread
From: Suresh Siddha @ 2011-06-23 18:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, mingo@elte.hu, hpa@zytor.com, trenn@novell.com,
	prarit@redhat.com, tj@kernel.org, rusty@rustcorp.com.au,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	linux-kernel@vger.kernel.org, Song, Youquan, stable@kernel.org

On Thu, 2011-06-23 at 02:33 -0700, Thomas Gleixner wrote:
> On Thu, 23 Jun 2011, Peter Zijlstra wrote:
> 
> > On Wed, 2011-06-22 at 15:20 -0700, Suresh Siddha wrote:
> > > +#ifdef CONFIG_SMP
> > > +       /*
> > > +        * If we are not yet online, then there can be no stop_machine() in
> > > +        * parallel. Stop machine ensures this by using get_online_cpus().
> > > +        *
> > > +        * If we are online, then we need to prevent a stop_machine() happening
> > > +        * in parallel by taking the stop cpus mutex.
> > > +        */
> > > +       if (cpu_online(raw_smp_processor_id()))
> > > +               mutex_lock(&stop_cpus_mutex);
> > > +#endif 
> > 
> > This reads like an optimization, is it really worth-while to not take
> > the mutex in the rare offline case?
>  
> You cannot block on a mutex when you are not online, in fact you
> cannot block on it when not active, so the check is wrong anyway.
> 

Ok. Thanks for educating me on that.

Here we are neither online nor active, so we should be ok. But to be
safe, I changed the online checks to active checks and also updated the
above comment in the new revision.

thanks,
suresh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu()
  2011-06-23  9:25   ` Peter Zijlstra
  2011-06-23  9:28     ` Tejun Heo
@ 2011-06-23 18:19     ` Suresh Siddha
  2011-06-24  7:45       ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Suresh Siddha @ 2011-06-23 18:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo@elte.hu, tglx@linutronix.de, hpa@zytor.com,
	trenn@novell.com, prarit@redhat.com, tj@kernel.org,
	rusty@rustcorp.com.au, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	Song, Youquan

On Thu, 2011-06-23 at 02:25 -0700, Peter Zijlstra wrote:
> On Wed, 2011-06-22 at 15:20 -0700, Suresh Siddha wrote:
> > +int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
> > +                                 const struct cpumask *cpus)
> > +{
> > +       struct stop_machine_data smdata = { .fn = fn, .data = data,
> > +                                           .active_cpus = cpus };
> > +       struct cpu_stop_done done;
> > +       int ret;
> > +
> > +       /* Local CPU must be offline and CPU hotplug in progress. */
> > +       BUG_ON(cpu_online(raw_smp_processor_id()));
> > +       smdata.num_threads = num_online_cpus() + 1;     /* +1 for local */
> > +
> > +       /* No proper task established and can't sleep - busy wait for lock. */
> > +       while (!mutex_trylock(&stop_cpus_mutex))
> > +               cpu_relax();
> > +
> > +       /* Schedule work on other CPUs and execute directly for local CPU */
> > +       set_state(&smdata, STOPMACHINE_PREPARE);
> > +       cpu_stop_init_done(&done, num_online_cpus());
> > +       queue_stop_cpus_work(cpu_online_mask, stop_machine_cpu_stop, &smdata,
> > +                            &done);
> > +       ret = stop_machine_cpu_stop(&smdata);
> > +
> > +       /* Busy wait for completion. */
> > +       while (!completion_done(&done.completion))
> > +               cpu_relax();
> > +
> > +       mutex_unlock(&stop_cpus_mutex);
> > +       return ret ?: done.ret;
> > +} 
> 
> Damn thats ugly, I sure hope you're going to make those hardware folks
> pay for this :-)

yeah, with more cores and sockets, this is becoming a pain. We will
bring this up with HW folks.

> 
> In commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 you mention that its
> specific to HT, wouldn't it make sense to limit the stop-machine use in
> the next patch to the sibling mask instead of the whole machine?

That specific issue was seen in the context of HT, but the SDM
guidelines predate HT and are applicable to SMP too.

thanks,
suresh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu()
  2011-06-23 18:19     ` Suresh Siddha
@ 2011-06-24  7:45       ` Peter Zijlstra
  2011-06-24 17:55         ` Suresh Siddha
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-24  7:45 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: mingo@elte.hu, tglx@linutronix.de, hpa@zytor.com,
	trenn@novell.com, prarit@redhat.com, tj@kernel.org,
	rusty@rustcorp.com.au, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	Song, Youquan

On Thu, 2011-06-23 at 11:19 -0700, Suresh Siddha wrote:
> > In commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 you mention that its
> > specific to HT, wouldn't it make sense to limit the stop-machine use in
> > the next patch to the sibling mask instead of the whole machine?
> 
> > That specific issue was seen in the context of HT, but the SDM
> > guidelines predate HT and are applicable to SMP too.

Sure, but we managed to ignore those long enough, could we not continue
to violate them and keep to the minimum that is working in practice?

From what I understand the explosion is WSM+/VMX on HT only because the
siblings share state, or do we have proof that it yields problems
between cores as well?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu()
  2011-06-24  7:45       ` Peter Zijlstra
@ 2011-06-24 17:55         ` Suresh Siddha
  0 siblings, 0 replies; 15+ messages in thread
From: Suresh Siddha @ 2011-06-24 17:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo@elte.hu, tglx@linutronix.de, hpa@zytor.com,
	trenn@novell.com, prarit@redhat.com, tj@kernel.org,
	rusty@rustcorp.com.au, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	Song, Youquan

On Fri, 2011-06-24 at 00:45 -0700, Peter Zijlstra wrote:
> On Thu, 2011-06-23 at 11:19 -0700, Suresh Siddha wrote:
> > > In commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 you mention that its
> > > specific to HT, wouldn't it make sense to limit the stop-machine use in
> > > the next patch to the sibling mask instead of the whole machine?
> > 
> > That specific issue was seen in the context of HT, but the SDM
> > guidelines predate HT and are applicable to SMP too.
> 
> Sure, but we managed to ignore those long enough, could we not continue
> to violate them and keep to the minimum that is working in practice?

No, we didn't violate all of them.

There are two paths where we do the rendezvous. One is during cpu
online (boot, hotplug, suspend/resume, etc.), where we init the MTRR
registers, and the other is when the MTRRs change at runtime (through
the /proc/mtrr interface).

Before commit 'd0af9eed5aa91b6b7b5049cae69e5ea956fd85c3' we were
indeed doing a rendezvous of all the cpus when we were dynamically
changing MTRRs.

And even in older kernels, prior to 2.6.11 or so, we were doing the
rendezvous on all occasions. During the cpu hotplug code revamp, we
broke the MTRR rendezvous for the online paths, which led to the WSM
issue we fixed in commit 'd0af9eed5aa91b6b7b5049cae69e5ea956fd85c3'.

> From what I understand the explosion is WSM+/VMX on HT only because the
> siblings share state, or do we have proof that it yields problems
> between cores as well?

During dynamic MTRR register changes, I believe a rendezvous of all the
cpus is more critical; otherwise, multiple cpus can access the same
memory location with different memory attributes, which can potentially
lead to memory corruption, machine checks, etc.

During cpu online, I think we can be less aggressive (e.g. perhaps
rendezvous only the HT/core siblings).

This is what we are planning to bring up with the cpu folks and see what
we can cut down and what is essential.

thanks,
suresh


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2011-06-24 17:56 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-22 22:20 [patch 0/4] MTRR rendezvous deadlock fix and cleanups using stop_machine() Suresh Siddha
2011-06-22 22:20 ` [patch 1/4] x86, mtrr: lock stop machine during MTRR rendezvous sequence Suresh Siddha
2011-06-23  9:05   ` Peter Zijlstra
2011-06-23  9:33     ` Thomas Gleixner
2011-06-23  9:41       ` Peter Zijlstra
2011-06-23 18:16       ` Suresh Siddha
2011-06-22 22:20 ` [patch 2/4] stop_machine: reorganize stop_cpus() implementation Suresh Siddha
2011-06-22 22:20 ` [patch 3/4] stop_machine: implement stop_machine_from_offline_cpu() Suresh Siddha
2011-06-23  9:25   ` Peter Zijlstra
2011-06-23  9:28     ` Tejun Heo
2011-06-23  9:31       ` Peter Zijlstra
2011-06-23 18:19     ` Suresh Siddha
2011-06-24  7:45       ` Peter Zijlstra
2011-06-24 17:55         ` Suresh Siddha
2011-06-22 22:20 ` [patch 4/4] x86, mtrr: use stop_machine() for doing MTRR rendezvous Suresh Siddha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox