Linux Power Management development

Linux Power Management development
 help / color / mirror / Atom feed

* [RFC PATCH v3 7/9] yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly
From: Srivatsa S. Bhat @ 2012-12-07 17:39 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com>

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on local_irq_save() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline,
while invoking from atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cff7656..4b982bf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4312,6 +4312,7 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 	unsigned long flags;
 	bool yielded = 0;
 
+	get_online_cpus_atomic();
 	local_irq_save(flags);
 	rq = this_rq();
 
@@ -4339,13 +4340,14 @@ again:
 		 * Make p's CPU reschedule; pick_next_entity takes care of
 		 * fairness.
 		 */
-		if (preempt && rq != p_rq)
+		if (preempt && rq != p_rq && cpu_online(task_cpu(p)))
 			resched_task(p_rq->curr);
 	}
 
 out:
 	double_rq_unlock(rq, p_rq);
 	local_irq_restore(flags);
+	put_online_cpus_atomic();
 
 	if (yielded)
 		schedule();

^ permalink raw reply related

* [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-07 17:38 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com>

There are places where preempt_disable() is used to prevent any CPU from
going offline during the critical section. Let us call them as "atomic
hotplug readers" ("atomic" because they run in atomic contexts).

Today, preempt_disable() works because the writer uses stop_machine().
But once stop_machine() is gone, the readers won't be able to prevent
CPUs from going offline using preempt_disable().

The intent of this patch is to provide synchronization APIs for such
atomic hotplug readers, to prevent (any) CPUs from going offline, without
depending on stop_machine() at the writer-side. The new APIs will look
something like this:  get/put_online_cpus_atomic()

Some important design requirements and considerations:
-----------------------------------------------------

1. Scalable synchronization at the reader-side, especially in the fast-path

   Any synchronization at the atomic hotplug readers side must be highly
   scalable - avoid global single-holder locks/counters etc. Because, these
   paths currently use the extremely fast preempt_disable(); our replacement
   to preempt_disable() should not become ridiculously costly and also should
   not serialize the readers among themselves needlessly.

   At a minimum, the new APIs must be extremely fast at the reader side
   atleast in the fast-path, when no CPU offline writers are active.

2. preempt_disable() was recursive. The replacement should also be recursive.

3. No (new) lock-ordering restrictions

   preempt_disable() was super-flexible. It didn't impose any ordering
   restrictions or rules for nesting. Our replacement should also be equally
   flexible and usable.

4. No deadlock possibilities

   Per-cpu locking is not the way to go if we want to have relaxed rules
   for lock-ordering. Because, we can end up in circular-locking dependencies
   as explained in https://lkml.org/lkml/2012/12/6/290

   So, avoid per-cpu locking schemes (per-cpu locks/per-cpu atomic counters
   with spin-on-contention etc) as much as possible.

Implementation of the design:
----------------------------

We use global rwlocks for synchronization, because then we won't get into
lock-ordering related problems (unlike per-cpu locks). However, global
rwlocks lead to unnecessary cache-line bouncing even when there are no
hotplug writers present, which can slow down the system needlessly.

Per-cpu counters can help solve the cache-line bouncing problem. So we
actually use the best of both: per-cpu counters (no-waiting) at the reader
side in the fast-path, and global rwlocks in the slowpath.

[ Fastpath = no writer is active; Slowpath = a writer is active ]

IOW, the hotplug readers just increment/decrement their per-cpu refcounts
when no writer is active. When a writer becomes active, he signals all
readers to switch to global rwlocks for the duration of the CPU offline
operation. The readers ACK the writer's signal when it is safe for them
(ie., when they are about to start a fresh, non-nested read-side critical
section) and start using (holding) the global rwlock for read in their
subsequent critical sections.

The hotplug writer waits for ACKs from every reader, and then acquires
the global rwlock for write and takes the CPU offline. Then the writer
signals all readers that the CPU offline is done, and that they can go back
to using their per-cpu refcounts again.

Note that the lock-safety (despite the per-cpu scheme) comes from the fact
that the readers can *choose* when to ACK the writer's signal (iow, when
to switch to the global rwlock). And the readers don't wait on anybody based
on the per-cpu counters. The only true synchronization that involves waiting
at the reader-side in this scheme, is the one arising from the global rwlock,
which is safe from the circular locking dependency problems mentioned above
(because it is global).

Reader-writer locks and per-cpu counters are recursive, so they can be
used in a nested fashion in the reader-path. And the "signal-and-ack-with-no-
reader-spinning" design of switching the synchronization scheme ensures that
you can safely nest and call these APIs in any way you want, just like
preempt_disable()/enable.

Together, these satisfy all the requirements mentioned above.

I'm indebted to Michael Wang and Xiao Guangrong for their numerous thoughtful
suggestions and ideas, which inspired and influenced many of the decisions in
this as well as previous designs. Thanks a lot Michael and Xiao!

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/cpu.h |    4 +
 kernel/cpu.c        |  252 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 253 insertions(+), 3 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index ce7a074..cf24da1 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -175,6 +175,8 @@ extern struct bus_type cpu_subsys;

 extern void get_online_cpus(void);
 extern void put_online_cpus(void);
+extern void get_online_cpus_atomic(void);
+extern void put_online_cpus_atomic(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
 #define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
@@ -198,6 +200,8 @@ static inline void cpu_hotplug_driver_unlock(void)

 #define get_online_cpus()	do { } while (0)
 #define put_online_cpus()	do { } while (0)
+#define get_online_cpus_atomic()	do { } while (0)
+#define put_online_cpus_atomic()	do { } while (0)
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 42bd331..d2076fa 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -19,6 +19,7 @@
 #include <linux/mutex.h>
 #include <linux/gfp.h>
 #include <linux/suspend.h>
+#include <linux/atomic.h>

 #include "smpboot.h"

@@ -133,6 +134,161 @@ static void cpu_hotplug_done(void)
 	mutex_unlock(&cpu_hotplug.lock);
 }

+/*
+ * Reader-writer lock to synchronize between atomic hotplug readers
+ * and the CPU offline hotplug writer.
+ */
+static DEFINE_RWLOCK(hotplug_rwlock);
+
+static DEFINE_PER_CPU(atomic_t, reader_refcount);
+static DEFINE_PER_CPU(atomic_t, writer_signal);
+
+static inline void ack_writer_signal(void)
+{
+	/* Writer raised it to 2. Reduce it to 1 */
+	atomic_dec(&__get_cpu_var(writer_signal));
+	smp_mb__after_atomic_dec();
+}
+
+
+#define reader_active(cpu)						\
+			(atomic_read(&per_cpu(reader_refcount, cpu)) > 0)
+
+#define writer_active(cpu)						\
+			(atomic_read(&per_cpu(writer_signal, cpu)) > 0)
+
+#define is_new_writer(cpu)						\
+			(atomic_read(&per_cpu(writer_signal, cpu)) == 2)
+
+#define reader_acked(cpu)						\
+			(atomic_read(&per_cpu(writer_signal, cpu)) == 1)
+
+
+static inline void mark_reader_fastpath(void)
+{
+	atomic_inc(&__get_cpu_var(reader_refcount));
+	smp_mb__after_atomic_inc();
+}
+
+static inline void unmark_reader_fastpath(void)
+{
+	atomic_dec(&__get_cpu_var(reader_refcount));
+	smp_mb__after_atomic_dec();
+}
+
+static inline void mark_reader_slowpath(void)
+{
+	unsigned int cpu = smp_processor_id();
+
+	mark_reader_fastpath();
+
+	read_lock(&hotplug_rwlock);
+
+	/*
+	 * Release the lock before returning, if the writer has finished.
+	 * This will help reduce the burden on put_online_cpus_atomic().
+	 */
+	if (!writer_active(cpu))
+		read_unlock(&hotplug_rwlock);
+}
+
+/*
+ * Invoked by hotplug reader, to prevent CPUs from going offline.
+ *
+ * If there are no CPU offline writers active, just increment the
+ * per-cpu counter 'reader_refcount' and proceed.
+ *
+ * If a CPU offline hotplug writer is active, we need to ACK his
+ * signal and switch to using the global rwlocks, when the time is
+ * right.
+ *
+ * It is not safe to switch the synchronization scheme when we are
+ * already in a read-side critical section (because their corresponding
+ * put_online_cpus_atomic() won't know of the switch and will continue
+ * to use the old scheme (per-cpu counter updates), which would be
+ * wrong).
+ *
+ * So don't ACK the writer's signal (and don't switch to rwlocks) unless
+ * this is a fresh non-nested read-side critical section on this CPU.
+ *
+ * Once you switch, keep using the rwlocks for synchronization, until
+ * the writer signals the end of CPU offline.
+ *
+ * You can call this recursively, without fear of locking problems.
+ *
+ * Returns with preemption disabled.
+ */
+void get_online_cpus_atomic(void)
+{
+	unsigned int cpu = smp_processor_id();
+	unsigned long flags;
+
+	preempt_disable();
+	local_irq_save(flags);
+
+	if (cpu_hotplug.active_writer == current)
+		goto out;
+
+	smp_rmb(); /* Paired with smp_wmb() in drop_writer_signal() */
+
+	if (likely(!writer_active(cpu))) {
+		mark_reader_fastpath();
+		goto out;
+	}
+
+	/* CPU offline writer is active... */
+
+	/* ACK his signal only once */
+	if (is_new_writer(cpu)) {
+		/*
+		 * ACK the writer's signal only if this is a fresh read-side
+		 * critical section, and not just an extension of a running
+		 * (nested) read-side critical section.
+		 */
+		if (!reader_active(cpu)) {
+			ack_writer_signal();
+			mark_reader_slowpath();
+		} else {
+			/*
+			 * Keep taking the fastpath until all existing nested
+			 * read-side critical sections complete on this CPU.
+			 */
+			mark_reader_fastpath();
+		}
+	} else {
+		/*
+		 * The writer is not new because we have already acked his
+		 * signal before. So just take the slowpath and proceed.
+		 */
+		mark_reader_slowpath();
+	}
+
+out:
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(get_online_cpus_atomic);
+
+void put_online_cpus_atomic(void)
+{
+	unsigned int cpu = smp_processor_id();
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (cpu_hotplug.active_writer == current)
+		goto out;
+
+	if (unlikely(writer_active(cpu)) && reader_acked(cpu))
+		read_unlock(&hotplug_rwlock);
+
+	unmark_reader_fastpath();
+
+out:
+	local_irq_restore(flags);
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(put_online_cpus_atomic);
+
 #else /* #if CONFIG_HOTPLUG_CPU */
 static void cpu_hotplug_begin(void) {}
 static void cpu_hotplug_done(void) {}
@@ -237,6 +393,66 @@ static inline void check_for_tasks(int cpu)
 	write_unlock_irq(&tasklist_lock);
 }

+/* Set the per-cpu variable writer_signal to 2 */
+static inline void raise_writer_signal(unsigned int cpu)
+{
+	atomic_add(2, &per_cpu(writer_signal, cpu));
+	smp_mb__after_atomic_inc();
+}
+
+/* Reset the per-cpu variable writer_signal to 0 */
+static inline void drop_writer_signal(unsigned int cpu)
+{
+	atomic_set(&per_cpu(writer_signal, cpu), 0);
+	smp_wmb(); /* Paired with smp_rmb() in get_online_cpus_atomic() */
+}
+
+static void announce_cpu_offline_begin(void)
+{
+	unsigned int cpu;
+
+	for_each_online_cpu(cpu)
+		raise_writer_signal(cpu);
+}
+
+static void announce_cpu_offline_end(unsigned int dead_cpu)
+{
+	unsigned int cpu;
+
+	drop_writer_signal(dead_cpu);
+
+	for_each_online_cpu(cpu)
+		drop_writer_signal(cpu);
+}
+
+/*
+ * Check if the reader saw and acked the writer's signal.
+ * If he has, return immediately.
+ *
+ * If he hasn't acked it, there are 2 possibilities:
+ *  a. He is not an active reader; so we can return safely.
+ *  b. He is an active reader, so we need to retry until
+ *     he acks the writer's signal.
+ */
+static void sync_atomic_reader(unsigned int cpu)
+{
+
+retry:
+	if (reader_acked(cpu))
+		return;
+
+	if (reader_active(cpu))
+		goto retry;
+}
+
+static void sync_all_readers(void)
+{
+	unsigned int cpu;
+
+	for_each_online_cpu(cpu)
+		sync_atomic_reader(cpu);
+}
+
 struct take_cpu_down_param {
 	unsigned long mod;
 	void *hcpu;
@@ -246,15 +462,45 @@ struct take_cpu_down_param {
 static int __ref take_cpu_down(void *_param)
 {
 	struct take_cpu_down_param *param = _param;
-	int err;
+	unsigned long flags;
+	unsigned int cpu = (long)(param->hcpu);
+	int err = 0;
+
+
+	/*
+	 * Inform all atomic readers that we are going to offline a CPU
+	 * and that they need to switch from per-cpu counters to the
+	 * global hotplug_rwlock.
+	 */
+	announce_cpu_offline_begin();
+
+	/* Wait for every reader to notice the announcement and ACK it */
+	sync_all_readers();
+
+	/*
+	 * Now all the readers have switched to using the global hotplug_rwlock.
+	 * So now is our chance, go bring down the CPU!
+	 */
+
+	write_lock_irqsave(&hotplug_rwlock, flags);

 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
 	if (err < 0)
-		return err;
+		goto out;

 	cpu_notify(CPU_DYING | param->mod, param->hcpu);
-	return 0;
+
+out:
+	/*
+	 * Inform all atomic readers that we are done with the CPU offline
+	 * operation, so that they can switch back to their per-cpu counters.
+	 * (We don't need to wait for them to see it).
+	 */
+	announce_cpu_offline_end(cpu);
+
+	write_unlock_irqrestore(&hotplug_rwlock, flags);
+	return err;
 }

 /* Requires cpu_add_remove_lock to be held */

^ permalink raw reply related

* [RFC PATCH v3 6/9] kick_process(), cpu-hotplug: Prevent offlining of target CPU properly
From: Srivatsa S. Bhat @ 2012-12-07 17:39 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com>

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline,
while invoking from atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f51e0aa..cff7656 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1091,11 +1091,11 @@ void kick_process(struct task_struct *p)
 {
 	int cpu;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	cpu = task_cpu(p);
 	if ((cpu != smp_processor_id()) && task_curr(p))
 		smp_send_reschedule(cpu);
-	preempt_enable();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL_GPL(kick_process);
 #endif /* CONFIG_SMP */

^ permalink raw reply related

* [RFC PATCH v3 0/9] CPU hotplug: stop_machine()-free CPU hotplug
From: Srivatsa S. Bhat @ 2012-12-07 17:37 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel

Hi,

This patchset removes CPU hotplug's dependence on stop_machine() from the CPU
offline path and provides an alternative (set of APIs) to preempt_disable() to
prevent CPUs from going offline, which can be invoked from atomic context.

This is an RFC patchset with only a few call-sites of preempt_disable()
converted to the new APIs for now, and the main goal is to get feedback on the
design of the new atomic APIs and see if it serves as a viable replacement for
stop_machine()-free CPU hotplug. A brief description of the algorithm is
available in the "Changes in vN" section.

Overview of the patches:
-----------------------

Patch 1 introduces the new APIs that can be used from atomic context, to
prevent CPUs from going offline.

Patch 2 is a cleanup; it converts preprocessor macros to static inline
functions.

Patches 3 to 8 convert various call-sites to use the new APIs.

Patch 9 is the one which actually removes stop_machine() from the CPU
offline path.

Changes in v3:
--------------
* Dropped the _light() and _full() variants of the APIs. Provided a single
  interface: get/put_online_cpus_atomic().

* Completely redesigned the synchronization mechanism again, to make it
  fast and scalable at the reader-side in the fast-path (when no hotplug
  writers are active). This new scheme also ensures that there is no
  possibility of deadlocks due to circular locking dependency.
  In summary, this provides the scalability and speed of per-cpu rwlocks
  (without actually using them), while avoiding the downside (deadlock
  possibilities) which is inherent in any per-cpu locking scheme that is
  meant to compete with preempt_disable()/enable() in terms of flexibility.

  The problem with using per-cpu locking to replace preempt_disable()/enable
  was explained here:
  https://lkml.org/lkml/2012/12/6/290

  Basically we use per-cpu counters (for scalability) when no writers are
  active, and then switch to global rwlocks (for lock-safety) when a writer
  becomes active. It is a slightly complex scheme, but it is based on
  standard principles of distributed algorithms.

Changes in v2:
-------------
* Completely redesigned the synchronization scheme to avoid using any extra
  cpumasks.

* Provided APIs for 2 types of atomic hotplug readers: "light" (for
  light-weight) and "full". We wish to have more "light" readers than
  the "full" ones, to avoid indirectly inducing the "stop_machine effect"
  without even actually using stop_machine().

  And the patches show that it _is_ generally true: 5 patches deal with
  "light" readers, whereas only 1 patch deals with a "full" reader.

  Also, the "light" readers happen to be in very hot paths. So it makes a
  lot of sense to have such a distinction and a corresponding light-weight
  API.

Links to previous versions:
v2: https://lkml.org/lkml/2012/12/5/322
v1: https://lkml.org/lkml/2012/12/4/88

Comments and suggestions welcome!

--
 Paul E. McKenney (1):
      cpu: No more __stop_machine() in _cpu_down()

Srivatsa S. Bhat (8):
      CPU hotplug: Provide APIs to prevent CPU offline from atomic context
      CPU hotplug: Convert preprocessor macros to static inline functions
      smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
      smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
      sched, cpu hotplug: Use stable online cpus in try_to_wake_up() & select_task_rq()
      kick_process(), cpu-hotplug: Prevent offlining of target CPU properly
      yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly
      kvm, vmx: Add atomic synchronization with CPU Hotplug

  arch/x86/kvm/vmx.c  |    8 +-
 include/linux/cpu.h |    8 +-
 kernel/cpu.c        |  254 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/core.c |   22 ++++
 kernel/smp.c        |   63 ++++++++-----
 5 files changed, 321 insertions(+), 34 deletions(-)

Thanks,
Srivatsa S. Bhat
IBM Linux Technology Center

^ permalink raw reply

* [RFC PATCH v3 4/9] smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly
From: Srivatsa S. Bhat @ 2012-12-07 17:39 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com>

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline,
while invoking from atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index ce1a866..0031000 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -688,12 +688,12 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
 	unsigned long flags;
 	int ret = 0;
 
-	preempt_disable();
+	get_online_cpus_atomic();
 	ret = smp_call_function(func, info, wait);
 	local_irq_save(flags);
 	func(info);
 	local_irq_restore(flags);
-	preempt_enable();
+	put_online_cpus_atomic();
 	return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
@@ -715,7 +715,11 @@ EXPORT_SYMBOL(on_each_cpu);
 void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 			void *info, bool wait)
 {
-	int cpu = get_cpu();
+	int cpu;
+
+	get_online_cpus_atomic();
+
+	cpu = smp_processor_id();
 
 	smp_call_function_many(mask, func, info, wait);
 	if (cpumask_test_cpu(cpu, mask)) {
@@ -723,7 +727,7 @@ void on_each_cpu_mask(const struct cpumask *mask, smp_call_func_t func,
 		func(info);
 		local_irq_enable();
 	}
-	put_cpu();
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(on_each_cpu_mask);
 
@@ -748,8 +752,9 @@ EXPORT_SYMBOL(on_each_cpu_mask);
  * The function might sleep if the GFP flags indicates a non
  * atomic allocation is allowed.
  *
- * Preemption is disabled to protect against CPUs going offline but not online.
- * CPUs going online during the call will not be seen or sent an IPI.
+ * We use get/put_online_cpus_atomic() to prevent CPUs from going
+ * offline in-between our operation. CPUs coming online during the
+ * call will not be seen or sent an IPI.
  *
  * You must not call this function with disabled interrupts or
  * from a hardware interrupt handler or from a bottom half handler.
@@ -764,26 +769,26 @@ void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
 	might_sleep_if(gfp_flags & __GFP_WAIT);
 
 	if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) {
-		preempt_disable();
+		get_online_cpus_atomic();
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info))
 				cpumask_set_cpu(cpu, cpus);
 		on_each_cpu_mask(cpus, func, info, wait);
-		preempt_enable();
+		put_online_cpus_atomic();
 		free_cpumask_var(cpus);
 	} else {
 		/*
 		 * No free cpumask, bother. No matter, we'll
 		 * just have to IPI them one by one.
 		 */
-		preempt_disable();
+		get_online_cpus_atomic();
 		for_each_online_cpu(cpu)
 			if (cond_func(cpu, info)) {
 				ret = smp_call_function_single(cpu, func,
 								info, wait);
 				WARN_ON_ONCE(!ret);
 			}
-		preempt_enable();
+		put_online_cpus_atomic();
 	}
 }
 EXPORT_SYMBOL(on_each_cpu_cond);

^ permalink raw reply related

* [RFC PATCH v3 3/9] smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly
From: Srivatsa S. Bhat @ 2012-12-07 17:38 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com>

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline,
while invoking from atomic context.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/smp.c |   38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 29dd40a..ce1a866 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -310,7 +310,8 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 	 * prevent preemption and reschedule on another processor,
 	 * as well as CPU removal
 	 */
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+	this_cpu = smp_processor_id();
 
 	/*
 	 * Can deadlock when called with interrupts disabled.
@@ -342,7 +343,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 		}
 	}
 
-	put_cpu();
+	put_online_cpus_atomic();
 
 	return err;
 }
@@ -371,8 +372,10 @@ int smp_call_function_any(const struct cpumask *mask,
 	const struct cpumask *nodemask;
 	int ret;
 
+	get_online_cpus_atomic();
 	/* Try for same CPU (cheapest) */
-	cpu = get_cpu();
+	cpu = smp_processor_id();
+
 	if (cpumask_test_cpu(cpu, mask))
 		goto call;
 
@@ -388,7 +391,7 @@ int smp_call_function_any(const struct cpumask *mask,
 	cpu = cpumask_any_and(mask, cpu_online_mask);
 call:
 	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
+	put_online_cpus_atomic();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
@@ -409,14 +412,17 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 	unsigned int this_cpu;
 	unsigned long flags;
 
-	this_cpu = get_cpu();
+	get_online_cpus_atomic();
+
+	this_cpu = smp_processor_id();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
 	 * send smp call function interrupt to this cpu and as such deadlocks
 	 * can't happen.
 	 */
-	WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
+	WARN_ON_ONCE(cpu_online(this_cpu) && wait && irqs_disabled()
 		     && !oops_in_progress);
 
 	if (cpu == this_cpu) {
@@ -427,7 +433,7 @@ void __smp_call_function_single(int cpu, struct call_single_data *data,
 		csd_lock(data);
 		generic_exec_single(cpu, data, wait);
 	}
-	put_cpu();
+	put_online_cpus_atomic();
 }
 
 /**
@@ -451,6 +457,8 @@ void smp_call_function_many(const struct cpumask *mask,
 	unsigned long flags;
 	int refs, cpu, next_cpu, this_cpu = smp_processor_id();
 
+	get_online_cpus_atomic();
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -467,17 +475,18 @@ void smp_call_function_many(const struct cpumask *mask,
 
 	/* No online cpus?  We're done. */
 	if (cpu >= nr_cpu_ids)
-		return;
+		goto out_unlock;
 
 	/* Do we have another CPU which isn't us? */
 	next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
 	if (next_cpu == this_cpu)
-		next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
+		next_cpu = cpumask_next_and(next_cpu, mask,
+						cpu_online_mask);
 
 	/* Fastpath: do that cpu by itself. */
 	if (next_cpu >= nr_cpu_ids) {
 		smp_call_function_single(cpu, func, info, wait);
-		return;
+		goto out_unlock;
 	}
 
 	data = &__get_cpu_var(cfd_data);
@@ -523,7 +532,7 @@ void smp_call_function_many(const struct cpumask *mask,
 	/* Some callers race with other cpus changing the passed mask */
 	if (unlikely(!refs)) {
 		csd_unlock(&data->csd);
-		return;
+		goto out_unlock;
 	}
 
 	raw_spin_lock_irqsave(&call_function.lock, flags);
@@ -554,6 +563,9 @@ void smp_call_function_many(const struct cpumask *mask,
 	/* Optionally wait for the CPUs to complete */
 	if (wait)
 		csd_lock_wait(&data->csd);
+
+out_unlock:
+	put_online_cpus_atomic();
 }
 EXPORT_SYMBOL(smp_call_function_many);
 
@@ -574,9 +586,9 @@ EXPORT_SYMBOL(smp_call_function_many);
  */
 int smp_call_function(smp_call_func_t func, void *info, int wait)
 {
-	preempt_disable();
+	get_online_cpus_atomic();
 	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
+	put_online_cpus_atomic();
 
 	return 0;
 }

^ permalink raw reply related

* [RFC PATCH v3 2/9] CPU hotplug: Convert preprocessor macros to static inline functions
From: Srivatsa S. Bhat @ 2012-12-07 17:38 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com>

On 12/05/2012 06:10 AM, Andrew Morton wrote:
"static inline C functions would be preferred if possible.  Feel free to
fix up the wrong crufty surrounding code as well ;-)"

Convert the macros in the CPU hotplug code to static inline C functions.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 include/linux/cpu.h |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index cf24da1..eb79f47 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -198,10 +198,10 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #else		/* CONFIG_HOTPLUG_CPU */
 
-#define get_online_cpus()	do { } while (0)
-#define put_online_cpus()	do { } while (0)
-#define get_online_cpus_atomic()	do { } while (0)
-#define put_online_cpus_atomic()	do { } while (0)
+static inline void get_online_cpus(void) {}
+static inline void put_online_cpus(void) {}
+static inline void get_online_cpus_atomic(void) {}
+static inline void put_online_cpus_atomic(void) {}
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })

^ permalink raw reply related

* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline
From: Srivatsa S. Bhat @ 2012-12-07 17:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, tj, tglx, peterz, paulmck, rusty, mingo, akpm,
	namhyung, vincent.guittot, sbw, amit.kucheria, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <1354831350.17101.31.camel@gandalf.local.home>

On 12/07/2012 03:32 AM, Steven Rostedt wrote:
> On Fri, 2012-12-07 at 01:06 +0530, Srivatsa S. Bhat wrote:
>> The root-cause of this deadlock is again lock-ordering mismatch right?
>> CPU0 takes locks in order A, B
>> CPU1 takes locks in order B, A
>>
>> And the writer facilitates in actually getting deadlocked.
>>
>> I avoid this in this patchset by always taking the locks in the same
>> order. So we won't be deadlocking like this.
> 
> OK, I haven't looked closely at the patch yet. I'm currently hacking on
> my own problems. But just from the description above, it looked like you
> were using rw_locks() to be able to inverse the order of the locks.
> 

Ah, ok, no problem! I'd be grateful if you could take a look when you
are free :-) I'll post a v3 soon, which has a completely redesigned
synchronization scheme, to address the performance concerns related to
global rwlocks that Tejun raised.

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: Race condition between driver_probe_device and device_shutdown
From: Ming Lei @ 2012-12-07 16:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Wedson Almeida Filho, Greg KH, Eric W. Biederman, Andrew Morton,
	linux-kernel, Linux PM List
In-Reply-To: <Pine.LNX.4.44L0.1212071023090.1565-100000@iolanthe.rowland.org>

On Fri, Dec 7, 2012 at 11:25 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Fri, 7 Dec 2012, Ming Lei wrote:
>> device_pm_lock() can prevent device_move() from being running.
>
> That wouldn't prevent problems during unbinding.  Wedson is right; the

Right.

> places that lock dev->parent must save a local copy of dev->parent.

One simple fix is to remove acquiring/releasing dev->parent lock
and get/put the parent reference, because holding the device lock only
may avoid the race between probe/remove and shutdown.

But __driver_attach and __device_release_driver need
save the local copy of dev->parent.

Thanks,
--
Ming Lei

^ permalink raw reply

* Re: Race condition between driver_probe_device and device_shutdown
From: Alan Stern @ 2012-12-07 15:25 UTC (permalink / raw)
  To: Ming Lei
  Cc: Wedson Almeida Filho, Greg KH, Eric W. Biederman, Andrew Morton,
	linux-kernel, Linux PM List
In-Reply-To: <CACVXFVMOxQEJ-nrvGQ-PkevqKkLnXu+QjHgeswkbzFjqOVYCdQ@mail.gmail.com>

On Fri, 7 Dec 2012, Ming Lei wrote:

> > I guess the question is whether the callee is allowed to call
> > device_move(), if not, we're good.
> 
> Not only the callee, and other contexts can change device->parent
> too. Looks rfcomm_tty_open() which calls device_move() is called
> in open() context, so the parent might be changed before unlock(dev->parent)
> in __driver_attach().
> 
> >
> >> Your concern on device_remove() might be correct. Also, I am wondering
> >> if we can walk the 'dpm_list' backwards for device shutdown, which should
> >> be simpler and more reasonable.
> >
> > How would that help?
> 
> device_pm_lock() can prevent device_move() from being running.

That wouldn't prevent problems during unbinding.  Wedson is right; the 
places that lock dev->parent must save a local copy of dev->parent.

Alan Stern


^ permalink raw reply

* Re: [PATCH 1/6 v9] arm: use devicetree to get smp_twd clock
From: Thiago Farina @ 2012-12-07 14:55 UTC (permalink / raw)
  To: Mark Langsdorf
  Cc: linux-kernel, cpufreq, linux-pm, linux-arm-kernel, Rob Herring
In-Reply-To: <1354833773-22845-2-git-send-email-mark.langsdorf@calxeda.com>

On Thu, Dec 6, 2012 at 8:42 PM, Mark Langsdorf
<mark.langsdorf@calxeda.com> wrote:
> From: Rob Herring <rob.herring@calxeda.com>
>
> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
> ---
> Changes from v4, v5, v6, v7, v8
Why you keep sending these emails without real changes? Is this a
rebase process or what? I saw at least two other emails where you do
the same thing.

Thank you!

^ permalink raw reply

* Re: Race condition between driver_probe_device and device_shutdown
From: Ming Lei @ 2012-12-07 13:04 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alan Stern, Greg KH, Eric W. Biederman, Andrew Morton,
	linux-kernel, Linux PM List
In-Reply-To: <CANeycqqO06+u6Qy3zL=G_a=_sTYDOzUYf8wu7B7RGFJfuGNoRg@mail.gmail.com>

On Fri, Dec 7, 2012 at 8:16 PM, Wedson Almeida Filho <wedsonaf@gmail.com> wrote:
>> Because device_del() will put reference count of the parent, and the patch
>> only focuses on race between probe/release and shutdown.
>
> Right. device_del() puts the reference count of the parent -- is it
> guaranteed that device_del() won't ever reassign dev->parent though
> (e.g., to NULL)? I don't think it is, so I think that patch should
> also save the pointer to the parent and use it (instead of what
> happens to be in than dev->parent) to release the lock and put the
> ref.
>
>> As far as device_move() concerned, looks it might be a problem.
>> The problem even exits on driver attach vs. device open/release,
>> if device_move is called in open() and open() happens before driver
>> attach completes.
>
> Yeah, the pattern of locking the parent followed by the device occurs
> in a few places. It looks like they were added by Alan with commit
> bf74ad5bc41727d5f2f1c6bedb2c1fac394de731. (And as Greg mentioned,
> might be occurring often enough to merit being moved into a common
> function.)

In fact, there is no shutdown callback defined for usb driver, so holding
the parent lock in device_shutdown() might be removed.

>
> I guess the question is whether the callee is allowed to call
> device_move(), if not, we're good.

Not only the callee, and other contexts can change device->parent
too. Looks rfcomm_tty_open() which calls device_move() is called
in open() context, so the parent might be changed before unlock(dev->parent)
in __driver_attach().

>
>> Your concern on device_remove() might be correct. Also, I am wondering
>> if we can walk the 'dpm_list' backwards for device shutdown, which should
>> be simpler and more reasonable.
>
> How would that help?

device_pm_lock() can prevent device_move() from being running.

Thanks,
--
Ming Lei

^ permalink raw reply

* Re: Race condition between driver_probe_device and device_shutdown
From: Wedson Almeida Filho @ 2012-12-07 12:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Alan Stern, Greg KH, Eric W. Biederman, Andrew Morton,
	linux-kernel, Linux PM List
In-Reply-To: <CACVXFVPH=yw_Y3AcnTXUqVd5A0yGiw+hju8__-g7dymAe-Q8dA@mail.gmail.com>

> Because device_del() will put reference count of the parent, and the patch
> only focuses on race between probe/release and shutdown.

Right. device_del() puts the reference count of the parent -- is it
guaranteed that device_del() won't ever reassign dev->parent though
(e.g., to NULL)? I don't think it is, so I think that patch should
also save the pointer to the parent and use it (instead of what
happens to be in than dev->parent) to release the lock and put the
ref.

> As far as device_move() concerned, looks it might be a problem.
> The problem even exits on driver attach vs. device open/release,
> if device_move is called in open() and open() happens before driver
> attach completes.

Yeah, the pattern of locking the parent followed by the device occurs
in a few places. It looks like they were added by Alan with commit
bf74ad5bc41727d5f2f1c6bedb2c1fac394de731. (And as Greg mentioned,
might be occurring often enough to merit being moved into a common
function.)

I guess the question is whether the callee is allowed to call
device_move(), if not, we're good.

> Your concern on device_remove() might be correct. Also, I am wondering
> if we can walk the 'dpm_list' backwards for device shutdown, which should
> be simpler and more reasonable.

How would that help?

^ permalink raw reply

* [PATCH] drivers/thermal/spear_thermal.c: use devm_clk_get
From: Julia Lawall @ 2012-12-07 10:29 UTC (permalink / raw)
  To: Zhang Rui; +Cc: kernel-janitors, linux-pm, linux-kernel

From: Julia Lawall <Julia.Lawall@lip6.fr>

devm_clk_get allocates a resource that is released when a driver detaches.
This patch uses devm_clk_get for data that is allocated in the probe
function of a platform device and is only released in the remove function.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

---
I was not able to compile this code.  At one point, devm_clk_get was not
supported for all architectures.  If that is still the case, and the code
doesn't compile, then just ignore the patch.

 drivers/thermal/spear_thermal.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/thermal/spear_thermal.c b/drivers/thermal/spear_thermal.c
index 6b2d8b2..3c5ee56 100644
--- a/drivers/thermal/spear_thermal.c
+++ b/drivers/thermal/spear_thermal.c
@@ -131,7 +131,7 @@ static int spear_thermal_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}
 
-	stdev->clk = clk_get(&pdev->dev, NULL);
+	stdev->clk = devm_clk_get(&pdev->dev, NULL);
 	if (IS_ERR(stdev->clk)) {
 		dev_err(&pdev->dev, "Can't get clock\n");
 		return PTR_ERR(stdev->clk);
@@ -140,7 +140,7 @@ static int spear_thermal_probe(struct platform_device *pdev)
 	ret = clk_enable(stdev->clk);
 	if (ret) {
 		dev_err(&pdev->dev, "Can't enable clock\n");
-		goto put_clk;
+		return ret;
 	}
 
 	stdev->flags = val;
@@ -163,8 +163,6 @@ static int spear_thermal_probe(struct platform_device *pdev)
 
 disable_clk:
 	clk_disable(stdev->clk);
-put_clk:
-	clk_put(stdev->clk);
 
 	return ret;
 }
@@ -183,7 +181,6 @@ static int spear_thermal_exit(struct platform_device *pdev)
 	writel_relaxed(actual_mask & ~stdev->flags, stdev->thermal_base);
 
 	clk_disable(stdev->clk);
-	clk_put(stdev->clk);
 
 	return 0;
 }

^ permalink raw reply related

* Re: [PATCH 6/6 v9] cpufreq, highbank: add support for highbank cpufreq
From: Mike Turquette @ 2012-12-07  7:04 UTC (permalink / raw)
  To: Mark Langsdorf
  Cc: linux-kernel, cpufreq, linux-pm@vger.kernel.org, linux-arm-kernel
In-Reply-To: <1354833773-22845-7-git-send-email-mark.langsdorf@calxeda.com>

On Thu, Dec 6, 2012 at 2:42 PM, Mark Langsdorf
<mark.langsdorf@calxeda.com> wrote:
> Highbank processors depend on the external ECME to perform voltage
> management based on a requested frequency. Communication between the
> A9 cores and the ECME happens over the pl320 IPC channel.
>
> Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
> Reviewed-by: Shawn Guo <shawn.guo@linaro.org>
> Cc: mturquette@linaro.org

Looks good to me.

Reviewed-by: Mike Turquette <mturquette@linaro.org>

Regards,
Mike

^ permalink raw reply

* Re: [PATCH v9 06/10] ata: zpodd: check zero power ready status
From: Aaron Lu @ 2012-12-07  6:13 UTC (permalink / raw)
  To: James Bottomley, Tejun Heo
  Cc: Rafael J. Wysocki, linux-pm, Jeff Garzik, Alan Stern, Jeff Wu,
	Aaron Lu, linux-ide, linux-scsi, linux-acpi
In-Reply-To: <1354623100.3206.34.camel@dabdike.int.hansenpartnership.com>

On 12/04/2012 08:11 PM, James Bottomley wrote:
> On Mon, 2012-12-03 at 08:23 -0800, Tejun Heo wrote:
>> Hello, James.
>>
>> On Mon, Dec 03, 2012 at 08:25:43AM +0000, James Bottomley wrote:
>>>> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
>>>> index e65c62e..1756151 100644
>>>> --- a/include/scsi/scsi_device.h
>>>> +++ b/include/scsi/scsi_device.h
>>>> @@ -160,6 +160,7 @@ struct scsi_device {
>>>>  	unsigned can_power_off:1; /* Device supports runtime power off */
>>>>  	unsigned wce_default_on:1;	/* Cache is ON by default */
>>>>  	unsigned no_dif:1;	/* T10 PI (DIF) should be disabled */
>>>> +	unsigned event_driven:1; /* No need to poll the device */
>>>>  
>>>>  	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
>>>>  	struct list_head event_list;	/* asserted events */
>>>
>>> Yes, but if we can get away with doing that, it should be in genhd
>>> because it's completely generic.
>>>
>>> I was imagining we'd have to fake the reply to the test unit ready or
>>> some other commands, which is why it would need to be in sr.c
>>>
>>> The check events code is Tejun's baby, if he's OK with it then just do
>>> it in genhd.c
>>
>> The problem here is there's no easy to reach genhd from libata (or the
>> other way around) without going through sr.  I think we're gonna have
>> to have something in sr one way or the other.
> 
> Can't we do that via an event? It's a bit clunky because we need the
> callback in the layer that sees the sdev, which is libata-scsi, we just
> need an analogue of ata_scsi_media_change_notify, but ignoring and
> allowing polling is essentially event driven as well, so it should all
> work.  We'll need a listener in genhd, which might be trickier.

Hi Tejun,

Do you have any comments on James' suggestion? I want to know your
opinion, thanks.

Hi James,

Do you mean we start a thread in genhd that listen to events, so that
some driver can send an event to genhd about if the device should be
polled? I'm thinking where to store such information. If store it in
struct disk_events, then we will need to know which gendisk is for
the device, but how? Maybe by loop scan all gendisk's driverfs_dev?
If store it in struct device, then we do not need to send the event but
just set a flag in sturct device in libata, and let genhd check this
flag when poll. So I don't quite understand this, can you please
elaborate? Thanks.

Thanks,
Aaron

> 
> This may also work as the more generic route for stuff where we can't
> connect the bottom to the top of the stack (which looks like a problem
> we'll begin wrestling with a lot now).
> 
> James
> 
> 


^ permalink raw reply

* Re: Race condition between driver_probe_device and device_shutdown
From: Ming Lei @ 2012-12-07  3:42 UTC (permalink / raw)
  To: Wedson Almeida Filho
  Cc: Alan Stern, Greg KH, Eric W. Biederman, Andrew Morton,
	linux-kernel, Linux PM List
In-Reply-To: <CANeycqq177AzPdYfBr-VU+ksh+Jwd9i4aLpZo6VwxcO+Xx6ETg@mail.gmail.com>

On Fri, Dec 7, 2012 at 8:09 AM, Wedson Almeida Filho <wedsonaf@gmail.com> wrote:
> I don't have the setup anymore, I'll give it a shot if I ever get back it back.
>
> I have a question though: why do we need to get a reference to the
> parent? If the assumption is that the callee can release its reference

Because device_del() will put reference count of the parent, and the patch
only focuses on race between probe/release and shutdown.

> to the parent, then one may also expect callees to reassign
> dev->parent (e.g., by calling device_move), so it wouldn't be safe to
> later call device_unlock on the potentially different dev->parent.

As far as device_move() concerned, looks it might be a problem.
The problem even exits on driver attach vs. device open/release,
if device_move is called in open() and open() happens before driver
attach completes.

> If we expect dev->parent to remain unchanged, then there's no need to
> get an extra reference to the parent as the device itself already
> holds one (and we hold an extra one on the device).

As I said above, device_del() will put reference count of parent, but I forget
why the parent's lock has to be held in this patch.

> What am I missing?

Your concern on device_remove() might be correct. Also, I am wondering
if we can walk the 'dpm_list' backwards for device shutdown, which should
be simpler and more reasonable.

Thanks,
--
Ming Lei

^ permalink raw reply

* Re: Race condition between driver_probe_device and device_shutdown
From: Wedson Almeida Filho @ 2012-12-07  0:09 UTC (permalink / raw)
  To: Ming Lei
  Cc: Alan Stern, Greg KH, Eric W. Biederman, Andrew Morton,
	linux-kernel, Linux PM List
In-Reply-To: <CACVXFVPbpbb-PAJAweoLj8uec=KgoUKvybSey+R=L=oY16Ozbw@mail.gmail.com>

I don't have the setup anymore, I'll give it a shot if I ever get back it back.

I have a question though: why do we need to get a reference to the
parent? If the assumption is that the callee can release its reference
to the parent, then one may also expect callees to reassign
dev->parent (e.g., by calling device_move), so it wouldn't be safe to
later call device_unlock on the potentially different dev->parent.

If we expect dev->parent to remain unchanged, then there's no need to
get an extra reference to the parent as the device itself already
holds one (and we hold an extra one on the device).

What am I missing?

Thanks

On Thu, Dec 6, 2012 at 2:52 AM, Ming Lei <ming.lei@canonical.com> wrote:
> On Thu, Dec 6, 2012 at 5:11 PM, Wedson Almeida Filho <wedsonaf@gmail.com> wrote:
>> [Sorry for taking so long to respond, after a week of silence I
>> assumed I wouldn't get any responses, plus I had moved on to other
>> things.]
>>
>> I happen to still have the logs, the relevant part is pasted at the end.
>>
>> Answering some of the questions: the driver is on the platform bus, in
>> fact, it's drivers/usb/host/ehci-tegra.c; after seeing the oops below,
>> I added printks when entering and exiting tegra_ehci_probe to try to
>> understand it better.
>>
>> Note that in the log we see some thread entering tegra_ehci_probe, but
>> nothing indicates that it has exited before we get the oops.
>
> The commit d1c6c030fcec6f860d9bb6c632a3ebe62e28440b(driver core:
> fix shutdown races with probe/remove(v3)) has been merged to v3.6, so
> device_shutdown will wait for completing of probe.
>
> Could you test kernel 3.6 or the latest kernel to see if the problem can be
> reproduced?
>
> Thanks,
> --
> Ming Lei

^ permalink raw reply

* [PATCH] PM: Move disabling/enabling runtime PM to late suspend/early resume
From: Rafael J. Wysocki @ 2012-12-06 22:59 UTC (permalink / raw)
  To: Linux PM list; +Cc: LKML, Jan-Matthias Braun, Jiri Kosina, Alan Stern

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Currently, the PM core disables runtime PM for all devices right
after executing subsystem/driver .suspend() callbacks for them
and re-enables it right before executing subsystem/driver .resume()
callbacks for them.  This may lead to problems when there are
two devices such that the .suspend() callback executed for one of
them depends on runtime PM working for the other.  In that case,
if runtime PM has already been disabled for the second device,
the first one's .suspend() won't work correctly (and analogously
for resume).

To make those issues go away, make the PM core disable runtime PM
for devices right before executing subsystem/driver .suspend_late()
callbacks for them and enable runtime PM for them right after
executing subsystem/driver .resume_early() callbacks for them.  This
way the potential conflitcs between .suspend_late()/.resume_early()
and their runtime PM counterparts are still prevented from happening,
but the subtle ordering issues related to disabling/enabling runtime
PM for devices during system suspend/resume are much easier to avoid.

Reported-and-tested-by: Jan-Matthias Braun <jan_braun@gmx.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/power/runtime_pm.txt |    9 +++++----
 drivers/base/power/main.c          |    9 ++++-----
 2 files changed, 9 insertions(+), 9 deletions(-)

Index: linux/drivers/base/power/main.c
===================================================================
--- linux.orig/drivers/base/power/main.c
+++ linux/drivers/base/power/main.c
@@ -513,6 +513,8 @@ static int device_resume_early(struct de

  Out:
 	TRACE_RESUME(error);
+
+	pm_runtime_enable(dev);
 	return error;
 }

@@ -589,8 +591,6 @@ static int device_resume(struct device *
 	if (!dev->power.is_suspended)
 		goto Unlock;

-	pm_runtime_enable(dev);
-
 	if (dev->pm_domain) {
 		info = "power domain ";
 		callback = pm_op(&dev->pm_domain->ops, state);
@@ -930,6 +930,8 @@ static int device_suspend_late(struct de
 	pm_callback_t callback = NULL;
 	char *info = NULL;

+	__pm_runtime_disable(dev, false);
+
 	if (dev->power.syscore)
 		return 0;

@@ -1133,11 +1135,8 @@ static int __device_suspend(struct devic

  Complete:
 	complete_all(&dev->power.completion);
-
 	if (error)
 		async_error = error;
-	else if (dev->power.is_suspended)
-		__pm_runtime_disable(dev, false);

 	return error;
 }
Index: linux/Documentation/power/runtime_pm.txt
===================================================================
--- linux.orig/Documentation/power/runtime_pm.txt
+++ linux/Documentation/power/runtime_pm.txt
@@ -642,12 +642,13 @@ out the following operations:
   * During system suspend it calls pm_runtime_get_noresume() and
     pm_runtime_barrier() for every device right before executing the
     subsystem-level .suspend() callback for it.  In addition to that it calls
-    pm_runtime_disable() for every device right after executing the
-    subsystem-level .suspend() callback for it.
+    __pm_runtime_disable() with 'false' as the second argument for every device
+    right before executing the subsystem-level .suspend_late() callback for it.

   * During system resume it calls pm_runtime_enable() and pm_runtime_put_sync()
-    for every device right before and right after executing the subsystem-level
-    .resume() callback for it, respectively.
+    for every device right after executing the subsystem-level .resume_early()
+    callback and right after executing the subsystem-level .resume() callback
+    for it, respectively.

 7. Generic subsystem callbacks

^ permalink raw reply

* [PATCH 6/6 v9] cpufreq, highbank: add support for highbank cpufreq
From: Mark Langsdorf @ 2012-12-06 22:42 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel
  Cc: Mark Langsdorf, mturquette
In-Reply-To: <1354833773-22845-1-git-send-email-mark.langsdorf@calxeda.com>

Highbank processors depend on the external ECME to perform voltage
management based on a requested frequency. Communication between the
A9 cores and the ECME happens over the pl320 IPC channel.

Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Reviewed-by: Shawn Guo <shawn.guo@linaro.org>
Cc: mturquette@linaro.org
---
Changes from v8
	Added Shawn Guo's reviewed by.
	Removed some magic numbers.
	Changed failure returns in clk_notify from NOTIFY_STOP to NOTIFY_BAD.
Changes from v7
        Removed old attribution to cpufreq-cpu0.
        Added some description in the documentation.
        Made cpu_dev, cpu_clk into local variables.
        Removed __devinit.
        Removed some unneeded includes.
        Added a brace to clarify some nested if logic.
Changes from v6
        Removed devicetree bindings documentation.
        Restructured driver to use clk notifications.
        Core driver logic is now cpufreq-clk0.
Changes from v5
        Changed ipc_transmit() to pl320_ipc_transmit().
Changes from v4
        Removed erroneous changes to arch/arm/Kconfig.
        Removed unnecessary changes to drivers/cpufreq/Kconfig.arm
        Alphabetized additions to arch/arm/mach-highbank/Kconfig
        Changed ipc call and header to match new ipc location in 
        drivers/mailbox.
Changes from v3
        None.
Changes from v2
        Changed transition latency binding in code to match documentation.
Changes from v1
        Added highbank specific Kconfig changes.

 arch/arm/boot/dts/highbank.dts     |  10 ++++
 arch/arm/mach-highbank/Kconfig     |   2 +
 drivers/cpufreq/Kconfig.arm        |  16 ++++++
 drivers/cpufreq/Makefile           |   1 +
 drivers/cpufreq/highbank-cpufreq.c | 109 +++++++++++++++++++++++++++++++++++++
 5 files changed, 138 insertions(+)
 create mode 100644 drivers/cpufreq/highbank-cpufreq.c

diff --git a/arch/arm/boot/dts/highbank.dts b/arch/arm/boot/dts/highbank.dts
index 0c6fc34..7c4c27d 100644
--- a/arch/arm/boot/dts/highbank.dts
+++ b/arch/arm/boot/dts/highbank.dts
@@ -36,6 +36,16 @@
 			next-level-cache = <&L2>;
 			clocks = <&a9pll>;
 			clock-names = "cpu";
+			operating-points = <
+				/* kHz    ignored */
+				 1300000  1000000
+				 1200000  1000000
+				 1100000  1000000
+				  800000  1000000
+				  400000  1000000
+				  200000  1000000
+			>;
+			clock-latency = <100000>;
 		};
 
 		cpu@1 {
diff --git a/arch/arm/mach-highbank/Kconfig b/arch/arm/mach-highbank/Kconfig
index 2896881..b7862da 100644
--- a/arch/arm/mach-highbank/Kconfig
+++ b/arch/arm/mach-highbank/Kconfig
@@ -1,5 +1,7 @@
 config ARCH_HIGHBANK
 	bool "Calxeda ECX-1000 (Highbank)" if ARCH_MULTI_V7
+	select ARCH_HAS_CPUFREQ
+	select ARCH_HAS_OPP
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select ARM_AMBA
 	select ARM_GIC
diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index 5961e64..7aaac9f 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -76,3 +76,19 @@ config ARM_EXYNOS5250_CPUFREQ
 	help
 	  This adds the CPUFreq driver for Samsung EXYNOS5250
 	  SoC.
+
+config ARM_HIGHBANK_CPUFREQ
+	tristate "Calxeda Highbank-based"
+	depends on ARCH_HIGHBANK
+	select CPU_FREQ_TABLE
+	select GENERIC_CPUFREQ_CPU0
+	select PM_OPP
+	select REGULATOR
+
+	default m
+	help
+	  This adds the CPUFreq driver for Calxeda Highbank SoC
+	  based boards.
+
+	  If in doubt, say N.
+
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 1bc90e1..9e8f12a 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_ARM_EXYNOS4210_CPUFREQ)	+= exynos4210-cpufreq.o
 obj-$(CONFIG_ARM_EXYNOS4X12_CPUFREQ)	+= exynos4x12-cpufreq.o
 obj-$(CONFIG_ARM_EXYNOS5250_CPUFREQ)	+= exynos5250-cpufreq.o
 obj-$(CONFIG_ARM_OMAP2PLUS_CPUFREQ)     += omap-cpufreq.o
+obj-$(CONFIG_ARM_HIGHBANK_CPUFREQ)	+= highbank-cpufreq.o
 
 ##################################################################################
 # PowerPC platform drivers
diff --git a/drivers/cpufreq/highbank-cpufreq.c b/drivers/cpufreq/highbank-cpufreq.c
new file mode 100644
index 0000000..863bb39
--- /dev/null
+++ b/drivers/cpufreq/highbank-cpufreq.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2012 Calxeda, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This driver provides the clk notifier callbacks that are used when
+ * the cpufreq-cpu0 driver changes to frequency to alert the highbank
+ * EnergyCore Management Engine (ECME) about the need to change
+ * voltage. The ECME interfaces with the actual voltage regulators.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/clk.h>
+#include <linux/cpu.h>
+#include <linux/err.h>
+#include <linux/of.h>
+#include <linux/mailbox.h>
+
+#define HB_CPUFREQ_CHANGE_NOTE	0x80000001
+#define HB_CPUFREQ_IPC_LEN	7
+#define HB_CPUFREQ_VOLT_RETRIES	15
+
+static int hb_voltage_change(unsigned int freq)
+{
+	int i;
+	u32 msg[HB_CPUFREQ_IPC_LEN];
+
+	msg[0] = HB_CPUFREQ_CHANGE_NOTE;
+	msg[1] = freq / 1000000;
+	for (i = 2; i < HB_CPUFREQ_IPC_LEN; i++)
+		msg[i] = 0;
+
+	return pl320_ipc_transmit(msg);
+}
+
+static int hb_cpufreq_clk_notify(struct notifier_block *nb, 
+				unsigned long action, void *hclk)
+{
+	struct clk_notifier_data *clk_data = hclk;
+	int i = 0;
+
+	if (action == PRE_RATE_CHANGE) {
+		if (clk_data->new_rate > clk_data->old_rate)
+			while (hb_voltage_change(clk_data->new_rate))
+				if (i++ > HB_CPUFREQ_VOLT_RETRIES)
+					return NOTIFY_BAD;
+	} else if (action == POST_RATE_CHANGE) {
+		if (clk_data->new_rate < clk_data->old_rate)
+			while (hb_voltage_change(clk_data->new_rate))
+				if (i++ > HB_CPUFREQ_VOLT_RETRIES)
+					return NOTIFY_BAD;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block hb_cpufreq_clk_nb = {
+	.notifier_call = hb_cpufreq_clk_notify,
+};
+
+static int hb_cpufreq_driver_init(void)
+{
+	struct device *cpu_dev;
+	struct clk *cpu_clk;
+	struct device_node *np;
+	int ret;
+
+	np = of_find_node_by_path("/cpus/cpu@0");
+	if (!np) {
+		pr_err("failed to find highbank cpufreq node\n");
+		return -ENOENT;
+	}
+
+	cpu_dev = get_cpu_device(0);
+	if (!cpu_dev) {
+		pr_err("failed to get highbank cpufreq device\n");
+		ret = -ENODEV;
+		goto out_put_node;
+	}
+
+	cpu_dev->of_node = np;
+
+	cpu_clk = clk_get(cpu_dev, NULL);
+	if (IS_ERR(cpu_clk)) {
+		ret = PTR_ERR(cpu_clk);
+		pr_err("failed to get cpu0 clock: %d\n", ret);
+		goto out_put_node;
+	}
+
+	ret = clk_notifier_register(cpu_clk, &hb_cpufreq_clk_nb);
+	if (ret) {
+		pr_err("failed to register clk notifier: %d\n", ret);
+		goto out_put_node;
+	}
+
+out_put_node:
+	of_node_put(np);
+	return ret;
+}
+late_initcall(hb_cpufreq_driver_init);
+
+MODULE_AUTHOR("Mark Langsdorf <mark.langsdorf@calxeda.com>");
+MODULE_DESCRIPTION("Calxeda Highbank cpufreq driver");
+MODULE_LICENSE("GPL");
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 5/6 v9] power: export opp cpufreq functions
From: Mark Langsdorf @ 2012-12-06 22:42 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel; +Cc: Mark Langsdorf
In-Reply-To: <1354833773-22845-1-git-send-email-mark.langsdorf@calxeda.com>

These functions are needed to make the cpufreq-core0 and highbank-cpufreq
drivers loadable as modules.

Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Acked-by: Nishanth Menon <nm@ti.com>
---
Changes from v4, v5, v6, v7, v8
        None.
Changes from v3
        includes linux/export.h instead of module.h.
Changes from v2
        None.
Changes from v1
        Added Nishanth Menon's ack.
        Clarified the purpose of the change in the commit message.

 drivers/base/power/opp.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index d946864..4062ec3 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -23,6 +23,7 @@
 #include <linux/rcupdate.h>
 #include <linux/opp.h>
 #include <linux/of.h>
+#include <linux/export.h>
 
 /*
  * Internal data structure organization with the OPP layer library is as
@@ -643,6 +644,7 @@ int opp_init_cpufreq_table(struct device *dev,
 
 	return 0;
 }
+EXPORT_SYMBOL(opp_init_cpufreq_table);
 
 /**
  * opp_free_cpufreq_table() - free the cpufreq table
@@ -660,6 +662,7 @@ void opp_free_cpufreq_table(struct device *dev,
 	kfree(*table);
 	*table = NULL;
 }
+EXPORT_SYMBOL(opp_free_cpufreq_table);
 #endif		/* CONFIG_CPU_FREQ */
 
 /**
@@ -720,4 +723,5 @@ int of_init_opp_table(struct device *dev)
 
 	return 0;
 }
+EXPORT_SYMBOL(of_init_opp_table);
 #endif
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 2/6 v9] clk, highbank: Prevent glitches in non-bypass reset mode
From: Mark Langsdorf @ 2012-12-06 22:42 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel
  Cc: Mark Langsdorf, Rob Herring
In-Reply-To: <1354833773-22845-1-git-send-email-mark.langsdorf@calxeda.com>

The highbank clock will glitch with the current code if the
clock rate is reset without relocking the PLL. Program the PLL
correctly to prevent glitches.

Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
---
Changes from v6, v7, v8
        None.
Changes from v5
        Added Mike Turquette's ack.
Changes from v4
        None.
Changes from v3
        Changelog text and patch name now correspond to the actual patch.
        was clk, highbank: remove non-bypass reset mode.
Changes from v2
        None.
Changes from v1
        Removed erroneous reformating.

 drivers/clk/clk-highbank.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/clk/clk-highbank.c b/drivers/clk/clk-highbank.c
index 52fecad..3a0b723 100644
--- a/drivers/clk/clk-highbank.c
+++ b/drivers/clk/clk-highbank.c
@@ -182,8 +182,10 @@ static int clk_pll_set_rate(struct clk_hw *hwclk, unsigned long rate,
 		reg |= HB_PLL_EXT_ENA;
 		reg &= ~HB_PLL_EXT_BYPASS;
 	} else {
+		writel(reg | HB_PLL_EXT_BYPASS, hbclk->reg);
 		reg &= ~HB_PLL_DIVQ_MASK;
 		reg |= divq << HB_PLL_DIVQ_SHIFT;
+		writel(reg | HB_PLL_EXT_BYPASS, hbclk->reg);
 	}
 	writel(reg, hbclk->reg);
 
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 1/6 v9] arm: use devicetree to get smp_twd clock
From: Mark Langsdorf @ 2012-12-06 22:42 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel
  Cc: Mark Langsdorf, Rob Herring
In-Reply-To: <1354833773-22845-1-git-send-email-mark.langsdorf@calxeda.com>

From: Rob Herring <rob.herring@calxeda.com>

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
---
Changes from v4, v5, v6, v7, v8
        None.
Changes from v3
        No longer setting *clk to NULL in twd_get_clock().
Changes from v2
        Turned the check for the node pointer into an if-then-else statement.
        Removed the second, redundant clk_get_rate.
Changes from v1
        None.

 arch/arm/kernel/smp_twd.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index b22d700..af46b80 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -237,12 +237,15 @@ static irqreturn_t twd_handler(int irq, void *dev_id)
 	return IRQ_NONE;
 }
 
-static struct clk *twd_get_clock(void)
+static struct clk *twd_get_clock(struct device_node *np)
 {
 	struct clk *clk;
 	int err;
 
-	clk = clk_get_sys("smp_twd", NULL);
+	if (np)
+		clk = of_clk_get(np, 0);
+	else
+		clk = clk_get_sys("smp_twd", NULL);
 	if (IS_ERR(clk)) {
 		pr_err("smp_twd: clock not found: %d\n", (int)PTR_ERR(clk));
 		return clk;
@@ -263,6 +266,7 @@ static struct clk *twd_get_clock(void)
 		return ERR_PTR(err);
 	}
 
+	twd_timer_rate = clk_get_rate(clk);
 	return clk;
 }
 
@@ -273,12 +277,7 @@ static int __cpuinit twd_timer_setup(struct clock_event_device *clk)
 {
 	struct clock_event_device **this_cpu_clk;
 
-	if (!twd_clk)
-		twd_clk = twd_get_clock();
-
-	if (!IS_ERR_OR_NULL(twd_clk))
-		twd_timer_rate = clk_get_rate(twd_clk);
-	else
+	if (IS_ERR_OR_NULL(twd_clk))
 		twd_calibrate_rate();
 
 	__raw_writel(0, twd_base + TWD_TIMER_CONTROL);
@@ -349,6 +348,8 @@ int __init twd_local_timer_register(struct twd_local_timer *tlt)
 	if (!twd_base)
 		return -ENOMEM;
 
+	twd_clk = twd_get_clock(NULL);
+
 	return twd_local_timer_common_register();
 }
 
@@ -383,6 +384,8 @@ void __init twd_local_timer_of_register(void)
 		goto out;
 	}
 
+	twd_clk = twd_get_clock(np);
+
 	err = twd_local_timer_common_register();
 
 out:
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH 4/6 v9] arm highbank: add support for pl320 IPC
From: Mark Langsdorf @ 2012-12-06 22:42 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel
  Cc: Mark Langsdorf, Rob Herring, Omar Ramirez Luna, Arnd Bergmann
In-Reply-To: <1354833773-22845-1-git-send-email-mark.langsdorf@calxeda.com>

From: Rob Herring <rob.herring@calxeda.com>

The pl320 IPC allows for interprocessor communication between the highbank A9
and the EnergyCore Management Engine. The pl320 implements a straightforward
mailbox protocol.

This patch depends on Omar Ramirez Luna's <omar.luna@linaro.org>
mailbox driver patch series.

Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Omar Ramirez Luna <omar.luna@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
---
Changes from v6, v7, v8
        None.
Changes from v5
        Renamed ipc_transmit() to pl320_ipc_transmit().
        Properly exported pl320_ipc_{un}register_notifier().
Changes from v4
        Moved pl320-ipc.c from arch/arm/mach-highbank to drivers/mailbox.
        Moved header information to include/linux/mailbox.h.
        Added Kconfig options to reflect the new code location.
        Change drivers/mailbox/Makefile to build the omap mailboxes only 
        when they are configured.
        Removed ipc_call_fast and renamed ipc_call_slow ipc_transmit.
Changes from v3, v2
        None.
Changes from v1
        Removed erroneous changes for cpufreq Kconfig.

 arch/arm/mach-highbank/Kconfig |   2 +
 drivers/mailbox/Kconfig        |   9 ++
 drivers/mailbox/Makefile       |   4 +
 drivers/mailbox/pl320-ipc.c    | 199 +++++++++++++++++++++++++++++++++++++++++
 include/linux/mailbox.h        |  19 +++-
 5 files changed, 232 insertions(+), 1 deletion(-)
 create mode 100644 drivers/mailbox/Makefile
 create mode 100644 drivers/mailbox/pl320-ipc.c

diff --git a/arch/arm/mach-highbank/Kconfig b/arch/arm/mach-highbank/Kconfig
index 0e1d0a4..2896881 100644
--- a/arch/arm/mach-highbank/Kconfig
+++ b/arch/arm/mach-highbank/Kconfig
@@ -11,5 +11,7 @@ config ARCH_HIGHBANK
 	select GENERIC_CLOCKEVENTS
 	select HAVE_ARM_SCU
 	select HAVE_SMP
+	select MAILBOX
+	select PL320_MBOX
 	select SPARSE_IRQ
 	select USE_OF
diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index be8cac0..e89fdb4 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -34,4 +34,13 @@ config OMAP_MBOX_KFIFO_SIZE
 	  This can also be changed at runtime (via the mbox_kfifo_size
 	  module parameter).
 
+config PL320_MBOX
+	bool "ARM PL320 Mailbox"
+	help
+	  An implementation of the ARM PL320 Interprocessor Communication
+	  Mailbox (IPCM), tailored for the Calxeda Highbank. It is used to
+	  send short messages between Highbank's A9 cores and the EnergyCore
+	  Management Engine, primarily for cpufreq. Say Y here if you want
+	  to use the PL320 IPCM support.
+
 endif
diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
new file mode 100644
index 0000000..c9f14c3
--- /dev/null
+++ b/drivers/mailbox/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_OMAP1_MBOX)	+= mailbox.o mailbox-omap1.o
+obj-$(CONFIG_OMAP2PLUS_MBOX)	+= mailbox.o mailbox-omap2.o
+obj-$(CONFIG_PL320_MBOX)	+= pl320-ipc.o
+
diff --git a/drivers/mailbox/pl320-ipc.c b/drivers/mailbox/pl320-ipc.c
new file mode 100644
index 0000000..1a9d8e4
--- /dev/null
+++ b/drivers/mailbox/pl320-ipc.c
@@ -0,0 +1,199 @@
+/*
+ * Copyright 2012 Calxeda, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/types.h>
+#include <linux/err.h>
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/io.h>
+#include <linux/interrupt.h>
+#include <linux/completion.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/spinlock.h>
+#include <linux/device.h>
+#include <linux/amba/bus.h>
+
+#include <linux/mailbox.h>
+
+#define IPCMxSOURCE(m)		((m) * 0x40)
+#define IPCMxDSET(m)		(((m) * 0x40) + 0x004)
+#define IPCMxDCLEAR(m)		(((m) * 0x40) + 0x008)
+#define IPCMxDSTATUS(m)		(((m) * 0x40) + 0x00C)
+#define IPCMxMODE(m)		(((m) * 0x40) + 0x010)
+#define IPCMxMSET(m)		(((m) * 0x40) + 0x014)
+#define IPCMxMCLEAR(m)		(((m) * 0x40) + 0x018)
+#define IPCMxMSTATUS(m)		(((m) * 0x40) + 0x01C)
+#define IPCMxSEND(m)		(((m) * 0x40) + 0x020)
+#define IPCMxDR(m, dr)		(((m) * 0x40) + ((dr) * 4) + 0x024)
+
+#define IPCMMIS(irq)		(((irq) * 8) + 0x800)
+#define IPCMRIS(irq)		(((irq) * 8) + 0x804)
+
+#define MBOX_MASK(n)		(1 << (n))
+#define IPC_TX_MBOX		1
+#define IPC_RX_MBOX		2
+
+#define CHAN_MASK(n)		(1 << (n))
+#define A9_SOURCE		1
+#define M3_SOURCE		0
+
+static void __iomem *ipc_base;
+static int ipc_irq;
+static DEFINE_MUTEX(ipc_m1_lock);
+static DECLARE_COMPLETION(ipc_completion);
+static ATOMIC_NOTIFIER_HEAD(ipc_notifier);
+
+static inline void set_destination(int source, int mbox)
+{
+	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxDSET(mbox));
+	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxMSET(mbox));
+}
+
+static inline void clear_destination(int source, int mbox)
+{
+	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxDCLEAR(mbox));
+	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxMCLEAR(mbox));
+}
+
+static void __ipc_send(int mbox, u32 *data)
+{
+	int i;
+	for (i = 0; i < 7; i++)
+		__raw_writel(data[i], ipc_base + IPCMxDR(mbox, i));
+	__raw_writel(0x1, ipc_base + IPCMxSEND(mbox));
+}
+
+static u32 __ipc_rcv(int mbox, u32 *data)
+{
+	int i;
+	for (i = 0; i < 7; i++)
+		data[i] = __raw_readl(ipc_base + IPCMxDR(mbox, i));
+	return data[1];
+}
+
+/* blocking implmentation from the A9 side, not usuable in interrupts! */
+int pl320_ipc_transmit(u32 *data)
+{
+	int ret;
+
+	mutex_lock(&ipc_m1_lock);
+
+	init_completion(&ipc_completion);
+	__ipc_send(IPC_TX_MBOX, data);
+	ret = wait_for_completion_timeout(&ipc_completion,
+					  msecs_to_jiffies(1000));
+	if (ret == 0) {
+		ret = -ETIMEDOUT;
+		goto out;
+	}
+
+	ret = __ipc_rcv(IPC_TX_MBOX, data);
+out:
+	mutex_unlock(&ipc_m1_lock);
+	return ret;
+}
+EXPORT_SYMBOL(pl320_ipc_transmit);
+
+irqreturn_t ipc_handler(int irq, void *dev)
+{
+	u32 irq_stat;
+	u32 data[7];
+
+	irq_stat = __raw_readl(ipc_base + IPCMMIS(1));
+	if (irq_stat & MBOX_MASK(IPC_TX_MBOX)) {
+		__raw_writel(0, ipc_base + IPCMxSEND(IPC_TX_MBOX));
+		complete(&ipc_completion);
+	}
+	if (irq_stat & MBOX_MASK(IPC_RX_MBOX)) {
+		__ipc_rcv(IPC_RX_MBOX, data);
+		atomic_notifier_call_chain(&ipc_notifier, data[0], data + 1);
+		__raw_writel(2, ipc_base + IPCMxSEND(IPC_RX_MBOX));
+	}
+
+	return IRQ_HANDLED;
+}
+
+int pl320_ipc_register_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_register(&ipc_notifier, nb);
+}
+EXPORT_SYMBOL(pl320_ipc_register_notifier);
+
+int pl320_ipc_unregister_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_unregister(&ipc_notifier, nb);
+}
+EXPORT_SYMBOL(pl320_ipc_unregister_notifier);
+
+static int __devinit pl320_probe(struct amba_device *adev,
+				const struct amba_id *id)
+{
+	int ret;
+
+	ipc_base = ioremap(adev->res.start, resource_size(&adev->res));
+	if (ipc_base == NULL)
+		return -ENOMEM;
+
+	__raw_writel(0, ipc_base + IPCMxSEND(IPC_TX_MBOX));
+
+	ipc_irq = adev->irq[0];
+	ret = request_irq(ipc_irq, ipc_handler, 0, dev_name(&adev->dev), NULL);
+	if (ret < 0)
+		goto err;
+
+	/* Init slow mailbox */
+	__raw_writel(CHAN_MASK(A9_SOURCE),
+			ipc_base + IPCMxSOURCE(IPC_TX_MBOX));
+	__raw_writel(CHAN_MASK(M3_SOURCE),
+			ipc_base + IPCMxDSET(IPC_TX_MBOX));
+	__raw_writel(CHAN_MASK(M3_SOURCE) | CHAN_MASK(A9_SOURCE),
+		     ipc_base + IPCMxMSET(IPC_TX_MBOX));
+
+	/* Init receive mailbox */
+	__raw_writel(CHAN_MASK(M3_SOURCE),
+			ipc_base + IPCMxSOURCE(IPC_RX_MBOX));
+	__raw_writel(CHAN_MASK(A9_SOURCE),
+			ipc_base + IPCMxDSET(IPC_RX_MBOX));
+	__raw_writel(CHAN_MASK(M3_SOURCE) | CHAN_MASK(A9_SOURCE),
+		     ipc_base + IPCMxMSET(IPC_RX_MBOX));
+
+	return 0;
+err:
+	iounmap(ipc_base);
+	return ret;
+}
+
+static struct amba_id pl320_ids[] = {
+	{
+		.id	= 0x00041320,
+		.mask	= 0x000fffff,
+	},
+	{ 0, 0 },
+};
+
+static struct amba_driver pl320_driver = {
+	.drv = {
+		.name	= "pl320",
+	},
+	.id_table	= pl320_ids,
+	.probe		= pl320_probe,
+};
+
+static int __init ipc_init(void)
+{
+	return amba_driver_register(&pl320_driver);
+}
+module_init(ipc_init);
diff --git a/include/linux/mailbox.h b/include/linux/mailbox.h
index e8e4131..e7829e5 100644
--- a/include/linux/mailbox.h
+++ b/include/linux/mailbox.h
@@ -1,4 +1,16 @@
-/* mailbox.h */
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
 
 typedef u32 mbox_msg_t;
 struct omap_mbox;
@@ -20,3 +32,8 @@ void omap_mbox_save_ctx(struct omap_mbox *mbox);
 void omap_mbox_restore_ctx(struct omap_mbox *mbox);
 void omap_mbox_enable_irq(struct omap_mbox *mbox, omap_mbox_irq_t irq);
 void omap_mbox_disable_irq(struct omap_mbox *mbox, omap_mbox_irq_t irq);
+
+int pl320_ipc_transmit(u32 *data);
+int pl320_ipc_register_notifier(struct notifier_block *nb);
+int pl320_ipc_unregister_notifier(struct notifier_block *nb);
+
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH 3/6 v9] cpufreq: tolerate inexact values when collecting stats
From: Mark Langsdorf @ 2012-12-06 22:42 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel; +Cc: Mark Langsdorf
In-Reply-To: <1354833773-22845-1-git-send-email-mark.langsdorf@calxeda.com>

This patch is withdrawn due to a need for severe rework.

Changes from v4
        Withdrawn.      
Changes from v3, v2
        None.
Changes from v1
        Implemented a simple round-up algorithm instead of the over/under
method that could cause errors on Intel processors with boost mode.


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox