Linux Power Management development

Linux Power Management development
 help / color / mirror / Atom feed

* Re: [RFC] cpuidle - remove the power_specified field in the driver
From: Rafael J. Wysocki @ 2012-12-10 22:41 UTC (permalink / raw)
  To: Julius Werner
  Cc: Daniel Lezcano, Francesco Lavra, linux-pm, Kevin Hilman,
	Deepthi Dharwar, Trinabh Gupta, Lists Linaro-dev, len.brown,
	linux-kernel, Andrew Morton, Sameer Nanda, Len Brown
In-Reply-To: <CAODwPW96ZTXUZjz4B_GET_6UAd4oTednRmrFLe-Jt=EF9n4O1g@mail.gmail.com>

On Monday, December 10, 2012 11:09:58 AM Julius Werner wrote:
> Hi,
> 
> What is the current status of this? Daniel, do you think you have got
> enough feedback to submit a definitive patch for this? Rafael, would
> you approve of such a change?

I need to talk to Len about that before I give you a reliable answer.

Thanks,
Rafael


> The bug with dynamically added C-states that is tied to this still
> hurts the battery life for some users across all distros every day, so
> I think it would be valuable to get a consistent solution into the
> mainline soon before everyone has to roll their own.
> 
> On 11/12/2012 09:26 PM, Daniel Lezcano wrote:
> > This patch follows the discussion about reinitializing the power usage
> > when a C-state is added/removed.
> >
> >  https://lkml.org/lkml/2012/10/16/518
> >
> > We realized the power usage field is never filled and when it is
> > filled for tegra, the power_specified flag is not set making all these
> > values to be resetted when the driver is initialized with the set_power_state
> > function.
> >
> > Julius and I feel this is over-engineered and the power_specified
> > flag could be simply removed and continue assuming the states are
> > backward sorted.
> >
> > The menu governor select function is simplified as the power is ordered.
> > Actually the condition is always true with the current code.
> >
> > The cpuidle_play_dead function is also simplified by doing a reverse lookup
> > on the array.
> >
> > The set_power_states function is removed as it does no make sense anymore.
> >
> > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > ---
> >  drivers/cpuidle/cpuidle.c        |   17 ++++-------------
> >  drivers/cpuidle/driver.c         |   25 -------------------------
> >  drivers/cpuidle/governors/menu.c |    8 ++------
> >  include/linux/cpuidle.h          |    2 +-
> >  4 files changed, 7 insertions(+), 45 deletions(-)
> >
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC] cpuidle - remove the power_specified field in the driver
From: Julius Werner @ 2012-12-10 19:09 UTC (permalink / raw)
  To: Daniel Lezcano, Rafael J. Wysocki
  Cc: Francesco Lavra, linux-pm, Kevin Hilman, Deepthi Dharwar,
	Trinabh Gupta, Lists Linaro-dev, len.brown, linux-kernel,
	Andrew Morton, Sameer Nanda
In-Reply-To: <50A8A7BB.2070809@linaro.org>

Hi,

What is the current status of this? Daniel, do you think you have got
enough feedback to submit a definitive patch for this? Rafael, would
you approve of such a change?

The bug with dynamically added C-states that is tied to this still
hurts the battery life for some users across all distros every day, so
I think it would be valuable to get a consistent solution into the
mainline soon before everyone has to roll their own.

On 11/12/2012 09:26 PM, Daniel Lezcano wrote:
> This patch follows the discussion about reinitializing the power usage
> when a C-state is added/removed.
>
>  https://lkml.org/lkml/2012/10/16/518
>
> We realized the power usage field is never filled and when it is
> filled for tegra, the power_specified flag is not set making all these
> values to be resetted when the driver is initialized with the set_power_state
> function.
>
> Julius and I feel this is over-engineered and the power_specified
> flag could be simply removed and continue assuming the states are
> backward sorted.
>
> The menu governor select function is simplified as the power is ordered.
> Actually the condition is always true with the current code.
>
> The cpuidle_play_dead function is also simplified by doing a reverse lookup
> on the array.
>
> The set_power_states function is removed as it does no make sense anymore.
>
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  drivers/cpuidle/cpuidle.c        |   17 ++++-------------
>  drivers/cpuidle/driver.c         |   25 -------------------------
>  drivers/cpuidle/governors/menu.c |    8 ++------
>  include/linux/cpuidle.h          |    2 +-
>  4 files changed, 7 insertions(+), 45 deletions(-)
>

^ permalink raw reply

* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline
From: Steven Rostedt @ 2012-12-10 19:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Srivatsa S. Bhat, tj, tglx, peterz, paulmck, rusty, mingo, akpm,
	namhyung, vincent.guittot, sbw, amit.kucheria, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121210182115.GA30892@redhat.com>

On Mon, 2012-12-10 at 19:21 +0100, Oleg Nesterov wrote:
> On 12/07, Oleg Nesterov wrote:
> >
> > On 12/06, Steven Rostedt wrote:
> > >
> > > You know reader locks can deadlock with each other, right? And this
> > > isn't caught be lockdep yet. This is because rwlocks have been made to
> > > be fair with writers. Before writers could be starved if a CPU always
> > > let a reader in. Now if a writer is waiting, a reader will block behind
> > > the writer. But this has introduced new issues with the kernel as
> > > follows:
> > >
> > >
> > >    CPU0			   CPU1	 	   CPU2		   CPU3
> > >    ----			   ----		   ----		   ----
> > > read_lock(A);
> > > 			read_lock(B)
> > > 					write_lock(A) <- block
> > > 							write_lock(B) <- block
> > > read_lock(B) <-block
> > >
> > > 			read_lock(A) <- block
> > >
> > > DEADLOCK!
> >
> > Really??? Oh I didn't know...
> >
> > Yes this was always true for rwsem, but rwlock_t?
> 
> Sorry, please ignore my email. I misread your email.
> 

No prob, looking at what I wrote, I should have explicitly stated two
different rwlocks. The only hint that I gave about two locks was (A) and
(B). Even what I started with: "reader locks can deadlock with each
other" is a bit ambiguous. So I can easily see the confusion.

-- Steve



^ permalink raw reply

* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline
From: Oleg Nesterov @ 2012-12-10 18:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Srivatsa S. Bhat, tj, tglx, peterz, paulmck, rusty, mingo, akpm,
	namhyung, vincent.guittot, sbw, amit.kucheria, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207200014.GB13238@redhat.com>

On 12/07, Oleg Nesterov wrote:
>
> On 12/06, Steven Rostedt wrote:
> >
> > You know reader locks can deadlock with each other, right? And this
> > isn't caught be lockdep yet. This is because rwlocks have been made to
> > be fair with writers. Before writers could be starved if a CPU always
> > let a reader in. Now if a writer is waiting, a reader will block behind
> > the writer. But this has introduced new issues with the kernel as
> > follows:
> >
> >
> >    CPU0			   CPU1	 	   CPU2		   CPU3
> >    ----			   ----		   ----		   ----
> > read_lock(A);
> > 			read_lock(B)
> > 					write_lock(A) <- block
> > 							write_lock(B) <- block
> > read_lock(B) <-block
> >
> > 			read_lock(A) <- block
> >
> > DEADLOCK!
>
> Really??? Oh I didn't know...
>
> Yes this was always true for rwsem, but rwlock_t?

Sorry, please ignore my email. I misread your email.

Oleg.


^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-10 18:15 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: rostedt, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C570F9.2020801@linux.vnet.ibm.com>

On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 02:27 AM, Oleg Nesterov wrote:
> > On 12/07, Srivatsa S. Bhat wrote:
> >>
> >> 4. No deadlock possibilities
> >>
> >>    Per-cpu locking is not the way to go if we want to have relaxed rules
> >>    for lock-ordering. Because, we can end up in circular-locking dependencies
> >>    as explained in https://lkml.org/lkml/2012/12/6/290
> >
> > OK, but this assumes that, contrary to what Steven said, read-write-read
> > deadlock is not possible when it comes to rwlock_t.
>
> What I meant is, with a single (global) rwlock, you can't deadlock like that.

Ah. I greatly misunderstood Steven's email,

http://marc.info/?l=linux-pm&m=135482212307876

Somehow I didn't notice he described the deadlock with _two_ rwlock's, I
wrongly thought that his point is that read_lock() is not recursive (like
down_read).

> Let me know if my assumptions are incorrect!

No, sorry, I misunderstood Steven.


> > However. If this is true, then compared to preempt_disable/stop_machine
> > livelock is possible. Probably this is fine, we have the same problem with
> > get_online_cpus(). But if we can accept this fact I feel we can simmplify
> > this somehow... Can't prove, only feel ;)
>
> Not sure I follow..

I meant that write_lock_irqsave(&hotplug_rwlock) in take_cpu_down()
can spin "forever".

Suppose that reader_acked() == T on every CPU, so that
get_online_cpus_atomic() always takes read_lock(&hotplug_rwlock).

It is possible that this lock will be never released by readers,

	CPU_0				CPU_1

	get_online_cpus_atomic()
					get_online_cpus_atomic()
	put_online_cpus_atomic()

	get_online_cpus_atomic()
					put_online_cpus_atomic()

					get_online_cpus_atomic()
	put_online_cpus_atomic()

and so on.


> Reader-side:
>    -> read_lock() your per-cpu rwlock and proceed.
>
> Writer-side:
>    -> for_each_online_cpu(cpu)
>           write_lock(per-cpu rwlock of 'cpu');

Yes, yes, this is clear.

> Also, like Tejun said, one of the important measures for per-cpu rwlocks
> should be that, if a user replaces global rwlocks with percpu rwlocks (for
> performance reasons), he shouldn't suddenly end up in numerous deadlock
> possibilities which never existed before. The replacement should continue to
> remain safe, and perhaps improve the performance.

Sure, I agree.

Oleg.

^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-10 17:28 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C56CC0.4000200@linux.vnet.ibm.com>

On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
> > Damn, sorry for noise. I missed this part...
> >
> > On 12/10, Srivatsa S. Bhat wrote:
> >>
> >> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> >>> the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
> >>
> >> I hadn't considered that. Thinking of it, I don't think it would help us..
> >> It won't get rid of the currently running preempt_disable() sections no?
> >
> > Sure. But (again, this is only my feeling so far) given that get_online_cpus_atomic()
> > does cli/sti,
>
> Ah, that one! Actually, the only reason I do that cli/sti is because, potentially
> interrupt handlers can be hotplug readers too. So we need to protect the portion
> of the code of get_online_cpus_atomic() which is not re-entrant.

Yes, I understand.

> > this can help to implement ensure-the-readers-must-see-the-pending-writer.
> > IOW this might help to implement sync-with-readers.
> >
>
> 2 problems:
>
> 1. It won't help with cases like this:
>
>    preempt_disable()
> 	...
> 	preempt_disable()
> 		...
> 			<------- Here
> 		...
> 	preempt_enable()
> 	...
>    preempt_enable()

No, I meant that kick_all_cpus_sync() can be used to synchronize with
cli/sti in get_online_cpus_atomic(), just like synchronize_sched() does
in the code I posted a minute ago.

> 2. Part of the reason we want to get rid of stop_machine() is to avoid the
> latency it induces on _all_ CPUs just to take *one* CPU offline. If we use
> kick_all_cpus_sync(), we get into that territory again : we unfairly interrupt
> every CPU, _even when_ that CPU's existing preempt_disabled() sections might
> not actually be hotplug readers! (ie., not bothered about CPU Hotplug).

I agree, that is why I said it is "too heavy".

Oleg.


^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-10 17:24 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C564ED.9090803@linux.vnet.ibm.com>

On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
> > On 12/10, Srivatsa S. Bhat wrote:
> >>
> >> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> >>
> >>> But yes, it is easy to blame somebody else's code ;) And I can't suggest
> >>> something better at least right now. If I understand correctly, we can not
> >>> use, say, synchronize_sched() in _cpu_down() path
> >>
> >> We can't sleep in that code.. so that's a no-go.
> >
> > But we can?
> >
> > Note that I meant _cpu_down(), not get_online_cpus_atomic() or take_cpu_down().
> >
>
> Maybe I'm missing something, but how would it help if we did a
> synchronize_sched() so early (in _cpu_down())? Another bunch of preempt_disable()
> sections could start immediately after our call to synchronize_sched() no?
> How would we deal with that?

Sorry for confusion. Of course synchronize_sched() alone is not enough.
But we can use it to synchronize with preempt-disabled section and avoid
the barriers/atomic in the fast-path.

For example,

	bool writer_pending;
	DEFINE_RWLOCK(writer_rwlock);
	DEFINE_PER_CPU(int, reader_ctr);

	void get_online_cpus_atomic(void)
	{
		preempt_disable();
	
		if (likely(!writer_pending) || __this_cpu_read(reader_ctr)) {
			__this_cpu_inc(reader_ctr);
			return;
		}

		read_lock(&writer_rwlock);
		__this_cpu_inc(reader_ctr);
		read_unlock(&writer_rwlock);
	}

	// lacks release semantics, but we don't care
	void put_online_cpus_atomic(void)
	{
		__this_cpu_dec(reader_ctr);
		preempt_enable();
	}

Now, _cpu_down() does

	writer_pending = true;
	synchronize_sched();

before stop_one_cpu(). When synchronize_sched() returns, we know that
every get_online_cpus_atomic() must see writer_pending == T. And, if
any CPU incremented its reader_ctr we must see it is not zero.

take_cpu_down() does

	write_lock(&writer_rwlock);

	for_each_online_cpu(cpu) {
		while (per_cpu(reader_ctr, cpu))
			cpu_relax();
	}

and takes the lock.

However. This can lead to the deadlock we already discussed. So
take_cpu_down() should do

 retry:
	write_lock(&writer_rwlock);

	for_each_online_cpu(cpu) {
		if (per_cpu(reader_ctr, cpu)) {
			write_unlock(&writer_rwlock);
			goto retry;
		}
	}

to take the lock. But this is livelockable. However, I do not think it
is possible to avoid the livelock.

Just in case, the code above is only for illustration, perhaps it is not
100% correct and perhaps we can do it better. cpu_hotplug.active_writer
is ignored for simplicity, get/put should check current == active_writer.

Oleg.

^ permalink raw reply

* [RESEND PATCH 3/5] cpufreq: dbx500: Move clk_get to probe
From: Ulf Hansson @ 2012-12-10 15:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, cpufreq, linux-pm, Mike Turquette,
	Mike Turquette
  Cc: linux-arm-kernel, Lee Jones, Linus Walleij, Rickard Andersson,
	Jonas Aberg, Vincent Guittot, Philippe Begnic, Ulf Hansson
In-Reply-To: <1355153142-9534-1-git-send-email-ulf.hansson@stericsson.com>

From: Ulf Hansson <ulf.hansson@linaro.org>

The armss clock shall only be fetched at probe thus move this here.
Same thing goes for the printing of the available frequencies.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Jonas Aaberg <jonas.aberg@stericsson.com>
---
 drivers/cpufreq/dbx500-cpufreq.c |   29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/dbx500-cpufreq.c b/drivers/cpufreq/dbx500-cpufreq.c
index 0a411b5..d974a8e 100644
--- a/drivers/cpufreq/dbx500-cpufreq.c
+++ b/drivers/cpufreq/dbx500-cpufreq.c
@@ -90,28 +90,14 @@ static unsigned int dbx500_cpufreq_getspeed(unsigned int cpu)
 
 static int __cpuinit dbx500_cpufreq_init(struct cpufreq_policy *policy)
 {
-	int i = 0;
 	int res;
 
-	armss_clk = clk_get(NULL, "armss");
-	if (IS_ERR(armss_clk)) {
-		pr_err("dbx500-cpufreq : Failed to get armss clk\n");
-		return PTR_ERR(armss_clk);
-	}
-
-	pr_info("dbx500-cpufreq : Available frequencies:\n");
-	while (freq_table[i].frequency != CPUFREQ_TABLE_END) {
-		pr_info("  %d Mhz\n", freq_table[i].frequency/1000);
-		i++;
-	}
-
 	/* get policy fields based on the table */
 	res = cpufreq_frequency_table_cpuinfo(policy, freq_table);
 	if (!res)
 		cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
 	else {
 		pr_err("dbx500-cpufreq : Failed to read policy table\n");
-		clk_put(armss_clk);
 		return res;
 	}
 
@@ -147,13 +133,26 @@ static struct cpufreq_driver dbx500_cpufreq_driver = {
 
 static int dbx500_cpufreq_probe(struct platform_device *pdev)
 {
-	freq_table = dev_get_platdata(&pdev->dev);
+	int i = 0;
 
+	freq_table = dev_get_platdata(&pdev->dev);
 	if (!freq_table) {
 		pr_err("dbx500-cpufreq: Failed to fetch cpufreq table\n");
 		return -ENODEV;
 	}
 
+	armss_clk = clk_get(&pdev->dev, "armss");
+	if (IS_ERR(armss_clk)) {
+		pr_err("dbx500-cpufreq : Failed to get armss clk\n");
+		return PTR_ERR(armss_clk);
+	}
+
+	pr_info("dbx500-cpufreq : Available frequencies:\n");
+	while (freq_table[i].frequency != CPUFREQ_TABLE_END) {
+		pr_info("  %d Mhz\n", freq_table[i].frequency/1000);
+		i++;
+	}
+
 	return cpufreq_register_driver(&dbx500_cpufreq_driver);
 }
 
-- 
1.7.10


^ permalink raw reply related

* [RESEND PATCH 1/5] cpufreq: Give driver used for dbx500 family a more generic name
From: Ulf Hansson @ 2012-12-10 15:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, cpufreq, linux-pm, Mike Turquette,
	Mike Turquette
  Cc: linux-arm-kernel, Lee Jones, Linus Walleij, Rickard Andersson,
	Jonas Aberg, Vincent Guittot, Philippe Begnic, Ulf Hansson
In-Reply-To: <1355153142-9534-1-git-send-email-ulf.hansson@stericsson.com>

From: Lee Jones <lee.jones@linaro.org>

This driver doesn't only handle cpufreq functionality for the
db8500 anymore. There are new variants which rely on it too.
Let's make the name a bit more generic.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/cpufreq/Makefile         |    2 +-
 drivers/cpufreq/db8500-cpufreq.c |  179 --------------------------------------
 drivers/cpufreq/dbx500-cpufreq.c |  179 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 180 insertions(+), 180 deletions(-)
 delete mode 100644 drivers/cpufreq/db8500-cpufreq.c
 create mode 100644 drivers/cpufreq/dbx500-cpufreq.c

diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 1bc90e1..675559f 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -41,7 +41,7 @@ obj-$(CONFIG_X86_CPUFREQ_NFORCE2)	+= cpufreq-nforce2.o
 
 ##################################################################################
 # ARM SoC drivers
-obj-$(CONFIG_UX500_SOC_DB8500)		+= db8500-cpufreq.o
+obj-$(CONFIG_UX500_SOC_DB8500)		+= dbx500-cpufreq.o
 obj-$(CONFIG_ARM_S3C2416_CPUFREQ)	+= s3c2416-cpufreq.o
 obj-$(CONFIG_ARM_S3C64XX_CPUFREQ)	+= s3c64xx-cpufreq.o
 obj-$(CONFIG_ARM_S5PV210_CPUFREQ)	+= s5pv210-cpufreq.o
diff --git a/drivers/cpufreq/db8500-cpufreq.c b/drivers/cpufreq/db8500-cpufreq.c
deleted file mode 100644
index 4f154bc..0000000
--- a/drivers/cpufreq/db8500-cpufreq.c
+++ /dev/null
@@ -1,179 +0,0 @@
-/*
- * Copyright (C) STMicroelectronics 2009
- * Copyright (C) ST-Ericsson SA 2010
- *
- * License Terms: GNU General Public License v2
- * Author: Sundar Iyer <sundar.iyer@stericsson.com>
- * Author: Martin Persson <martin.persson@stericsson.com>
- * Author: Jonas Aaberg <jonas.aberg@stericsson.com>
- *
- */
-#include <linux/module.h>
-#include <linux/kernel.h>
-#include <linux/cpufreq.h>
-#include <linux/delay.h>
-#include <linux/slab.h>
-#include <linux/platform_device.h>
-#include <linux/clk.h>
-#include <mach/id.h>
-
-static struct cpufreq_frequency_table *freq_table;
-static struct clk *armss_clk;
-
-static struct freq_attr *db8500_cpufreq_attr[] = {
-	&cpufreq_freq_attr_scaling_available_freqs,
-	NULL,
-};
-
-static int db8500_cpufreq_verify_speed(struct cpufreq_policy *policy)
-{
-	return cpufreq_frequency_table_verify(policy, freq_table);
-}
-
-static int db8500_cpufreq_target(struct cpufreq_policy *policy,
-				unsigned int target_freq,
-				unsigned int relation)
-{
-	struct cpufreq_freqs freqs;
-	unsigned int idx;
-
-	/* scale the target frequency to one of the extremes supported */
-	if (target_freq < policy->cpuinfo.min_freq)
-		target_freq = policy->cpuinfo.min_freq;
-	if (target_freq > policy->cpuinfo.max_freq)
-		target_freq = policy->cpuinfo.max_freq;
-
-	/* Lookup the next frequency */
-	if (cpufreq_frequency_table_target
-	    (policy, freq_table, target_freq, relation, &idx)) {
-		return -EINVAL;
-	}
-
-	freqs.old = policy->cur;
-	freqs.new = freq_table[idx].frequency;
-
-	if (freqs.old == freqs.new)
-		return 0;
-
-	/* pre-change notification */
-	for_each_cpu(freqs.cpu, policy->cpus)
-		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
-
-	/* update armss clk frequency */
-	if (clk_set_rate(armss_clk, freq_table[idx].frequency * 1000)) {
-		pr_err("db8500-cpufreq: Failed to update armss clk\n");
-		return -EINVAL;
-	}
-
-	/* post change notification */
-	for_each_cpu(freqs.cpu, policy->cpus)
-		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
-
-	return 0;
-}
-
-static unsigned int db8500_cpufreq_getspeed(unsigned int cpu)
-{
-	int i = 0;
-	unsigned long freq = clk_get_rate(armss_clk) / 1000;
-
-	while (freq_table[i].frequency != CPUFREQ_TABLE_END) {
-		if (freq <= freq_table[i].frequency)
-			return freq_table[i].frequency;
-		i++;
-	}
-
-	/* We could not find a corresponding frequency. */
-	pr_err("db8500-cpufreq: Failed to find cpufreq speed\n");
-	return 0;
-}
-
-static int __cpuinit db8500_cpufreq_init(struct cpufreq_policy *policy)
-{
-	int i = 0;
-	int res;
-
-	armss_clk = clk_get(NULL, "armss");
-	if (IS_ERR(armss_clk)) {
-		pr_err("db8500-cpufreq : Failed to get armss clk\n");
-		return PTR_ERR(armss_clk);
-	}
-
-	pr_info("db8500-cpufreq : Available frequencies:\n");
-	while (freq_table[i].frequency != CPUFREQ_TABLE_END) {
-		pr_info("  %d Mhz\n", freq_table[i].frequency/1000);
-		i++;
-	}
-
-	/* get policy fields based on the table */
-	res = cpufreq_frequency_table_cpuinfo(policy, freq_table);
-	if (!res)
-		cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
-	else {
-		pr_err("db8500-cpufreq : Failed to read policy table\n");
-		clk_put(armss_clk);
-		return res;
-	}
-
-	policy->min = policy->cpuinfo.min_freq;
-	policy->max = policy->cpuinfo.max_freq;
-	policy->cur = db8500_cpufreq_getspeed(policy->cpu);
-	policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
-
-	/*
-	 * FIXME : Need to take time measurement across the target()
-	 *	   function with no/some/all drivers in the notification
-	 *	   list.
-	 */
-	policy->cpuinfo.transition_latency = 20 * 1000; /* in ns */
-
-	/* policy sharing between dual CPUs */
-	cpumask_copy(policy->cpus, cpu_present_mask);
-
-	policy->shared_type = CPUFREQ_SHARED_TYPE_ALL;
-
-	return 0;
-}
-
-static struct cpufreq_driver db8500_cpufreq_driver = {
-	.flags  = CPUFREQ_STICKY,
-	.verify = db8500_cpufreq_verify_speed,
-	.target = db8500_cpufreq_target,
-	.get    = db8500_cpufreq_getspeed,
-	.init   = db8500_cpufreq_init,
-	.name   = "DB8500",
-	.attr   = db8500_cpufreq_attr,
-};
-
-static int db8500_cpufreq_probe(struct platform_device *pdev)
-{
-	freq_table = dev_get_platdata(&pdev->dev);
-
-	if (!freq_table) {
-		pr_err("db8500-cpufreq: Failed to fetch cpufreq table\n");
-		return -ENODEV;
-	}
-
-	return cpufreq_register_driver(&db8500_cpufreq_driver);
-}
-
-static struct platform_driver db8500_cpufreq_plat_driver = {
-	.driver = {
-		.name = "cpufreq-u8500",
-		.owner = THIS_MODULE,
-	},
-	.probe = db8500_cpufreq_probe,
-};
-
-static int __init db8500_cpufreq_register(void)
-{
-	if (!cpu_is_u8500_family())
-		return -ENODEV;
-
-	pr_info("cpufreq for DB8500 started\n");
-	return platform_driver_register(&db8500_cpufreq_plat_driver);
-}
-device_initcall(db8500_cpufreq_register);
-
-MODULE_LICENSE("GPL v2");
-MODULE_DESCRIPTION("cpufreq driver for DB8500");
diff --git a/drivers/cpufreq/dbx500-cpufreq.c b/drivers/cpufreq/dbx500-cpufreq.c
new file mode 100644
index 0000000..0a411b5
--- /dev/null
+++ b/drivers/cpufreq/dbx500-cpufreq.c
@@ -0,0 +1,179 @@
+/*
+ * Copyright (C) STMicroelectronics 2009
+ * Copyright (C) ST-Ericsson SA 2010
+ *
+ * License Terms: GNU General Public License v2
+ * Author: Sundar Iyer <sundar.iyer@stericsson.com>
+ * Author: Martin Persson <martin.persson@stericsson.com>
+ * Author: Jonas Aaberg <jonas.aberg@stericsson.com>
+ *
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/cpufreq.h>
+#include <linux/delay.h>
+#include <linux/slab.h>
+#include <linux/platform_device.h>
+#include <linux/clk.h>
+#include <mach/id.h>
+
+static struct cpufreq_frequency_table *freq_table;
+static struct clk *armss_clk;
+
+static struct freq_attr *dbx500_cpufreq_attr[] = {
+	&cpufreq_freq_attr_scaling_available_freqs,
+	NULL,
+};
+
+static int dbx500_cpufreq_verify_speed(struct cpufreq_policy *policy)
+{
+	return cpufreq_frequency_table_verify(policy, freq_table);
+}
+
+static int dbx500_cpufreq_target(struct cpufreq_policy *policy,
+				unsigned int target_freq,
+				unsigned int relation)
+{
+	struct cpufreq_freqs freqs;
+	unsigned int idx;
+
+	/* scale the target frequency to one of the extremes supported */
+	if (target_freq < policy->cpuinfo.min_freq)
+		target_freq = policy->cpuinfo.min_freq;
+	if (target_freq > policy->cpuinfo.max_freq)
+		target_freq = policy->cpuinfo.max_freq;
+
+	/* Lookup the next frequency */
+	if (cpufreq_frequency_table_target
+	    (policy, freq_table, target_freq, relation, &idx)) {
+		return -EINVAL;
+	}
+
+	freqs.old = policy->cur;
+	freqs.new = freq_table[idx].frequency;
+
+	if (freqs.old == freqs.new)
+		return 0;
+
+	/* pre-change notification */
+	for_each_cpu(freqs.cpu, policy->cpus)
+		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+	/* update armss clk frequency */
+	if (clk_set_rate(armss_clk, freq_table[idx].frequency * 1000)) {
+		pr_err("dbx500-cpufreq: Failed to update armss clk\n");
+		return -EINVAL;
+	}
+
+	/* post change notification */
+	for_each_cpu(freqs.cpu, policy->cpus)
+		cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+
+	return 0;
+}
+
+static unsigned int dbx500_cpufreq_getspeed(unsigned int cpu)
+{
+	int i = 0;
+	unsigned long freq = clk_get_rate(armss_clk) / 1000;
+
+	while (freq_table[i].frequency != CPUFREQ_TABLE_END) {
+		if (freq <= freq_table[i].frequency)
+			return freq_table[i].frequency;
+		i++;
+	}
+
+	/* We could not find a corresponding frequency. */
+	pr_err("dbx500-cpufreq: Failed to find cpufreq speed\n");
+	return 0;
+}
+
+static int __cpuinit dbx500_cpufreq_init(struct cpufreq_policy *policy)
+{
+	int i = 0;
+	int res;
+
+	armss_clk = clk_get(NULL, "armss");
+	if (IS_ERR(armss_clk)) {
+		pr_err("dbx500-cpufreq : Failed to get armss clk\n");
+		return PTR_ERR(armss_clk);
+	}
+
+	pr_info("dbx500-cpufreq : Available frequencies:\n");
+	while (freq_table[i].frequency != CPUFREQ_TABLE_END) {
+		pr_info("  %d Mhz\n", freq_table[i].frequency/1000);
+		i++;
+	}
+
+	/* get policy fields based on the table */
+	res = cpufreq_frequency_table_cpuinfo(policy, freq_table);
+	if (!res)
+		cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
+	else {
+		pr_err("dbx500-cpufreq : Failed to read policy table\n");
+		clk_put(armss_clk);
+		return res;
+	}
+
+	policy->min = policy->cpuinfo.min_freq;
+	policy->max = policy->cpuinfo.max_freq;
+	policy->cur = dbx500_cpufreq_getspeed(policy->cpu);
+	policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
+
+	/*
+	 * FIXME : Need to take time measurement across the target()
+	 *	   function with no/some/all drivers in the notification
+	 *	   list.
+	 */
+	policy->cpuinfo.transition_latency = 20 * 1000; /* in ns */
+
+	/* policy sharing between dual CPUs */
+	cpumask_copy(policy->cpus, cpu_present_mask);
+
+	policy->shared_type = CPUFREQ_SHARED_TYPE_ALL;
+
+	return 0;
+}
+
+static struct cpufreq_driver dbx500_cpufreq_driver = {
+	.flags  = CPUFREQ_STICKY,
+	.verify = dbx500_cpufreq_verify_speed,
+	.target = dbx500_cpufreq_target,
+	.get    = dbx500_cpufreq_getspeed,
+	.init   = dbx500_cpufreq_init,
+	.name   = "DBX500",
+	.attr   = dbx500_cpufreq_attr,
+};
+
+static int dbx500_cpufreq_probe(struct platform_device *pdev)
+{
+	freq_table = dev_get_platdata(&pdev->dev);
+
+	if (!freq_table) {
+		pr_err("dbx500-cpufreq: Failed to fetch cpufreq table\n");
+		return -ENODEV;
+	}
+
+	return cpufreq_register_driver(&dbx500_cpufreq_driver);
+}
+
+static struct platform_driver dbx500_cpufreq_plat_driver = {
+	.driver = {
+		.name = "cpufreq-ux500",
+		.owner = THIS_MODULE,
+	},
+	.probe = dbx500_cpufreq_probe,
+};
+
+static int __init dbx500_cpufreq_register(void)
+{
+	if (!cpu_is_u8500_family())
+		return -ENODEV;
+
+	pr_info("cpufreq for DBX500 started\n");
+	return platform_driver_register(&dbx500_cpufreq_plat_driver);
+}
+device_initcall(dbx500_cpufreq_register);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("cpufreq driver for DBX500");
-- 
1.7.10


^ permalink raw reply related

* [RESEND PATCH 2/5] mfd: db8500: Update cpufreq device name
From: Ulf Hansson @ 2012-12-10 15:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, cpufreq, linux-pm, Mike Turquette,
	Mike Turquette
  Cc: linux-arm-kernel, Lee Jones, Linus Walleij, Rickard Andersson,
	Jonas Aberg, Vincent Guittot, Philippe Begnic, Ulf Hansson
In-Reply-To: <1355153142-9534-1-git-send-email-ulf.hansson@stericsson.com>

From: Lee Jones <lee.jones@linaro.org>

Since the cpufreq driver for ux500 has been renamed from
cpufreq-db8500 to cpufreq-dbx500, we need to change the
device name here as well.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
---
 drivers/mfd/db8500-prcmu.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/mfd/db8500-prcmu.c b/drivers/mfd/db8500-prcmu.c
index b96661d..5214422 100644
--- a/drivers/mfd/db8500-prcmu.c
+++ b/drivers/mfd/db8500-prcmu.c
@@ -3062,8 +3062,8 @@ static struct mfd_cell db8500_prcmu_devs[] = {
 		.pdata_size = sizeof(db8500_regulators),
 	},
 	{
-		.name = "cpufreq-u8500",
-		.of_compatible = "stericsson,cpufreq-u8500",
+		.name = "cpufreq-ux500",
+		.of_compatible = "stericsson,cpufreq-ux500",
 		.platform_data = &db8500_cpufreq_table,
 		.pdata_size = sizeof(db8500_cpufreq_table),
 	},
-- 
1.7.10


^ permalink raw reply related

* [RESEND PATCH 0/5] cpufreq: db8500: Rename driver and update some parts
From: Ulf Hansson @ 2012-12-10 15:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, cpufreq, linux-pm, Mike Turquette,
	Mike Turquette
  Cc: linux-arm-kernel, Lee Jones, Linus Walleij, Rickard Andersson,
	Jonas Aberg, Vincent Guittot, Philippe Begnic, Ulf Hansson

From: Ulf Hansson <ulf.hansson@linaro.org>

This patchset starts by renaming the db8500 cpufreq driver to a more generic
name. There are new variants which rely on it too, so instead we give it a
family name of dbx500.

On top of that a fixup patch for initialization of the driver and some minor
cleanup patches are included as well.

These patches can be material for 3.8.

Since some patches for the db8500 cpufreq driver has recently been merged
through Mike Turquette's clk tree, I decided to base these patches on top
of the clk-next branch. So I hope Mike is fine by merging these patches
through his tree.

So I will try to collect ACKs from Rafael for the cpufreq patches. The
mfd patch has already been ACKED by Samuel.

Jonas Aaberg (1):
  cpufreq: dbx500: Minor code cleanup

Lee Jones (3):
  cpufreq: Give driver used for dbx500 family a more generic name
  mfd: db8500: Update cpufreq device name
  cpufreq: dbx500: Update file header

Ulf Hansson (1):
  cpufreq: dbx500: Move clk_get to probe

 drivers/cpufreq/Makefile         |    2 +-
 drivers/cpufreq/db8500-cpufreq.c |  179 -------------------------------------
 drivers/cpufreq/dbx500-cpufreq.c |  180 ++++++++++++++++++++++++++++++++++++++
 drivers/mfd/db8500-prcmu.c       |    4 +-
 4 files changed, 183 insertions(+), 182 deletions(-)
 delete mode 100644 drivers/cpufreq/db8500-cpufreq.c
 create mode 100644 drivers/cpufreq/dbx500-cpufreq.c

-- 
1.7.10

^ permalink raw reply

* [RESEND PATCH 5/5] cpufreq: dbx500: Update file header
From: Ulf Hansson @ 2012-12-10 15:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, cpufreq, linux-pm, Mike Turquette,
	Mike Turquette
  Cc: linux-arm-kernel, Lee Jones, Linus Walleij, Rickard Andersson,
	Jonas Aberg, Vincent Guittot, Philippe Begnic, Ulf Hansson
In-Reply-To: <1355153142-9534-1-git-send-email-ulf.hansson@stericsson.com>

From: Lee Jones <lee.jones@linaro.org>

Real simple patch to extend the ST-Ericsson copyright date and
remove unnecessary extra commented lines.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/cpufreq/dbx500-cpufreq.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/dbx500-cpufreq.c b/drivers/cpufreq/dbx500-cpufreq.c
index d4cb782..047fc4f 100644
--- a/drivers/cpufreq/dbx500-cpufreq.c
+++ b/drivers/cpufreq/dbx500-cpufreq.c
@@ -1,13 +1,13 @@
 /*
  * Copyright (C) STMicroelectronics 2009
- * Copyright (C) ST-Ericsson SA 2010
+ * Copyright (C) ST-Ericsson SA 2010-2012
  *
  * License Terms: GNU General Public License v2
  * Author: Sundar Iyer <sundar.iyer@stericsson.com>
  * Author: Martin Persson <martin.persson@stericsson.com>
  * Author: Jonas Aaberg <jonas.aberg@stericsson.com>
- *
  */
+
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/cpufreq.h>
-- 
1.7.10


^ permalink raw reply related

* [RESEND PATCH 4/5] cpufreq: dbx500: Minor code cleanup
From: Ulf Hansson @ 2012-12-10 15:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, cpufreq, linux-pm, Mike Turquette,
	Mike Turquette
  Cc: linux-arm-kernel, Lee Jones, Linus Walleij, Rickard Andersson,
	Jonas Aberg, Vincent Guittot, Philippe Begnic, Ulf Hansson
In-Reply-To: <1355153142-9534-1-git-send-email-ulf.hansson@stericsson.com>

From: Jonas Aaberg <jonas.aberg@stericsson.com>

Some minor code cleanup and some minor changes to printed
error messages.

Signed-off-by: Jonas Aaberg <jonas.aberg@stericsson.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/cpufreq/dbx500-cpufreq.c |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/dbx500-cpufreq.c b/drivers/cpufreq/dbx500-cpufreq.c
index d974a8e..d4cb782 100644
--- a/drivers/cpufreq/dbx500-cpufreq.c
+++ b/drivers/cpufreq/dbx500-cpufreq.c
@@ -36,6 +36,7 @@ static int dbx500_cpufreq_target(struct cpufreq_policy *policy,
 {
 	struct cpufreq_freqs freqs;
 	unsigned int idx;
+	int ret;
 
 	/* scale the target frequency to one of the extremes supported */
 	if (target_freq < policy->cpuinfo.min_freq)
@@ -44,10 +45,9 @@ static int dbx500_cpufreq_target(struct cpufreq_policy *policy,
 		target_freq = policy->cpuinfo.max_freq;
 
 	/* Lookup the next frequency */
-	if (cpufreq_frequency_table_target
-	    (policy, freq_table, target_freq, relation, &idx)) {
+	if (cpufreq_frequency_table_target(policy, freq_table, target_freq,
+					relation, &idx))
 		return -EINVAL;
-	}
 
 	freqs.old = policy->cur;
 	freqs.new = freq_table[idx].frequency;
@@ -60,9 +60,12 @@ static int dbx500_cpufreq_target(struct cpufreq_policy *policy,
 		cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 
 	/* update armss clk frequency */
-	if (clk_set_rate(armss_clk, freq_table[idx].frequency * 1000)) {
-		pr_err("dbx500-cpufreq: Failed to update armss clk\n");
-		return -EINVAL;
+	ret = clk_set_rate(armss_clk, freqs.new * 1000);
+
+	if (ret) {
+		pr_err("dbx500-cpufreq: Failed to set armss_clk to %d Hz: error %d\n",
+		       freqs.new * 1000, ret);
+		return ret;
 	}
 
 	/* post change notification */
@@ -97,7 +100,7 @@ static int __cpuinit dbx500_cpufreq_init(struct cpufreq_policy *policy)
 	if (!res)
 		cpufreq_frequency_table_get_attr(freq_table, policy->cpu);
 	else {
-		pr_err("dbx500-cpufreq : Failed to read policy table\n");
+		pr_err("dbx500-cpufreq: Failed to read policy table\n");
 		return res;
 	}
 
@@ -143,11 +146,11 @@ static int dbx500_cpufreq_probe(struct platform_device *pdev)
 
 	armss_clk = clk_get(&pdev->dev, "armss");
 	if (IS_ERR(armss_clk)) {
-		pr_err("dbx500-cpufreq : Failed to get armss clk\n");
+		pr_err("dbx500-cpufreq: Failed to get armss clk\n");
 		return PTR_ERR(armss_clk);
 	}
 
-	pr_info("dbx500-cpufreq : Available frequencies:\n");
+	pr_info("dbx500-cpufreq: Available frequencies:\n");
 	while (freq_table[i].frequency != CPUFREQ_TABLE_END) {
 		pr_info("  %d Mhz\n", freq_table[i].frequency/1000);
 		i++;
@@ -169,7 +172,6 @@ static int __init dbx500_cpufreq_register(void)
 	if (!cpu_is_u8500_family())
 		return -ENODEV;
 
-	pr_info("cpufreq for DBX500 started\n");
 	return platform_driver_register(&dbx500_cpufreq_plat_driver);
 }
 device_initcall(dbx500_cpufreq_register);
-- 
1.7.10


^ permalink raw reply related

* [RFC PATCH] ARM: EXYNOS5: Support Exynos5-bus devfreq driver
From: Abhilash Kesavan @ 2012-12-10 12:06 UTC (permalink / raw)
  To: linux-kernel, linux-pm, kgene.kim
  Cc: myungjoo.ham, kyungmin.park, rjw, jhbird.choi, Abhilash Kesavan

- Setup the INT clock ops to control/vary INT frequency
- Add mappings initially for the PPMU device

Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
---
Corresponding devfreq driver support for Exynos5 has been posted at:
https://patchwork.kernel.org/patch/1823931/

Tested after merging for-rafael branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git
with for-next branch of
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung.git

 arch/arm/mach-exynos/clock-exynos5.c           |  151 ++++++++++++++++++++++++
 arch/arm/mach-exynos/common.c                  |   25 ++++
 arch/arm/mach-exynos/include/mach/map.h        |    6 +
 arch/arm/mach-exynos/include/mach/regs-clock.h |   48 ++++++++
 arch/arm/plat-samsung/include/plat/map-s5p.h   |    6 +
 5 files changed, 236 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-exynos/clock-exynos5.c b/arch/arm/mach-exynos/clock-exynos5.c
index 5c63bc7..f00b259 100644
--- a/arch/arm/mach-exynos/clock-exynos5.c
+++ b/arch/arm/mach-exynos/clock-exynos5.c
@@ -109,6 +109,11 @@ static struct clk exynos5_clk_sclk_usbphy = {
 	.rate		= 48000000,
 };
 
+/* Virtual Bus INT clock */
+static struct clk exynos5_int_clk = {
+	.name		= "int_clk",
+};
+
 static int exynos5_clksrc_mask_top_ctrl(struct clk *clk, int enable)
 {
 	return s5p_gatectrl(EXYNOS5_CLKSRC_MASK_TOP, clk, enable);
@@ -1519,6 +1524,149 @@ static struct clk *exynos5_clks[] __initdata = {
 	&clk_fout_cpll,
 	&clk_fout_mpll_div2,
 	&exynos5_clk_armclk,
+	&exynos5_int_clk,
+};
+
+#define INT_FREQ(f, a0, a1, a2, a3, a4, a5, b0, b1, b2, b3, \
+			c0, c1, d0, e0) \
+	{ \
+		.freq = (f) * 1000000, \
+		.clk_div_top0 = ((a0) << 0 | (a1) << 8 | (a2) << 12 | \
+				(a3) << 16 | (a4) << 20 | (a5) << 28), \
+		.clk_div_top1 = ((b0) << 12 | (b1) << 16 | (b2) << 20 | \
+				(b3) << 24), \
+		.clk_div_lex = ((c0) << 4 | (c1) << 8), \
+		.clk_div_r0x = ((d0) << 4), \
+		.clk_div_r1x = ((e0) << 4), \
+	}
+
+static struct {
+	unsigned long freq;
+	u32 clk_div_top0;
+	u32 clk_div_top1;
+	u32 clk_div_lex;
+	u32 clk_div_r0x;
+	u32 clk_div_r1x;
+} int_freq[] = {
+	/*
+	 * values:
+	 * freq
+	 * clock divider for ACLK66, ACLK166, ACLK200, ACLK266,
+			ACLK333, ACLK300_DISP1
+	 * clock divider for ACLK300_GSCL, ACLK400_IOP, ACLK400_ISP, ACLK66_PRE
+	 * clock divider for PCLK_LEX, ATCLK_LEX
+	 * clock divider for ACLK_PR0X
+	 * clock divider for ACLK_PR1X
+	 */
+	INT_FREQ(266, 1, 1, 3, 2, 0, 0, 0, 1, 1, 5, 1, 0, 1, 1),
+	INT_FREQ(200, 1, 2, 4, 3, 1, 0, 0, 3, 2, 5, 1, 0, 1, 1),
+	INT_FREQ(160, 1, 3, 4, 4, 2, 0, 0, 3, 3, 5, 1, 0, 1, 1),
+	INT_FREQ(133, 1, 3, 5, 5, 2, 1, 1, 4, 4, 5, 1, 0, 1, 1),
+	INT_FREQ(100, 1, 7, 7, 7, 7, 3, 7, 7, 7, 5, 1, 0, 1, 1),
+};
+
+static unsigned long exynos5_clk_int_get_rate(struct clk *clk)
+{
+	return clk->rate;
+}
+
+static void exynos5_int_set_clkdiv(unsigned int div_index)
+{
+	unsigned int tmp;
+
+	/* Change Divider - TOP0 */
+	tmp = __raw_readl(EXYNOS5_CLKDIV_TOP0);
+
+	tmp &= ~(EXYNOS5_CLKDIV_TOP0_ACLK266_MASK |
+		EXYNOS5_CLKDIV_TOP0_ACLK200_MASK |
+		EXYNOS5_CLKDIV_TOP0_ACLK66_MASK |
+		EXYNOS5_CLKDIV_TOP0_ACLK333_MASK |
+		EXYNOS5_CLKDIV_TOP0_ACLK166_MASK |
+		EXYNOS5_CLKDIV_TOP0_ACLK300_DISP1_MASK);
+
+	tmp |= int_freq[div_index].clk_div_top0;
+
+	__raw_writel(tmp, EXYNOS5_CLKDIV_TOP0);
+
+	while (__raw_readl(EXYNOS5_CLKDIV_STAT_TOP0) & 0x151101)
+		cpu_relax();
+
+	/* Change Divider - TOP1 */
+	tmp = __raw_readl(EXYNOS5_CLKDIV_TOP1);
+
+	tmp &= ~(EXYNOS5_CLKDIV_TOP1_ACLK400_ISP_MASK |
+		EXYNOS5_CLKDIV_TOP1_ACLK400_IOP_MASK |
+		EXYNOS5_CLKDIV_TOP1_ACLK66_PRE_MASK |
+		EXYNOS5_CLKDIV_TOP1_ACLK300_GSCL_MASK);
+
+	tmp |= int_freq[div_index].clk_div_top1;
+
+	__raw_writel(tmp, EXYNOS5_CLKDIV_TOP1);
+
+	while ((__raw_readl(EXYNOS5_CLKDIV_STAT_TOP1) & 0x1110000) &&
+		(__raw_readl(EXYNOS5_CLKDIV_STAT_TOP0) & 0x80000))
+		cpu_relax();
+
+	/* Change Divider - LEX */
+	tmp = __raw_readl(EXYNOS5_CLKDIV_LEX);
+
+	tmp &= ~(EXYNOS5_CLKDIV_LEX_ATCLK_LEX_MASK |
+		EXYNOS5_CLKDIV_LEX_PCLK_LEX_MASK);
+
+	tmp |= int_freq[div_index].clk_div_lex;
+
+	__raw_writel(tmp, EXYNOS5_CLKDIV_LEX);
+
+	while (__raw_readl(EXYNOS5_CLKDIV_STAT_LEX) & 0x110)
+		cpu_relax();
+
+	/* Change Divider - R0X */
+	tmp = __raw_readl(EXYNOS5_CLKDIV_R0X);
+
+	tmp &= ~EXYNOS5_CLKDIV_R0X_PCLK_R0X_MASK;
+
+	tmp |= int_freq[div_index].clk_div_r0x;
+
+	__raw_writel(tmp, EXYNOS5_CLKDIV_R0X);
+
+	while (__raw_readl(EXYNOS5_CLKDIV_STAT_R0X) & 0x10)
+		cpu_relax();
+
+	/* Change Divider - R1X */
+	tmp = __raw_readl(EXYNOS5_CLKDIV_R1X);
+
+	tmp &= ~EXYNOS5_CLKDIV_R1X_PCLK_R1X_MASK;
+
+	tmp |= int_freq[div_index].clk_div_r1x;
+
+	__raw_writel(tmp, EXYNOS5_CLKDIV_R1X);
+
+	while (__raw_readl(EXYNOS5_CLKDIV_STAT_R1X) & 0x10)
+		cpu_relax();
+}
+
+static int exynos5_clk_int_set_rate(struct clk *clk, unsigned long rate)
+{
+	int index;
+
+	for (index = 0; index < ARRAY_SIZE(int_freq); index++)
+		if (int_freq[index].freq == rate)
+			break;
+
+	if (index == ARRAY_SIZE(int_freq))
+		return -EINVAL;
+
+	/* Change the system clock divider values */
+	exynos5_int_set_clkdiv(index);
+
+	clk->rate = rate;
+
+	return 0;
+}
+
+static struct clk_ops exynos5_clk_int_ops = {
+	.get_rate = exynos5_clk_int_get_rate,
+	.set_rate = exynos5_clk_int_set_rate
 };
 
 static u32 epll_div[][6] = {
@@ -1713,6 +1861,9 @@ void __init_or_cpufreq exynos5_setup_clocks(void)
 
 	clk_fout_epll.ops = &exynos5_epll_ops;
 
+	exynos5_int_clk.ops = &exynos5_clk_int_ops;
+	exynos5_int_clk.rate = aclk_266;
+
 	if (clk_set_parent(&exynos5_clk_mout_epll.clk, &clk_fout_epll))
 		printk(KERN_ERR "Unable to set parent %s of clock %s.\n",
 				clk_fout_epll.name, exynos5_clk_mout_epll.clk.name);
diff --git a/arch/arm/mach-exynos/common.c b/arch/arm/mach-exynos/common.c
index 73b940f..6b7d4ee 100644
--- a/arch/arm/mach-exynos/common.c
+++ b/arch/arm/mach-exynos/common.c
@@ -282,6 +282,31 @@ static struct map_desc exynos5_iodesc[] __initdata = {
 		.pfn		= __phys_to_pfn(EXYNOS5_PA_UART),
 		.length		= SZ_512K,
 		.type		= MT_DEVICE,
+	}, {
+		.virtual        = (unsigned long)S5P_VA_PPMU_CPU,
+		.pfn            = __phys_to_pfn(EXYNOS5_PA_PPMU_CPU),
+		.length         = SZ_8K,
+		.type		= MT_DEVICE,
+	}, {
+		.virtual        = (unsigned long)S5P_VA_PPMU_DDR_C,
+		.pfn            = __phys_to_pfn(EXYNOS5_PA_PPMU_DDR_C),
+		.length         = SZ_8K,
+		.type           = MT_DEVICE,
+	}, {
+		.virtual        = (unsigned long)S5P_VA_PPMU_DDR_R1,
+		.pfn            = __phys_to_pfn(EXYNOS5_PA_PPMU_DDR_R1),
+		.length         = SZ_8K,
+		.type           = MT_DEVICE,
+	}, {
+		.virtual        = (unsigned long)S5P_VA_PPMU_DDR_L,
+		.pfn            = __phys_to_pfn(EXYNOS5_PA_PPMU_DDR_L),
+		.length         = SZ_8K,
+		.type           = MT_DEVICE,
+	}, {
+		.virtual        = (unsigned long)S5P_VA_PPMU_RIGHT,
+		.pfn            = __phys_to_pfn(EXYNOS5_PA_PPMU_RIGHT),
+		.length         = SZ_8K,
+		.type           = MT_DEVICE,
 	},
 };
 
diff --git a/arch/arm/mach-exynos/include/mach/map.h b/arch/arm/mach-exynos/include/mach/map.h
index cbb2852..8c8de91 100644
--- a/arch/arm/mach-exynos/include/mach/map.h
+++ b/arch/arm/mach-exynos/include/mach/map.h
@@ -228,6 +228,12 @@
 #define EXYNOS4_PA_SDRAM		0x40000000
 #define EXYNOS5_PA_SDRAM		0x40000000
 
+#define EXYNOS5_PA_PPMU_DDR_C		0x10C40000
+#define EXYNOS5_PA_PPMU_DDR_R1		0x10C50000
+#define EXYNOS5_PA_PPMU_CPU		0x10C60000
+#define EXYNOS5_PA_PPMU_DDR_L		0x10CB0000
+#define EXYNOS5_PA_PPMU_RIGHT		0x13660000
+
 /* Compatibiltiy Defines */
 
 #define S3C_PA_HSMMC0			EXYNOS4_PA_HSMMC(0)
diff --git a/arch/arm/mach-exynos/include/mach/regs-clock.h b/arch/arm/mach-exynos/include/mach/regs-clock.h
index d36ad76..bad3cd3 100644
--- a/arch/arm/mach-exynos/include/mach/regs-clock.h
+++ b/arch/arm/mach-exynos/include/mach/regs-clock.h
@@ -323,6 +323,9 @@
 #define EXYNOS5_CLKDIV_PERIC5			EXYNOS_CLKREG(0x1056C)
 #define EXYNOS5_SCLK_DIV_ISP			EXYNOS_CLKREG(0x10580)
 
+#define EXYNOS5_CLKDIV_STAT_TOP0		EXYNOS_CLKREG(0x10610)
+#define EXYNOS5_CLKDIV_STAT_TOP1		EXYNOS_CLKREG(0x10614)
+
 #define EXYNOS5_CLKGATE_IP_ACP			EXYNOS_CLKREG(0x08800)
 #define EXYNOS5_CLKGATE_IP_ISP0			EXYNOS_CLKREG(0x0C800)
 #define EXYNOS5_CLKGATE_IP_ISP1			EXYNOS_CLKREG(0x0C804)
@@ -337,6 +340,18 @@
 #define EXYNOS5_CLKGATE_IP_PERIS		EXYNOS_CLKREG(0x10960)
 #define EXYNOS5_CLKGATE_BLOCK			EXYNOS_CLKREG(0x10980)
 
+#define EXYNOS5_CLKGATE_BUS_SYSLFT		EXYNOS_CLKREG(0x08920)
+
+#define EXYNOS5_CLKOUT_CMU_TOP			EXYNOS_CLKREG(0x10A00)
+
+#define EXYNOS5_CLKDIV_LEX			EXYNOS_CLKREG(0x14500)
+#define EXYNOS5_CLKDIV_STAT_LEX			EXYNOS_CLKREG(0x14600)
+
+#define EXYNOS5_CLKDIV_R0X			EXYNOS_CLKREG(0x18500)
+#define EXYNOS5_CLKDIV_STAT_R0X			EXYNOS_CLKREG(0x18600)
+
+#define EXYNOS5_CLKDIV_R1X			EXYNOS_CLKREG(0x1C500)
+#define EXYNOS5_CLKDIV_STAT_R1X			EXYNOS_CLKREG(0x1C600)
 #define EXYNOS5_BPLL_CON0			EXYNOS_CLKREG(0x20110)
 #define EXYNOS5_CLKSRC_CDREX			EXYNOS_CLKREG(0x20200)
 #define EXYNOS5_CLKDIV_CDREX			EXYNOS_CLKREG(0x20500)
@@ -347,6 +362,39 @@
 
 #define EXYNOS5_EPLLCON0_LOCKED_SHIFT		(29)
 
+#define EXYNOS5_CLKDIV_TOP0_ACLK300_DISP1_SHIFT	(28)
+#define EXYNOS5_CLKDIV_TOP0_ACLK300_DISP1_MASK	(0x7 << EXYNOS5_CLKDIV_TOP0_ACLK300_DISP1_SHIFT)
+#define EXYNOS5_CLKDIV_TOP0_ACLK333_SHIFT	(20)
+#define EXYNOS5_CLKDIV_TOP0_ACLK333_MASK	(0x7 << EXYNOS5_CLKDIV_TOP0_ACLK333_SHIFT)
+#define EXYNOS5_CLKDIV_TOP0_ACLK266_SHIFT	(16)
+#define EXYNOS5_CLKDIV_TOP0_ACLK266_MASK	(0x7 << EXYNOS5_CLKDIV_TOP0_ACLK266_SHIFT)
+#define EXYNOS5_CLKDIV_TOP0_ACLK200_SHIFT	(12)
+#define EXYNOS5_CLKDIV_TOP0_ACLK200_MASK	(0x7 << EXYNOS5_CLKDIV_TOP0_ACLK200_SHIFT)
+#define EXYNOS5_CLKDIV_TOP0_ACLK166_SHIFT	(8)
+#define EXYNOS5_CLKDIV_TOP0_ACLK166_MASK	(0x7 << EXYNOS5_CLKDIV_TOP0_ACLK166_SHIFT)
+#define EXYNOS5_CLKDIV_TOP0_ACLK66_SHIFT	(0)
+#define EXYNOS5_CLKDIV_TOP0_ACLK66_MASK		(0x7 << EXYNOS5_CLKDIV_TOP0_ACLK66_SHIFT)
+
+#define EXYNOS5_CLKDIV_TOP1_ACLK66_PRE_SHIFT	(24)
+#define EXYNOS5_CLKDIV_TOP1_ACLK66_PRE_MASK	(0x7 << EXYNOS5_CLKDIV_TOP1_ACLK66_PRE_SHIFT)
+#define EXYNOS5_CLKDIV_TOP1_ACLK400_ISP_SHIFT	(20)
+#define EXYNOS5_CLKDIV_TOP1_ACLK400_ISP_MASK	(0x7 << EXYNOS5_CLKDIV_TOP1_ACLK400_ISP_SHIFT)
+#define EXYNOS5_CLKDIV_TOP1_ACLK400_IOP_SHIFT	(16)
+#define EXYNOS5_CLKDIV_TOP1_ACLK400_IOP_MASK	(0x7 << EXYNOS5_CLKDIV_TOP1_ACLK400_IOP_SHIFT)
+#define EXYNOS5_CLKDIV_TOP1_ACLK300_GSCL_SHIFT	(12)
+#define EXYNOS5_CLKDIV_TOP1_ACLK300_GSCL_MASK	(0x7 << EXYNOS5_CLKDIV_TOP1_ACLK300_GSCL_SHIFT)
+
+#define EXYNOS5_CLKDIV_LEX_ATCLK_LEX_SHIFT	(8)
+#define EXYNOS5_CLKDIV_LEX_ATCLK_LEX_MASK	(0x7 << EXYNOS5_CLKDIV_LEX_ATCLK_LEX_SHIFT)
+#define EXYNOS5_CLKDIV_LEX_PCLK_LEX_SHIFT	(4)
+#define EXYNOS5_CLKDIV_LEX_PCLK_LEX_MASK	(0x7 << EXYNOS5_CLKDIV_LEX_PCLK_LEX_SHIFT)
+
+#define EXYNOS5_CLKDIV_R0X_PCLK_R0X_SHIFT	(4)
+#define EXYNOS5_CLKDIV_R0X_PCLK_R0X_MASK	(0x7 << EXYNOS5_CLKDIV_R0X_PCLK_R0X_SHIFT)
+
+#define EXYNOS5_CLKDIV_R1X_PCLK_R1X_SHIFT	(4)
+#define EXYNOS5_CLKDIV_R1X_PCLK_R1X_MASK	(0x7 << EXYNOS5_CLKDIV_R1X_PCLK_R1X_SHIFT)
+
 #define PWR_CTRL1_CORE2_DOWN_RATIO		(7 << 28)
 #define PWR_CTRL1_CORE1_DOWN_RATIO		(7 << 16)
 #define PWR_CTRL1_DIV2_DOWN_EN			(1 << 9)
diff --git a/arch/arm/plat-samsung/include/plat/map-s5p.h b/arch/arm/plat-samsung/include/plat/map-s5p.h
index 038aa96..28bef98 100644
--- a/arch/arm/plat-samsung/include/plat/map-s5p.h
+++ b/arch/arm/plat-samsung/include/plat/map-s5p.h
@@ -42,6 +42,12 @@
 
 #define S5P_VA_AUDSS		S3C_ADDR(0x02830000)
 
+#define S5P_VA_PPMU_CPU		S3C_ADDR(0x02840000)
+#define S5P_VA_PPMU_DDR_C	S3C_ADDR(0x02842000)
+#define S5P_VA_PPMU_DDR_R1	S3C_ADDR(0x02844000)
+#define S5P_VA_PPMU_DDR_L	S3C_ADDR(0x02846000)
+#define S5P_VA_PPMU_RIGHT	S3C_ADDR(0x02848000)
+
 #define VA_VIC(x)		(S3C_VA_IRQ + ((x) * 0x10000))
 #define VA_VIC0			VA_VIC(0)
 #define VA_VIC1			VA_VIC(1)
-- 
1.7.8.6


^ permalink raw reply related

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-10  5:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: rostedt, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121209205733.GA7038@redhat.com>

On 12/10/2012 02:27 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> 4. No deadlock possibilities
>>
>>    Per-cpu locking is not the way to go if we want to have relaxed rules
>>    for lock-ordering. Because, we can end up in circular-locking dependencies
>>    as explained in https://lkml.org/lkml/2012/12/6/290
> 
> OK, but this assumes that, contrary to what Steven said, read-write-read
> deadlock is not possible when it comes to rwlock_t.

What I meant is, with a single (global) rwlock, you can't deadlock like that.
But if you used per-cpu rwlocks and if we don't implement them properly, then we
can end up in circular locking dependencies like shown above.

That is, if you take the same example and replace the lock with global
rwlock, you won't deadlock:

Readers:

         CPU 0                                CPU 1
         ------                               ------

1.    spin_lock(&random_lock);             read_lock(&my_rwlock);

2.    read_lock(&my_rwlock);               spin_lock(&random_lock);

Writer:

         CPU 2:
         ------

       write_lock(&my_rwlock);

Even if the writer does a write_lock() in-between steps 1 and 2 at the reader-
side, nothing will happen. The writer would spin because CPU 1 already got
the rwlock for read, and hence both CPU 0 and 1 go ahead. When they finish,
the writer will get the lock and proceed. So, no deadlocks here.
So, what I was pointing out here was, if somebody replaced this global
rwlock with a "straight-forward" implementation of per-cpu rwlocks, he'll
immediately end up in circular locking dependency deadlock between the 3
entities as explained in the link above.

Let me know if my assumptions are incorrect!

> So far I think this
> is true and we can't deadlock. Steven?
> 
> However. If this is true, then compared to preempt_disable/stop_machine
> livelock is possible. Probably this is fine, we have the same problem with
> get_online_cpus(). But if we can accept this fact I feel we can simmplify
> this somehow... Can't prove, only feel ;)
> 

Not sure I follow..

Anyway, my point is that, we _can't_ implement per-cpu rwlocks like lglocks
and expect it to work in this case. IOW, we can't do:

Reader-side:
   -> read_lock() your per-cpu rwlock and proceed.

Writer-side:
   -> for_each_online_cpu(cpu)
          write_lock(per-cpu rwlock of 'cpu');

Also, like Tejun said, one of the important measures for per-cpu rwlocks
should be that, if a user replaces global rwlocks with percpu rwlocks (for
performance reasons), he shouldn't suddenly end up in numerous deadlock
possibilities which never existed before. The replacement should continue to
remain safe, and perhaps improve the performance.

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-10  5:01 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121209211338.GA8090@redhat.com>

On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
> Damn, sorry for noise. I missed this part...
> 
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>> the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
>>
>> I hadn't considered that. Thinking of it, I don't think it would help us..
>> It won't get rid of the currently running preempt_disable() sections no?
> 
> Sure. But (again, this is only my feeling so far) given that get_online_cpus_atomic()
> does cli/sti,

Ah, that one! Actually, the only reason I do that cli/sti is because, potentially
interrupt handlers can be hotplug readers too. So we need to protect the portion
of the code of get_online_cpus_atomic() which is not re-entrant.
(Which reminds me to try and reduce the length of cli/sti in that code, if possible).

> this can help to implement ensure-the-readers-must-see-the-pending-writer.
> IOW this might help to implement sync-with-readers.
> 

2 problems:

1. It won't help with cases like this:

   preempt_disable()
	...
	preempt_disable()
		...
			<------- Here
		...
	preempt_enable()
	...
   preempt_enable()

If the IPI hits at the point marked above, the IPI is useless, because, at
that point, since we are already in a nested read-side critical section, we can't
switch the synchronization protocol. We need to wait till we start a fresh
non-nested read-side critical section, in order to switch to global rwlock.
The reason is that preempt_enable() or put_online_cpus_atomic() can only undo
what its predecessor (preempt_disable()/get_online_cpus_atomic()) did.

2. Part of the reason we want to get rid of stop_machine() is to avoid the
latency it induces on _all_ CPUs just to take *one* CPU offline. If we use
kick_all_cpus_sync(), we get into that territory again : we unfairly interrupt
every CPU, _even when_ that CPU's existing preempt_disabled() sections might
not actually be hotplug readers! (ie., not bothered about CPU Hotplug).

So, I think whatever synchronization scheme we develop, must not induce the very
same problems that stop_machine() had. Otherwise, we can end up going a full-circle
and coming back to the same stop_machine() scenario that we intended to get rid of.

(That's part of the reason why I initially tried to provide that _light() variant
of the reader APIs in the previous version of the patchset, so that light readers
can remain as undisturbed from cpu hotplug as possible, thereby avoiding indirectly
inducing the "stop_machine effect", like I mentioned here:
https://lkml.org/lkml/2012/12/5/369)

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-10  4:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121209202234.GA5793@redhat.com>

On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>
>>> But yes, it is easy to blame somebody else's code ;) And I can't suggest
>>> something better at least right now. If I understand correctly, we can not
>>> use, say, synchronize_sched() in _cpu_down() path
>>
>> We can't sleep in that code.. so that's a no-go.
> 
> But we can?
> 
> Note that I meant _cpu_down(), not get_online_cpus_atomic() or take_cpu_down().
> 

Maybe I'm missing something, but how would it help if we did a
synchronize_sched() so early (in _cpu_down())? Another bunch of preempt_disable()
sections could start immediately after our call to synchronize_sched() no?
How would we deal with that?

What we need to ensure here is that all existing preempt_disable() sections
are over and that *we* (meaning, the cpu offline writer) get to proceed immediately
after that, making all the new readers wait for us. And that is possible only if
we do our 'wait-for-readers' thing very close to our actual cpu offline operation
(which is take_cpu_down). Moreover, the writer needs to remain stable
(non-preemptible) while doing the wait-for-readers. Else (if the writer himself
keeps getting scheduled in and out of the CPU) I can't see how he can possibly
do justice to the wait.

Also, synchronize_sched() only helps us do the 'wait-for-existing-readers' logic.
What would take care of the 'prevent-new-readers-from-going-ahead' logic?

To add to it all, synchronize_sched() waits for _all_ preempt_disable() sections
to complete, whereas only a handful of them might actually care about CPU hotplug.
Which is an unnecessary burden for the writer (ie., waiting for unrelated readers
to complete).

I bet you meant something else. Sorry if I misunderstood it.

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [RFC PATCH v3 7/9] yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly
From: Srivatsa S. Bhat @ 2012-12-10  4:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121209204053.GA6593@redhat.com>

On 12/10/2012 02:10 AM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 01:18 AM, Oleg Nesterov wrote:
>>>> -		if (preempt && rq != p_rq)
>>>> +		if (preempt && rq != p_rq && cpu_online(task_cpu(p)))
>>>
>>> Why do we need this change?
>>>
>>> Afaics, you could add BUG_ON(!cpu_online(...)) instead?
>>>
>>> I am just curious.
>>>
>>
>> Oh, I think that's a remnant of v1 (which needed readers to use
>> cpu_online_stable()). You're right, we don't need it.
> 
> Ah OK, thanks.
> 
>> Or we could put a
>> BUG_ON instead, like you suggested.
> 
> IMHO it would be better to simply drop this chunk.
> 

Sure, will drop it. Its distracting, if nothing else ;-)

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [PATCH v9 06/10] ata: zpodd: check zero power ready status
From: Aaron Lu @ 2012-12-10  3:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: James Bottomley, Rafael J. Wysocki, linux-pm, Jeff Garzik,
	Alan Stern, Jeff Wu, linux-ide, linux-scsi, linux-acpi
In-Reply-To: <50C1891A.7040900@intel.com>

On 12/07/2012 02:13 PM, Aaron Lu wrote:
> On 12/04/2012 08:11 PM, James Bottomley wrote:
>> On Mon, 2012-12-03 at 08:23 -0800, Tejun Heo wrote:
>>> Hello, James.
>>>
>>> On Mon, Dec 03, 2012 at 08:25:43AM +0000, James Bottomley wrote:
>>>>> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
>>>>> index e65c62e..1756151 100644
>>>>> --- a/include/scsi/scsi_device.h
>>>>> +++ b/include/scsi/scsi_device.h
>>>>> @@ -160,6 +160,7 @@ struct scsi_device {
>>>>>  	unsigned can_power_off:1; /* Device supports runtime power off */
>>>>>  	unsigned wce_default_on:1;	/* Cache is ON by default */
>>>>>  	unsigned no_dif:1;	/* T10 PI (DIF) should be disabled */
>>>>> +	unsigned event_driven:1; /* No need to poll the device */
>>>>>  
>>>>>  	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
>>>>>  	struct list_head event_list;	/* asserted events */
>>>>
>>>> Yes, but if we can get away with doing that, it should be in genhd
>>>> because it's completely generic.
>>>>
>>>> I was imagining we'd have to fake the reply to the test unit ready or
>>>> some other commands, which is why it would need to be in sr.c
>>>>
>>>> The check events code is Tejun's baby, if he's OK with it then just do
>>>> it in genhd.c
>>>
>>> The problem here is there's no easy to reach genhd from libata (or the
>>> other way around) without going through sr.  I think we're gonna have
>>> to have something in sr one way or the other.
>>
>> Can't we do that via an event? It's a bit clunky because we need the
>> callback in the layer that sees the sdev, which is libata-scsi, we just
>> need an analogue of ata_scsi_media_change_notify, but ignoring and
>> allowing polling is essentially event driven as well, so it should all
>> work.  We'll need a listener in genhd, which might be trickier.
> 
> Hi Tejun,
> 
> Do you have any comments on James' suggestion? I want to know your
> opinion, thanks.

Hi Tejun,

A colleague of mine reminded me that it's impolite to write something
like this, and here is my apology. I didn't mean to be rude, I'm sorry
for this if it made you feel uncomfortable.

If possible, can you please shed some light on this problem and James'
suggestion? I need your opinions to make progress, thanks.

-Aaron

> 
> Hi James,
> 
> Do you mean we start a thread in genhd that listen to events, so that
> some driver can send an event to genhd about if the device should be
> polled? I'm thinking where to store such information. If store it in
> struct disk_events, then we will need to know which gendisk is for
> the device, but how? Maybe by loop scan all gendisk's driverfs_dev?
> If store it in struct device, then we do not need to send the event but
> just set a flag in sturct device in libata, and let genhd check this
> flag when poll. So I don't quite understand this, can you please
> elaborate? Thanks.
> 
> Thanks,
> Aaron
> 
>>
>> This may also work as the more generic route for stuff where we can't
>> connect the bottom to the top of the stack (which looks like a problem
>> we'll begin wrestling with a lot now).
>>
>> James
>>
>>
> 


^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-09 21:13 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C4EB79.5050203@linux.vnet.ibm.com>

Damn, sorry for noise. I missed this part...

On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> > the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
>
> I hadn't considered that. Thinking of it, I don't think it would help us..
> It won't get rid of the currently running preempt_disable() sections no?

Sure. But (again, this is only my feeling so far) given that get_online_cpus_atomic()
does cli/sti, this can help to implement ensure-the-readers-must-see-the-pending-writer.
IOW this might help to implement sync-with-readers.

Oleg.


^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-09 20:57 UTC (permalink / raw)
  To: Srivatsa S. Bhat, rostedt
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173759.27305.84316.stgit@srivatsabhat.in.ibm.com>

On 12/07, Srivatsa S. Bhat wrote:
>
> 4. No deadlock possibilities
>
>    Per-cpu locking is not the way to go if we want to have relaxed rules
>    for lock-ordering. Because, we can end up in circular-locking dependencies
>    as explained in https://lkml.org/lkml/2012/12/6/290

OK, but this assumes that, contrary to what Steven said, read-write-read
deadlock is not possible when it comes to rwlock_t. So far I think this
is true and we can't deadlock. Steven?

However. If this is true, then compared to preempt_disable/stop_machine
livelock is possible. Probably this is fine, we have the same problem with
get_online_cpus(). But if we can accept this fact I feel we can simmplify
this somehow... Can't prove, only feel ;)

Oleg.

^ permalink raw reply

* Re: [RFC PATCH v3 7/9] yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly
From: Oleg Nesterov @ 2012-12-09 20:40 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C4ED46.3070100@linux.vnet.ibm.com>

On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 01:18 AM, Oleg Nesterov wrote:
> >> -		if (preempt && rq != p_rq)
> >> +		if (preempt && rq != p_rq && cpu_online(task_cpu(p)))
> >
> > Why do we need this change?
> >
> > Afaics, you could add BUG_ON(!cpu_online(...)) instead?
> >
> > I am just curious.
> >
>
> Oh, I think that's a remnant of v1 (which needed readers to use
> cpu_online_stable()). You're right, we don't need it.

Ah OK, thanks.

> Or we could put a
> BUG_ON instead, like you suggested.

IMHO it would be better to simply drop this chunk.

Oleg.

^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-09 20:22 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C4EB79.5050203@linux.vnet.ibm.com>

On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>
> > But yes, it is easy to blame somebody else's code ;) And I can't suggest
> > something better at least right now. If I understand correctly, we can not
> > use, say, synchronize_sched() in _cpu_down() path
>
> We can't sleep in that code.. so that's a no-go.

But we can?

Note that I meant _cpu_down(), not get_online_cpus_atomic() or take_cpu_down().

Oleg.


^ permalink raw reply

* Re: [RFC PATCH v3 7/9] yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly
From: Srivatsa S. Bhat @ 2012-12-09 19:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121209194826.GB2816@redhat.com>

On 12/10/2012 01:18 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> Once stop_machine() is gone from the CPU offline path, we won't be able to
>> depend on local_irq_save() to prevent CPUs from going offline from under us.
> 
> OK, I guess we need to avoid resched_task()->smp_send_reschedule()
> after __cpu_disable() and before migrate_tasks().
> 

Yes.

> But, whatever problem we have,
> 
>> Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline,
>> while invoking from atomic context.
> 
> it should be solved, so...
> 
>> -		if (preempt && rq != p_rq)
>> +		if (preempt && rq != p_rq && cpu_online(task_cpu(p)))
> 
> Why do we need this change?
> 
> Afaics, you could add BUG_ON(!cpu_online(...)) instead?
> 
> I am just curious.
>

Oh, I think that's a remnant of v1 (which needed readers to use
cpu_online_stable()). You're right, we don't need it. Or we could put a
BUG_ON instead, like you suggested.

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-09 19:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121209191437.GA2816@redhat.com>

On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> Per-cpu counters can help solve the cache-line bouncing problem. So we
>> actually use the best of both: per-cpu counters (no-waiting) at the reader
>> side in the fast-path, and global rwlocks in the slowpath.
>>
>> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>>
>> IOW, the hotplug readers just increment/decrement their per-cpu refcounts
>> when no writer is active.
> 
> Plus LOCK and cli/sti. I do not pretend I really know how bad this is
> performance-wise though. And at first glance this look overcomplicated.
>

Hehe, I agree ;-) But I couldn't think of any other way to get rid of the
deadlock possibilities, other than using global rwlocks. So I designed a
way in which we can switch between per-cpu counters and global rwlocks
dynamically. Probably there is a smarter way to achieve what we want, dunno...
 
> But yes, it is easy to blame somebody else's code ;) And I can't suggest
> something better at least right now. If I understand correctly, we can not
> use, say, synchronize_sched() in _cpu_down() path

We can't sleep in that code.. so that's a no-go.

>, you also want to improve
> the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
> 

I hadn't considered that. Thinking of it, I don't think it would help us..
It won't get rid of the currently running preempt_disable() sections no?

> Also. After the quick reading this doesn't look correct, please see below.
> 
>> +void get_online_cpus_atomic(void)
>> +{
>> +	unsigned int cpu = smp_processor_id();
>> +	unsigned long flags;
>> +
>> +	preempt_disable();
>> +	local_irq_save(flags);
>> +
>> +	if (cpu_hotplug.active_writer == current)
>> +		goto out;
>> +
>> +	smp_rmb(); /* Paired with smp_wmb() in drop_writer_signal() */
>> +
>> +	if (likely(!writer_active(cpu))) {
> 
> WINDOW. Suppose that reader_active() == F.
> 
>> +		mark_reader_fastpath();
>> +		goto out;
> 
> Why take_cpu_down() can't do announce_cpu_offline_begin() + sync_all_readers()
> in between?
> 
> Looks like we should increment the counter first, then check writer_active().

You are right, I missed the above race-conditions.

> And sync_atomic_reader() needs rmb between 2 atomic_read's.
> 

OK.

> 
> Or. Again, suppose that reader_active() == F. But is_new_writer() == T.
> 
>> +	if (is_new_writer(cpu)) {
>> +		/*
>> +		 * ACK the writer's signal only if this is a fresh read-side
>> +		 * critical section, and not just an extension of a running
>> +		 * (nested) read-side critical section.
>> +		 */
>> +		if (!reader_active(cpu)) {
>> +			ack_writer_signal();
> 
> What if take_cpu_down() does announce_cpu_offline_end() right before
> ack_writer_signal() ? In this case get_online_cpus_atomic() returns
> with writer_signal == -1. If nothing else this breaks the next
> raise_writer_signal().
> 

Oh, yes, this one is wrong too! We need to mark ourselves as active
reader right in the beginning. And probably change the check to
"reader_nested()" or something like that.

Thanks a lot Oleg!
Regards,
Srivatsa S. Bhat


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox