Linux Power Management development
* Re: [PATCH] cpuidle - fix lock contention in the idle path
From: Rafael J. Wysocki @ 2013-01-05  0:03 UTC (permalink / raw)
  To: Daniel Lezcano, Russ Anderson
  Cc: linux-pm, pdeschrijver, akpm, linux-kernel, rja
In-Reply-To: <50E6764C.9060608@linaro.org>

On Friday, January 04, 2013 07:27:24 AM Daniel Lezcano wrote:
> On 01/02/2013 10:13 PM, Russ Anderson wrote:
> > On Wed, Dec 26, 2012 at 11:01:48AM +0100, Daniel Lezcano wrote:
> >> The commit bf4d1b5ddb78f86078ac6ae0415802d5f0c68f92 introduces
> >> a lock in the cpuidle_get_cpu_driver function. This function
> >> is used in the idle_call function.
> >>
> >> The problem is the contention with a large number of cpus because
> >> they try to access the idle routine at the same time.
> >>
> >> The lock can be safely removed because of how the cpuidle API is
> >> used. cpuidle_register_driver is called first, but until
> >> cpuidle_register_device has been called we do not enter the
> >> cpuidle idle call function, because the device is not enabled.
> >>
> >> The cpuidle_unregister_driver function, which leads to a NULL driver,
> >> is not called before cpuidle_unregister_device.
> >>
> >> This is how the different drivers use the cpuidle API.
> >>
> >> However, a cleanup around the lock and a proper refcounting
> >> mechanism should be used to ensure consistency in the API;
> >> for example, cpuidle_unregister_driver should fail if its
> >> refcount is not 0.
> >>
> >> These modifications will need some code reorganization and rewrite
> >> which does not fit with a fix.
> > 
> > I agree.
> > 
> >> The following patch is a hot fix that returns to the initial behavior
> >> by removing the lock when getting the driver.
> > 
> > The patch fixes the problem.  Verified on a system with 1024 cpus.
> > Thanks.
> > 
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > 
> > Reported-by: Russ Anderson <rja@sgi.com>
> > Acked-by: Russ Anderson <rja@sgi.com>
> 
> Hi Rafael,
> 
> could you consider this patch for merging?

Yes, I've taken it already.

I'll include it in the next pull request.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply
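
The ordering argument in the quoted changelog can be illustrated with a small userspace sketch. This is not the kernel code: every name below (sketch_driver, sketch_device, sketch_idle_call and so on) is hypothetical, and the sketch only models the assumption the patch relies on, namely that the idle path checks whether the device is enabled before it ever reads the driver pointer, so the read itself needs no lock.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cpuidle driver and device objects. */
struct sketch_driver { const char *name; };
struct sketch_device { int enabled; };

/* Written only on the (rare) register/unregister paths. */
static struct sketch_driver *registered_driver;

static int sketch_register_driver(struct sketch_driver *drv)
{
	if (registered_driver)
		return -EBUSY;
	registered_driver = drv;
	return 0;
}

static void sketch_unregister_driver(void)
{
	registered_driver = NULL;
}

/* Hot path: a plain pointer read, no lock taken. */
static struct sketch_driver *sketch_get_driver(void)
{
	return registered_driver;
}

/*
 * Idle entry. The device is enabled only after the driver has been
 * registered, and disabled before the driver is unregistered, so a
 * disabled device never reaches the driver lookup.
 */
static int sketch_idle_call(const struct sketch_device *dev)
{
	struct sketch_driver *drv;

	if (!dev->enabled)
		return -ENODEV;

	drv = sketch_get_driver();
	return drv ? 0 : -ENODEV;
}
```

The sketch leaves out the refcounting that the changelog says a proper cleanup would add; it only shows why the hot fix is safe as long as callers keep to the register/unregister ordering.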

* Re: drivers/acpi/processor_driver.c:351:2: error: 'else' without a previous 'if'
From: Rafael J. Wysocki @ 2013-01-04 23:49 UTC (permalink / raw)
  To: kbuild test robot; +Cc: Bob Moore, linux-pm, Lv Zheng, Rafael J. Wysocki
In-Reply-To: <50e75f94.QueKkJ7Ms/cbp/Qu%fengguang.wu@intel.com>

On Saturday, January 05, 2013 07:02:44 AM kbuild test robot wrote:
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> head:   279355c3da8cc9b3ce41865fddfa1cedc9625bf9
> commit: 2e6bce9096cf01e4b57e1d1ae7574a2bd986dd8e ACPICA: DEBUG_PRINT macros: Update to improve performance.
> date:   42 minutes ago
> config: make ARCH=x86_64 allmodconfig
> 
> All error/warnings:
> 
> drivers/acpi/processor_driver.c: In function 'acpi_processor_get_info':
> drivers/acpi/processor_driver.c:351:2: error: 'else' without a previous 'if'
> drivers/acpi/processor_driver.c: In function 'is_processor_present':
> drivers/acpi/processor_driver.c:674:2: error: 'else' without a previous 'if'

Thanks for the report.

I have reset the acpica-next branch to "ACPICA: Update version to 20121114."

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* drivers/acpi/processor_driver.c:351:2: error: 'else' without a previous 'if'
From: kbuild test robot @ 2013-01-04 23:02 UTC (permalink / raw)
  To: Bob Moore; +Cc: linux-pm, Lv Zheng, Rafael J. Wysocki

tree:   git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
head:   279355c3da8cc9b3ce41865fddfa1cedc9625bf9
commit: 2e6bce9096cf01e4b57e1d1ae7574a2bd986dd8e ACPICA: DEBUG_PRINT macros: Update to improve performance.
date:   42 minutes ago
config: make ARCH=x86_64 allmodconfig

All error/warnings:

drivers/acpi/processor_driver.c: In function 'acpi_processor_get_info':
drivers/acpi/processor_driver.c:351:2: error: 'else' without a previous 'if'
drivers/acpi/processor_driver.c: In function 'is_processor_present':
drivers/acpi/processor_driver.c:674:2: error: 'else' without a previous 'if'

vim +351 drivers/acpi/processor_driver.c

4be44fcd drivers/acpi/processor_core.c   Len Brown           2005-08-05  335  	}
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  336  	/*
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  337  	 * On some boxes several processors use the same processor bus id.
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  338  	 * But they are located in different scope. For example:
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  339  	 * \_SB.SCK0.CPU0
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  340  	 * \_SB.SCK1.CPU0
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  341  	 * Rename the processor device bus id. And the new bus id will be
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  342  	 * generated as the following format:
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  343  	 * CPU+CPU ID.
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  344  	 */
7a04b849 drivers/acpi/processor_core.c   Zhao Yakui          2009-06-24  345  	sprintf(acpi_device_bid(device), "CPU%X", pr->id);
^1da177e drivers/acpi/processor_core.c   Linus Torvalds      2005-04-16  346  	ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Processor [%d:%d]\n", pr->id,
4be44fcd drivers/acpi/processor_core.c   Len Brown           2005-08-05  347  			  pr->acpi_id));
^1da177e drivers/acpi/processor_core.c   Linus Torvalds      2005-04-16  348  
^1da177e drivers/acpi/processor_core.c   Linus Torvalds      2005-04-16  349  	if (!object.processor.pblk_address)
^1da177e drivers/acpi/processor_core.c   Linus Torvalds      2005-04-16  350  		ACPI_DEBUG_PRINT((ACPI_DB_INFO, "No PBLK (NULL address)\n"));
^1da177e drivers/acpi/processor_core.c   Linus Torvalds      2005-04-16 @351  	else if (object.processor.pblk_length != 6)
47db4547 drivers/acpi/processor_driver.c Toshi Kani          2012-11-20  352  		dev_err(&device->dev, "Invalid PBLK length [%d]\n",
6468463a drivers/acpi/processor_core.c   Len Brown           2006-06-26  353  			    object.processor.pblk_length);
^1da177e drivers/acpi/processor_core.c   Linus Torvalds      2005-04-16  354  	else {
^1da177e drivers/acpi/processor_core.c   Linus Torvalds      2005-04-16  355  		pr->throttling.address = object.processor.pblk_address;
cee324b1 drivers/acpi/processor_core.c   Alexey Starikovskiy 2007-02-02  356  		pr->throttling.duty_offset = acpi_gbl_FADT.duty_offset;
cee324b1 drivers/acpi/processor_core.c   Alexey Starikovskiy 2007-02-02  357  		pr->throttling.duty_width = acpi_gbl_FADT.duty_width;
^1da177e drivers/acpi/processor_core.c   Linus Torvalds      2005-04-16  358  
^1da177e drivers/acpi/processor_core.c   Linus Torvalds      2005-04-16  359  		pr->pblk = object.processor.pblk_address;

---
0-DAY kernel build testing backend         Open Source Technology Center
Fengguang Wu, Yuanhan Liu                              Intel Corporation

^ permalink raw reply
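
The report does not show the macro change that caused the error, so the following is only an assumption about the general failure mode, not a description of the ACPICA commit. A statement-like macro that expands to a bare braced block (or carries its own trailing semicolon) ends the `if` statement early once the caller adds `;`, and the compiler then sees an `else` with no matching `if`. The usual remedy is the `do { ... } while (0)` wrapper. DBG_PRINT, debug_enabled and classify_pblk are hypothetical names chosen for this sketch.

```c
#include <stdio.h>

static int debug_enabled;
static int debug_lines;

/*
 * Broken variant (kept as a comment because it does not compile when
 * used between "if" and "else"):
 *
 *   #define DBG_PRINT(msg) { if (debug_enabled) puts(msg); }
 *
 *   if (!address)
 *           DBG_PRINT("No PBLK");   // expands to "{ ... };"
 *   else                            // error: 'else' without a previous 'if'
 *           ...
 */

/* Safe variant: expands to exactly one statement that needs the ';'. */
#define DBG_PRINT(msg)                  \
	do {                            \
		if (debug_enabled) {    \
			puts(msg);      \
			debug_lines++;  \
		}                       \
	} while (0)

/* Same if / else if / else shape as the code at line 349 of the report. */
static int classify_pblk(unsigned long address, unsigned int length)
{
	if (!address)
		DBG_PRINT("No PBLK (NULL address)");
	else if (length != 6)
		return -1;
	else
		return 1;

	return 0;
}
```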

* Re: [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init
From: Rafael J. Wysocki @ 2013-01-04 22:07 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Pavel Machek, linux-pm
In-Reply-To: <CAE9FiQX6pPE9EbqRNCRvN+a_rSU6Zdt4QKa4hNGApyYfVYySzw@mail.gmail.com>

On Friday, January 04, 2013 01:59:33 PM Yinghai Lu wrote:
> On Fri, Jan 4, 2013 at 3:43 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, January 03, 2013 04:48:51 PM Yinghai Lu wrote:
> >> Make it only map range in pfn_mapped array.
> >
> > Can you please explain why that should be sufficient?
> 
> It is needed.
> 
> http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf
> 
> Subject: [PATCH] x86, mm: Only direct map addresses that are marked as
>  E820_RAM
> 
> Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
> and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
> backed by actual DRAM. This is fine for holes under 4GB which are covered
> by fixed and variable range MTRRs to be UC. However, we run into trouble
> on higher memory addresses which cannot be covered by MTRRs.
> 
> Our system with 1TB of RAM has an e820 that looks like this:
> 
>  BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
>  BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
>  BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
>  BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
>  BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
>  BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
>  BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
>  BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
>  BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
>  BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
>  BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
>  BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
>  BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
> 
> and so direct mappings are created for the huge memory hole between
> 0x000000e038000000 and 0x0000010000000000. Even though the kernel never
> generates memory accesses in that region, since the page tables mark
> them incorrectly as being WB, our (AMD) processor ends up causing a MCE
> while doing some memory bookkeeping/optimizations around that area.
> 
> This patch iterates through e820 and only direct maps ranges that are
> marked as E820_RAM, and keeps track of those pfn ranges. Depending on
> the alignment of E820 ranges, this may possibly result in using smaller
> size (i.e. 4K instead of 2M or 1G) page tables.
> 
> >
> > Have you tested it?
> >
> 
> No
> 
> will update to
> 
> Subject: [PATCH] x86, 64bit, mm: hibernate use generic mapping_init
> 
> We should not set up mappings for everything under max_pfn.
> That causes the same problem that is fixed by
>         x86, mm: Only direct map addresses that are marked as E820_RAM
> 
> Make it only map range in pfn_mapped array.

OK

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init
From: Yinghai Lu @ 2013-01-04 21:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Pavel Machek, linux-pm
In-Reply-To: <61708402.lBGARq0J34@vostro.rjw.lan>

On Fri, Jan 4, 2013 at 3:43 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, January 03, 2013 04:48:51 PM Yinghai Lu wrote:
>> Make it only map range in pfn_mapped array.
>
> Can you please explain why that should be sufficient?

It is needed.

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=66520ebc2df3fe52eb4792f8101fac573b766baf

Subject: [PATCH] x86, mm: Only direct map addresses that are marked as
 E820_RAM

Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
backed by actual DRAM. This is fine for holes under 4GB which are covered
by fixed and variable range MTRRs to be UC. However, we run into trouble
on higher memory addresses which cannot be covered by MTRRs.

Our system with 1TB of RAM has an e820 that looks like this:

 BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
 BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
 BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
 BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
 BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
 BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
 BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
 BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
 BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
 BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
 BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
 BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
 BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable

and so direct mappings are created for the huge memory hole between
0x000000e038000000 and 0x0000010000000000. Even though the kernel never
generates memory accesses in that region, since the page tables mark
them incorrectly as being WB, our (AMD) processor ends up causing a MCE
while doing some memory bookkeeping/optimizations around that area.

This patch iterates through e820 and only direct maps ranges that are
marked as E820_RAM, and keeps track of those pfn ranges. Depending on
the alignment of E820 ranges, this may possibly result in using smaller
size (i.e. 4K instead of 2M or 1G) page tables.

>
> Have you tested it?
>

No

will update to

Subject: [PATCH] x86, 64bit, mm: hibernate use generic mapping_init

We should not set up mappings for everything under max_pfn.
That causes the same problem that is fixed by
        x86, mm: Only direct map addresses that are marked as E820_RAM

Make it only map range in pfn_mapped array.

^ permalink raw reply
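
The idea of mapping only the ranges recorded as RAM, rather than everything below max_pfn, can be sketched in plain C as follows. This is a userspace illustration under stated assumptions, not the kernel implementation: struct region, struct pfn_range, collect_ram_ranges() and pfn_is_mapped() are hypothetical names, a 4K page size is assumed, and region end addresses are inclusive as in the BIOS-e820 log above.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4K pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

enum region_type { REGION_RAM, REGION_RESERVED };

/* One firmware memory-map entry; 'end' is inclusive, as in the log. */
struct region {
	uint64_t start;
	uint64_t end;
	enum region_type type;
};

/* A mapped page-frame range; 'end_pfn' is exclusive. */
struct pfn_range {
	uint64_t start_pfn;
	uint64_t end_pfn;
};

/*
 * Walk the map and record only RAM regions, trimmed to whole pages.
 * Returns the number of ranges written to 'out'.
 */
static size_t collect_ram_ranges(const struct region *map, size_t n,
				 struct pfn_range *out, size_t max_out)
{
	size_t i, count = 0;

	for (i = 0; i < n && count < max_out; i++) {
		uint64_t start_pfn, end_pfn;

		if (map[i].type != REGION_RAM)
			continue;	/* holes and reserved areas stay unmapped */

		start_pfn = (map[i].start + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
		end_pfn = (map[i].end + 1) >> SKETCH_PAGE_SHIFT;
		if (start_pfn >= end_pfn)
			continue;	/* smaller than one page after trimming */

		out[count].start_pfn = start_pfn;
		out[count].end_pfn = end_pfn;
		count++;
	}
	return count;
}

/* Returns 1 if 'pfn' falls inside one of the recorded ranges. */
static int pfn_is_mapped(const struct pfn_range *ranges, size_t n, uint64_t pfn)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (pfn >= ranges[i].start_pfn && pfn < ranges[i].end_pfn)
			return 1;
	return 0;
}
```

Fed the last three entries of the e820 listing, the sketch records two ranges and leaves the reserved hole starting at 0x000000e038000000 out of the mapping.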

* [PATCH v6 2/3] x86/nmi: export local_touch_nmi() symbol for modules
From: Jacob Pan @ 2013-01-04 11:12 UTC (permalink / raw)
  To: Linux PM, LKML
  Cc: Peter Zijlstra, Rafael Wysocki, Len Brown, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar, Zhang Rui, Joe Perches, Rob Landley,
	Arjan van de Ven, Paul McKenney, Jacob Pan
In-Reply-To: <1357297965-17839-1-git-send-email-jacob.jun.pan@linux.intel.com>

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/kernel/nmi.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index f84f5c5..6030805 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -509,3 +509,4 @@ void local_touch_nmi(void)
 {
 	__this_cpu_write(last_nmi_rip, 0);
 }
+EXPORT_SYMBOL_GPL(local_touch_nmi);
-- 
1.7.9.5


^ permalink raw reply related

* [PATCH v6 1/3] tick: export nohz tick idle symbols for module use
From: Jacob Pan @ 2013-01-04 11:12 UTC (permalink / raw)
  To: Linux PM, LKML
  Cc: Peter Zijlstra, Rafael Wysocki, Len Brown, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar, Zhang Rui, Joe Perches, Rob Landley,
	Arjan van de Ven, Paul McKenney, Jacob Pan
In-Reply-To: <1357297965-17839-1-git-send-email-jacob.jun.pan@linux.intel.com>

Allow drivers such as intel_powerclamp to use these APIs to turn
ticks on and off during idle.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 kernel/time/tick-sched.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d58e552..a767757 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -553,6 +553,7 @@ void tick_nohz_idle_enter(void)
 
 	local_irq_enable();
 }
+EXPORT_SYMBOL_GPL(tick_nohz_idle_enter);
 
 /**
  * tick_nohz_irq_exit - update next tick event from interrupt exit
@@ -681,6 +682,7 @@ void tick_nohz_idle_exit(void)
 
 	local_irq_enable();
 }
+EXPORT_SYMBOL_GPL(tick_nohz_idle_exit);
 
 static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t now)
 {
-- 
1.7.9.5


^ permalink raw reply related

* Re: [PATCH v5 3/3] PM: Introduce Intel PowerClamp Driver
From: jacob pan @ 2013-01-04 16:51 UTC (permalink / raw)
  To: Joe Perches
  Cc: Linux PM, LKML, Rafael Wysocki, Len Brown, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar, Zhang Rui, Rob Landley,
	Arjan van de Ven, Paul McKenney
In-Reply-To: <1357271464.5452.31.camel@joe-AO722>

On Thu, 03 Jan 2013 19:51:04 -0800
Joe Perches <joe@perches.com> wrote:

> On Thu, 2013-01-03 at 07:10 -0800, Jacob Pan wrote:
> > Intel PowerClamp driver performs synchronized idle injection across
> > all online CPUs. The goal is to maintain a given package level
> > C-state ratio.
> 
> []
> 
> > +static int window_size_set(const char *arg, const struct kernel_param *kp)
> > +{
> > +	int ret = 0;
> > +	unsigned long new_window_size;
> > +
> > +	ret = kstrtoul(arg, 10, &new_window_size);
> > +	if (ret)
> > +		goto exit_win;
> > +	if (new_window_size > 10 || new_window_size < 2) {
> > +		pr_err("Invalid window size %lu, between 2-10\n",
> > +			new_window_size);
> > +		ret = -EINVAL;
> > +	}
> > +
> > +	window_size = new_window_size;
> 
> Possible assignment of known invalid windows size?
> Maybe you should add
> 	goto exit;
> after
> 	ret = -EINVAL;
> 
> or add
> 	new_window_size = clamp(new_window_size, 2ul, 10ul);
Good catch. The window size range 2-10 is somewhat arbitrary; values
greater than 10 should also work but are not recommended. I will reword
that. But it is good to clamp it as you suggested; I will do that for
the duration parameter also.

Thanks,

Jacob

^ permalink raw reply
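
The two fixes discussed in this exchange (bail out before the assignment, or clamp the value into range) can be compared in a small userspace sketch. It is not the driver code: it uses strtoul() in place of the kernel's kstrtoul(), drops the kernel_param argument, and the function and macro names are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

#define WINDOW_SIZE_MIN 2UL
#define WINDOW_SIZE_MAX 10UL

static unsigned long window_size = WINDOW_SIZE_MIN;

/* Parse a base-10 unsigned long; returns 0 on success, -EINVAL otherwise. */
static int parse_ulong(const char *arg, unsigned long *out)
{
	char *end;

	errno = 0;
	*out = strtoul(arg, &end, 10);
	if (errno || end == arg || *end != '\0')
		return -EINVAL;
	return 0;
}

/* Variant 1: reject out-of-range values and leave window_size untouched. */
static int window_size_set_strict(const char *arg)
{
	unsigned long v;
	int ret = parse_ulong(arg, &v);

	if (ret)
		return ret;
	if (v < WINDOW_SIZE_MIN || v > WINDOW_SIZE_MAX)
		return -EINVAL;	/* return before the assignment below */

	window_size = v;
	return 0;
}

/* Variant 2: accept any parsable value but clamp it into the range. */
static int window_size_set_clamped(const char *arg)
{
	unsigned long v;
	int ret = parse_ulong(arg, &v);

	if (ret)
		return ret;
	if (v < WINDOW_SIZE_MIN)
		v = WINDOW_SIZE_MIN;
	else if (v > WINDOW_SIZE_MAX)
		v = WINDOW_SIZE_MAX;

	window_size = v;
	return 0;
}
```

The strict variant makes a bad write visible to the caller; the clamped variant never fails on a numeric input, which fits the reply's point that the 2-10 range is a recommendation rather than a hard limit.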

* [PATCH 2/4 v10] clk, highbank: Prevent glitches in non-bypass reset mode
From: Mark Langsdorf @ 2013-01-04 16:35 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel
  Cc: Mark Langsdorf, Rob Herring
In-Reply-To: <1357317346-1120-1-git-send-email-mark.langsdorf@calxeda.com>

The highbank clock will glitch with the current code if the
clock rate is reset without relocking the PLL. Program the PLL
correctly to prevent glitches.

Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Mike Turquette <mturquette@linaro.org>
---
Changes from v6, v7, v8, v9
        None.
Changes from v5
        Added Mike Turquette's ack.
Changes from v4
        None.
Changes from v3
        Changelog text and patch name now correspond to the actual patch.
        was clk, highbank: remove non-bypass reset mode.
Changes from v2
        None.
Changes from v1
        Removed erroneous reformatting.

 drivers/clk/clk-highbank.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/clk/clk-highbank.c b/drivers/clk/clk-highbank.c
index 52fecad..3a0b723 100644
--- a/drivers/clk/clk-highbank.c
+++ b/drivers/clk/clk-highbank.c
@@ -182,8 +182,10 @@ static int clk_pll_set_rate(struct clk_hw *hwclk, unsigned long rate,
 		reg |= HB_PLL_EXT_ENA;
 		reg &= ~HB_PLL_EXT_BYPASS;
 	} else {
+		writel(reg | HB_PLL_EXT_BYPASS, hbclk->reg);
 		reg &= ~HB_PLL_DIVQ_MASK;
 		reg |= divq << HB_PLL_DIVQ_SHIFT;
+		writel(reg | HB_PLL_EXT_BYPASS, hbclk->reg);
 	}
 	writel(reg, hbclk->reg);
 
-- 
1.8.0.2

^ permalink raw reply related
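
The write ordering that the hunk above adds to the non-bypass branch can be modelled in userspace as follows. This is a sketch under assumptions: the bit positions and the 3-bit divider field are made up for illustration (the real HB_PLL_* values are not shown in the patch), and the register is a plain variable with a write log rather than an MMIO location.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define SK_PLL_EXT_BYPASS	(1u << 0)
#define SK_PLL_DIVQ_SHIFT	8
#define SK_PLL_DIVQ_MASK	(0x7u << SK_PLL_DIVQ_SHIFT)

#define SK_MAX_WRITES 8

static uint32_t pll_reg;			/* stands in for the PLL register */
static uint32_t write_log[SK_MAX_WRITES];	/* every value written, in order */
static unsigned int write_count;

static void sk_writel(uint32_t val)
{
	pll_reg = val;
	if (write_count < SK_MAX_WRITES)
		write_log[write_count++] = val;
}

/*
 * Change the output divider without letting the clock see the
 * intermediate state: enter bypass, program the divider while still
 * bypassed, then write the final value with bypass cleared.
 */
static void sk_pll_set_divq(uint32_t divq)
{
	uint32_t reg = pll_reg;

	sk_writel(reg | SK_PLL_EXT_BYPASS);		/* 1: enter bypass */
	reg &= ~SK_PLL_DIVQ_MASK;
	reg |= (divq << SK_PLL_DIVQ_SHIFT) & SK_PLL_DIVQ_MASK;
	sk_writel(reg | SK_PLL_EXT_BYPASS);		/* 2: new divider, still bypassed */
	sk_writel(reg);					/* 3: leave bypass */
}
```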

* [PATCH 0/4 v10] cpufreq: add support for Calxeda ECX-1000 (highbank)
From: Mark Langsdorf @ 2013-01-04 16:35 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel
In-Reply-To: <1351631056-25938-1-git-send-email-mark.langsdorf@calxeda.com>

This patch series adds cpufreq support for the Calxeda
ECX-1000 (highbank) SoCs. The EnergyCore Management Engine (ECME) on
the ECX-1000 manages the voltage for the part and communicates with
Linux through a pl320 mailbox. clk notifications are used to control
when to send messages to the ECME.

Previous versions of this patch set included two other patches. One has
been dropped as unworkable and the other got picked up and included in
3.8.0.

--Mark Langsdorf
Calxeda, Inc.


^ permalink raw reply

* [PATCH 3/4 v10] arm highbank: add support for pl320 IPC
From: Mark Langsdorf @ 2013-01-04 16:35 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel
  Cc: Mark Langsdorf, Rob Herring, Omar Ramirez Luna, Arnd Bergmann
In-Reply-To: <1357317346-1120-1-git-send-email-mark.langsdorf@calxeda.com>

From: Rob Herring <rob.herring@calxeda.com>

The pl320 IPC allows for interprocessor communication between the highbank A9
and the EnergyCore Management Engine. The pl320 implements a straightforward
mailbox protocol.

This patch depends on Omar Ramirez Luna's <omar.luna@linaro.org>
mailbox driver patch series.

Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Omar Ramirez Luna <omar.luna@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
---
Changes from v9
	Used to be the 4th patch in the series.
Changes from v6, v7, v8
        None.
Changes from v5
        Renamed ipc_transmit() to pl320_ipc_transmit().
        Properly exported pl320_ipc_{un}register_notifier().
Changes from v4
        Moved pl320-ipc.c from arch/arm/mach-highbank to drivers/mailbox.
        Moved header information to include/linux/mailbox.h.
        Added Kconfig options to reflect the new code location.
        Change drivers/mailbox/Makefile to build the omap mailboxes only 
        when they are configured.
        Removed ipc_call_fast and renamed ipc_call_slow ipc_transmit.
Changes from v3, v2
        None.
Changes from v1
        Removed erroneous changes for cpufreq Kconfig.

 arch/arm/mach-highbank/Kconfig |   2 +
 drivers/mailbox/Kconfig        |   9 ++
 drivers/mailbox/Makefile       |   6 +-
 drivers/mailbox/Makefile.rej   |   7 ++
 drivers/mailbox/pl320-ipc.c    | 199 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 220 insertions(+), 3 deletions(-)
 create mode 100644 drivers/mailbox/Makefile.rej
 create mode 100644 drivers/mailbox/pl320-ipc.c

diff --git a/arch/arm/mach-highbank/Kconfig b/arch/arm/mach-highbank/Kconfig
index 551c97e..2388085 100644
--- a/arch/arm/mach-highbank/Kconfig
+++ b/arch/arm/mach-highbank/Kconfig
@@ -11,5 +11,7 @@ config ARCH_HIGHBANK
 	select GENERIC_CLOCKEVENTS
 	select HAVE_ARM_SCU
 	select HAVE_SMP
+	select MAILBOX
+	select PL320_MBOX
 	select SPARSE_IRQ
 	select USE_OF
diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index be8cac0..e89fdb4 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -34,4 +34,13 @@ config OMAP_MBOX_KFIFO_SIZE
 	  This can also be changed at runtime (via the mbox_kfifo_size
 	  module parameter).
 
+config PL320_MBOX
+	bool "ARM PL320 Mailbox"
+	help
+	  An implementation of the ARM PL320 Interprocessor Communication
+	  Mailbox (IPCM), tailored for the Calxeda Highbank. It is used to
+	  send short messages between Highbank's A9 cores and the EnergyCore
+	  Management Engine, primarily for cpufreq. Say Y here if you want
+	  to use the PL320 IPCM support.
+
 endif
diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
index fa71fab..dc3fbc5 100644
--- a/drivers/mailbox/Makefile
+++ b/drivers/mailbox/Makefile
@@ -1,4 +1,4 @@
-obj-$(CONFIG_MAILBOX) += mailbox.o
+obj-$(CONFIG_OMAP1_MBOX)	+= mailbox-omap1.o mailbox.o
+obj-$(CONFIG_OMAP2PLUS_MBOX)	+= mailbox-omap2.o mailbox.o
+obj-$(CONFIG_PL320_MBOX)	+= pl320-ipc.o
 
-obj-$(CONFIG_OMAP1_MBOX)	+= mailbox-omap1.o
-obj-$(CONFIG_OMAP2PLUS_MBOX)	+= mailbox-omap2.o
diff --git a/drivers/mailbox/Makefile.rej b/drivers/mailbox/Makefile.rej
new file mode 100644
index 0000000..62ade60
--- /dev/null
+++ b/drivers/mailbox/Makefile.rej
@@ -0,0 +1,7 @@
+--- /dev/null
++++ drivers/mailbox/Makefile
+@@ -0,0 +1,4 @@
++obj-$(CONFIG_OMAP1_MBOX)	+= mailbox.o mailbox-omap1.o
++obj-$(CONFIG_OMAP2PLUS_MBOX)	+= mailbox.o mailbox-omap2.o
++obj-$(CONFIG_PL320_MBOX)	+= pl320-ipc.o
++
diff --git a/drivers/mailbox/pl320-ipc.c b/drivers/mailbox/pl320-ipc.c
new file mode 100644
index 0000000..1a9d8e4
--- /dev/null
+++ b/drivers/mailbox/pl320-ipc.c
@@ -0,0 +1,199 @@
+/*
+ * Copyright 2012 Calxeda, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/types.h>
+#include <linux/err.h>
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/io.h>
+#include <linux/interrupt.h>
+#include <linux/completion.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/spinlock.h>
+#include <linux/device.h>
+#include <linux/amba/bus.h>
+
+#include <linux/mailbox.h>
+
+#define IPCMxSOURCE(m)		((m) * 0x40)
+#define IPCMxDSET(m)		(((m) * 0x40) + 0x004)
+#define IPCMxDCLEAR(m)		(((m) * 0x40) + 0x008)
+#define IPCMxDSTATUS(m)		(((m) * 0x40) + 0x00C)
+#define IPCMxMODE(m)		(((m) * 0x40) + 0x010)
+#define IPCMxMSET(m)		(((m) * 0x40) + 0x014)
+#define IPCMxMCLEAR(m)		(((m) * 0x40) + 0x018)
+#define IPCMxMSTATUS(m)		(((m) * 0x40) + 0x01C)
+#define IPCMxSEND(m)		(((m) * 0x40) + 0x020)
+#define IPCMxDR(m, dr)		(((m) * 0x40) + ((dr) * 4) + 0x024)
+
+#define IPCMMIS(irq)		(((irq) * 8) + 0x800)
+#define IPCMRIS(irq)		(((irq) * 8) + 0x804)
+
+#define MBOX_MASK(n)		(1 << (n))
+#define IPC_TX_MBOX		1
+#define IPC_RX_MBOX		2
+
+#define CHAN_MASK(n)		(1 << (n))
+#define A9_SOURCE		1
+#define M3_SOURCE		0
+
+static void __iomem *ipc_base;
+static int ipc_irq;
+static DEFINE_MUTEX(ipc_m1_lock);
+static DECLARE_COMPLETION(ipc_completion);
+static ATOMIC_NOTIFIER_HEAD(ipc_notifier);
+
+static inline void set_destination(int source, int mbox)
+{
+	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxDSET(mbox));
+	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxMSET(mbox));
+}
+
+static inline void clear_destination(int source, int mbox)
+{
+	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxDCLEAR(mbox));
+	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxMCLEAR(mbox));
+}
+
+static void __ipc_send(int mbox, u32 *data)
+{
+	int i;
+	for (i = 0; i < 7; i++)
+		__raw_writel(data[i], ipc_base + IPCMxDR(mbox, i));
+	__raw_writel(0x1, ipc_base + IPCMxSEND(mbox));
+}
+
+static u32 __ipc_rcv(int mbox, u32 *data)
+{
+	int i;
+	for (i = 0; i < 7; i++)
+		data[i] = __raw_readl(ipc_base + IPCMxDR(mbox, i));
+	return data[1];
+}
+
+/* blocking implementation from the A9 side, not usable in interrupts! */
+int pl320_ipc_transmit(u32 *data)
+{
+	int ret;
+
+	mutex_lock(&ipc_m1_lock);
+
+	init_completion(&ipc_completion);
+	__ipc_send(IPC_TX_MBOX, data);
+	ret = wait_for_completion_timeout(&ipc_completion,
+					  msecs_to_jiffies(1000));
+	if (ret == 0) {
+		ret = -ETIMEDOUT;
+		goto out;
+	}
+
+	ret = __ipc_rcv(IPC_TX_MBOX, data);
+out:
+	mutex_unlock(&ipc_m1_lock);
+	return ret;
+}
+EXPORT_SYMBOL(pl320_ipc_transmit);
+
+irqreturn_t ipc_handler(int irq, void *dev)
+{
+	u32 irq_stat;
+	u32 data[7];
+
+	irq_stat = __raw_readl(ipc_base + IPCMMIS(1));
+	if (irq_stat & MBOX_MASK(IPC_TX_MBOX)) {
+		__raw_writel(0, ipc_base + IPCMxSEND(IPC_TX_MBOX));
+		complete(&ipc_completion);
+	}
+	if (irq_stat & MBOX_MASK(IPC_RX_MBOX)) {
+		__ipc_rcv(IPC_RX_MBOX, data);
+		atomic_notifier_call_chain(&ipc_notifier, data[0], data + 1);
+		__raw_writel(2, ipc_base + IPCMxSEND(IPC_RX_MBOX));
+	}
+
+	return IRQ_HANDLED;
+}
+
+int pl320_ipc_register_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_register(&ipc_notifier, nb);
+}
+EXPORT_SYMBOL(pl320_ipc_register_notifier);
+
+int pl320_ipc_unregister_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_unregister(&ipc_notifier, nb);
+}
+EXPORT_SYMBOL(pl320_ipc_unregister_notifier);
+
+static int __devinit pl320_probe(struct amba_device *adev,
+				const struct amba_id *id)
+{
+	int ret;
+
+	ipc_base = ioremap(adev->res.start, resource_size(&adev->res));
+	if (ipc_base == NULL)
+		return -ENOMEM;
+
+	__raw_writel(0, ipc_base + IPCMxSEND(IPC_TX_MBOX));
+
+	ipc_irq = adev->irq[0];
+	ret = request_irq(ipc_irq, ipc_handler, 0, dev_name(&adev->dev), NULL);
+	if (ret < 0)
+		goto err;
+
+	/* Init slow mailbox */
+	__raw_writel(CHAN_MASK(A9_SOURCE),
+			ipc_base + IPCMxSOURCE(IPC_TX_MBOX));
+	__raw_writel(CHAN_MASK(M3_SOURCE),
+			ipc_base + IPCMxDSET(IPC_TX_MBOX));
+	__raw_writel(CHAN_MASK(M3_SOURCE) | CHAN_MASK(A9_SOURCE),
+		     ipc_base + IPCMxMSET(IPC_TX_MBOX));
+
+	/* Init receive mailbox */
+	__raw_writel(CHAN_MASK(M3_SOURCE),
+			ipc_base + IPCMxSOURCE(IPC_RX_MBOX));
+	__raw_writel(CHAN_MASK(A9_SOURCE),
+			ipc_base + IPCMxDSET(IPC_RX_MBOX));
+	__raw_writel(CHAN_MASK(M3_SOURCE) | CHAN_MASK(A9_SOURCE),
+		     ipc_base + IPCMxMSET(IPC_RX_MBOX));
+
+	return 0;
+err:
+	iounmap(ipc_base);
+	return ret;
+}
+
+static struct amba_id pl320_ids[] = {
+	{
+		.id	= 0x00041320,
+		.mask	= 0x000fffff,
+	},
+	{ 0, 0 },
+};
+
+static struct amba_driver pl320_driver = {
+	.drv = {
+		.name	= "pl320",
+	},
+	.id_table	= pl320_ids,
+	.probe		= pl320_probe,
+};
+
+static int __init ipc_init(void)
+{
+	return amba_driver_register(&pl320_driver);
+}
+module_init(ipc_init);
-- 
1.8.0.2


^ permalink raw reply related
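
The register layout used by the patch above (one 0x40-byte block per mailbox, seven data registers starting at offset 0x024, a send register at 0x020) can be checked with a small userspace model. The offsets are taken from the IPCMx* macros in the patch; everything else is an assumption made for the sketch: the fake_regs array replaces the ioremap()ed region, and the sk_ names are hypothetical.

```c
#include <stdint.h>

#define SK_MBOX_STRIDE	0x40u	/* size of one mailbox register block */
#define SK_SEND_OFF	0x020u	/* IPCMxSEND within the block */
#define SK_DR_OFF	0x024u	/* first data register within the block */
#define SK_NUM_DR	7u	/* data registers used per message */

/* Byte offset of the send register of mailbox 'mbox'. */
static uint32_t sk_send_offset(unsigned int mbox)
{
	return mbox * SK_MBOX_STRIDE + SK_SEND_OFF;
}

/* Byte offset of data register 'dr' of mailbox 'mbox'. */
static uint32_t sk_dr_offset(unsigned int mbox, unsigned int dr)
{
	return mbox * SK_MBOX_STRIDE + dr * 4u + SK_DR_OFF;
}

/* Fake register file standing in for the mapped device. */
static uint32_t fake_regs[0x1000 / 4];

static void sk_writel(uint32_t val, uint32_t offset)
{
	fake_regs[offset / 4] = val;
}

/* Same sequence as __ipc_send(): fill the data registers, then set send. */
static void sk_ipc_send(unsigned int mbox, const uint32_t *data)
{
	unsigned int i;

	for (i = 0; i < SK_NUM_DR; i++)
		sk_writel(data[i], sk_dr_offset(mbox, i));
	sk_writel(0x1, sk_send_offset(mbox));
}
```

With the transmit mailbox numbered 1 as in the patch, the message payload lands at offsets 0x64 through 0x7C and the send bit at 0x60.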

* [PATCH 4/4 v10] cpufreq, highbank: add support for highbank cpufreq
From: Mark Langsdorf @ 2013-01-04 16:35 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel; +Cc: Mark Langsdorf
In-Reply-To: <1357317346-1120-1-git-send-email-mark.langsdorf@calxeda.com>

Highbank processors depend on the external ECME to perform voltage
management based on a requested frequency. Communication between the
A9 cores and the ECME happens over the pl320 IPC channel.

Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
Reviewed-by: Shawn Guo <shawn.guo@linaro.org>
Reviewed-by: Mike Turquette <mturquette@linaro.org>
---
Changes from v9
	Added Mike Turquette's reviewed by.
	Used to be the 6th patch in the series.
Changes from v8
        Added Shawn Guo's reviewed by.
        Removed some magic numbers.
        Changed failure returns in clk_notify from NOTIFY_STOP to NOTIFY_BAD.
Changes from v7
        Removed old attribution to cpufreq-cpu0.
        Added some description in the documentation.
        Made cpu_dev, cpu_clk into local variables.
        Removed __devinit.
        Removed some unneeded includes.
        Added a brace to clarify some nested if logic.
Changes from v6
        Removed devicetree bindings documentation.
        Restructured driver to use clk notifications.
        Core driver logic is now cpufreq-clk0.
Changes from v5
        Changed ipc_transmit() to pl320_ipc_transmit().
Changes from v4
        Removed erroneous changes to arch/arm/Kconfig.
        Removed unnecessary changes to drivers/cpufreq/Kconfig.arm
        Alphabetized additions to arch/arm/mach-highbank/Kconfig
        Changed ipc call and header to match new ipc location in 
        drivers/mailbox.
Changes from v3
        None.
Changes from v2
        Changed transition latency binding in code to match documentation.
Changes from v1
        Added highbank specific Kconfig changes.

 arch/arm/boot/dts/highbank.dts     |  10 ++++
 arch/arm/mach-highbank/Kconfig     |   2 +
 drivers/cpufreq/Kconfig.arm        |  15 +++++
 drivers/cpufreq/Makefile           |   1 +
 drivers/cpufreq/highbank-cpufreq.c | 109 +++++++++++++++++++++++++++++++++++++
 5 files changed, 137 insertions(+)
 create mode 100644 drivers/cpufreq/highbank-cpufreq.c

diff --git a/arch/arm/boot/dts/highbank.dts b/arch/arm/boot/dts/highbank.dts
index a9ae5d3..202f12e 100644
--- a/arch/arm/boot/dts/highbank.dts
+++ b/arch/arm/boot/dts/highbank.dts
@@ -36,6 +36,16 @@
 			next-level-cache = <&L2>;
 			clocks = <&a9pll>;
 			clock-names = "cpu";
+			operating-points = <
+				/* kHz    ignored */
+				 1300000  1000000
+				 1200000  1000000
+				 1100000  1000000
+				  800000  1000000
+				  400000  1000000
+				  200000  1000000
+			>;
+			clock-latency = <100000>;
 		};
 
 		cpu@1 {
diff --git a/arch/arm/mach-highbank/Kconfig b/arch/arm/mach-highbank/Kconfig
index 2388085..44b12f9 100644
--- a/arch/arm/mach-highbank/Kconfig
+++ b/arch/arm/mach-highbank/Kconfig
@@ -1,5 +1,7 @@
 config ARCH_HIGHBANK
 	bool "Calxeda ECX-1000/2000 (Highbank/Midway)" if ARCH_MULTI_V7
+	select ARCH_HAS_CPUFREQ
+	select ARCH_HAS_OPP
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select ARM_AMBA
 	select ARM_GIC
diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index a0b3661..7c71c1b 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -83,3 +83,18 @@ config ARM_SPEAR_CPUFREQ
 	default y
 	help
 	  This adds the CPUFreq driver support for SPEAr SOCs.
+
+config ARM_HIGHBANK_CPUFREQ
+	tristate "Calxeda Highbank-based"
+	depends on ARCH_HIGHBANK
+	select CPU_FREQ_TABLE
+	select GENERIC_CPUFREQ_CPU0
+	select PM_OPP
+	select REGULATOR
+
+	default m
+	help
+	   This adds the CPUFreq driver for Calxeda Highbank SoC
+	   based boards.
+
+	   If in doubt, say N.
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 1f254ec0..2f7ab0b 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_ARM_EXYNOS4X12_CPUFREQ)	+= exynos4x12-cpufreq.o
 obj-$(CONFIG_ARM_EXYNOS5250_CPUFREQ)	+= exynos5250-cpufreq.o
 obj-$(CONFIG_ARM_OMAP2PLUS_CPUFREQ)     += omap-cpufreq.o
 obj-$(CONFIG_ARM_SPEAR_CPUFREQ)		+= spear-cpufreq.o
+obj-$(CONFIG_ARM_HIGHBANK_CPUFREQ)	+= highbank-cpufreq.o
 
 ##################################################################################
 # PowerPC platform drivers
diff --git a/drivers/cpufreq/highbank-cpufreq.c b/drivers/cpufreq/highbank-cpufreq.c
new file mode 100644
index 0000000..8c85608
--- /dev/null
+++ b/drivers/cpufreq/highbank-cpufreq.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2012 Calxeda, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This driver provides the clk notifier callbacks that are used when
+ * the cpufreq-cpu0 driver changes the frequency to alert the highbank
+ * EnergyCore Management Engine (ECME) about the need to change
+ * voltage. The ECME interfaces with the actual voltage regulators.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/clk.h>
+#include <linux/cpu.h>
+#include <linux/err.h>
+#include <linux/of.h>
+#include <linux/mailbox.h>
+
+#define HB_CPUFREQ_CHANGE_NOTE	0x80000001
+#define HB_CPUFREQ_IPC_LEN	7
+#define HB_CPUFREQ_VOLT_RETRIES	15
+
+static int hb_voltage_change(unsigned int freq)
+{
+	int i;
+	u32 msg[HB_CPUFREQ_IPC_LEN];
+
+	msg[0] = HB_CPUFREQ_CHANGE_NOTE;
+	msg[1] = freq / 1000000;
+	for (i = 2; i < HB_CPUFREQ_IPC_LEN; i++)
+		msg[i] = 0;
+
+	return pl320_ipc_transmit(msg);
+}
+
+static int hb_cpufreq_clk_notify(struct notifier_block *nb,
+				unsigned long action, void *hclk)
+{
+	struct clk_notifier_data *clk_data = hclk;
+	int i = 0;
+
+	if (action == PRE_RATE_CHANGE) {
+		if (clk_data->new_rate > clk_data->old_rate)
+			while (hb_voltage_change(clk_data->new_rate))
+				if (i++ > HB_CPUFREQ_VOLT_RETRIES)
+					return NOTIFY_BAD;
+	} else if (action == POST_RATE_CHANGE) {
+		if (clk_data->new_rate < clk_data->old_rate)
+			while (hb_voltage_change(clk_data->new_rate))
+				if (i++ > HB_CPUFREQ_VOLT_RETRIES)
+					return NOTIFY_BAD;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block hb_cpufreq_clk_nb = {
+	.notifier_call = hb_cpufreq_clk_notify,
+};
+
+static int hb_cpufreq_driver_init(void)
+{
+	struct device *cpu_dev;
+	struct clk *cpu_clk;
+	struct device_node *np;
+	int ret;
+
+	np = of_find_node_by_path("/cpus/cpu@0");
+	if (!np) {
+		pr_err("failed to find highbank cpufreq node\n");
+		return -ENOENT;
+	}
+
+	cpu_dev = get_cpu_device(0);
+	if (!cpu_dev) {
+		pr_err("failed to get highbank cpufreq device\n");
+		ret = -ENODEV;
+		goto out_put_node;
+	}
+
+	cpu_dev->of_node = np;
+
+	cpu_clk = clk_get(cpu_dev, NULL);
+	if (IS_ERR(cpu_clk)) {
+		ret = PTR_ERR(cpu_clk);
+		pr_err("failed to get cpu0 clock: %d\n", ret);
+		goto out_put_node;
+	}
+
+	ret = clk_notifier_register(cpu_clk, &hb_cpufreq_clk_nb);
+	if (ret) {
+		pr_err("failed to register clk notifier: %d\n", ret);
+		goto out_put_node;
+	}
+
+out_put_node:
+	of_node_put(np);
+	return ret;
+}
+late_initcall(hb_cpufreq_driver_init);
+
+MODULE_AUTHOR("Mark Langsdorf <mark.langsdorf@calxeda.com>");
+MODULE_DESCRIPTION("Calxeda Highbank cpufreq driver");
+MODULE_LICENSE("GPL");
-- 
1.8.0.2


^ permalink raw reply related
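
A minimal userspace sketch of the ordering rule in hb_cpufreq_clk_notify()
above: voltage is requested before the clock is raised and after it is
lowered, with a bounded number of attempts. The names notify_rate_change,
volt_request_fn, flaky_link and flaky_request are hypothetical;
flaky_request stands in for pl320_ipc_transmit() so the sketch builds
outside the kernel. The attempt count used here (one try plus VOLT_RETRIES
retries) is this sketch's choice and may not match the patch's loop exactly.

```c
#include <stdbool.h>
#include <stddef.h>

#define VOLT_RETRIES 15 /* mirrors HB_CPUFREQ_VOLT_RETRIES in the patch */

enum rate_phase { PRE_CHANGE, POST_CHANGE };

/* Hypothetical transport hook: returns 0 on success, non-zero on failure. */
typedef int (*volt_request_fn)(unsigned long rate_hz, void *ctx);

/* Hypothetical test double: fails a fixed number of times, then succeeds. */
struct flaky_link {
	int failures_left;
	int calls;
};

int flaky_request(unsigned long rate_hz, void *ctx)
{
	struct flaky_link *link = ctx;

	(void)rate_hz;
	link->calls++;
	if (link->failures_left > 0) {
		link->failures_left--;
		return -1;
	}
	return 0;
}

/*
 * Returns true when the rate change may proceed, false when the voltage
 * request kept failing. Voltage goes up before the clock speeds up and
 * comes down only after the clock has slowed.
 */
bool notify_rate_change(enum rate_phase phase,
			unsigned long old_rate, unsigned long new_rate,
			volt_request_fn request, void *ctx)
{
	bool raising = new_rate > old_rate;
	bool lowering = new_rate < old_rate;
	int attempt;

	/* Nothing to do in the phase that does not own this transition. */
	if (!((phase == PRE_CHANGE && raising) ||
	      (phase == POST_CHANGE && lowering)))
		return true;

	for (attempt = 0; attempt <= VOLT_RETRIES; attempt++) {
		if (request(new_rate, ctx) == 0)
			return true;
	}
	return false;
}
```

Passing the transport in as a function pointer keeps the retry policy
testable without a mailbox; the patch itself calls pl320_ipc_transmit()
directly.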

* [PATCH 1/4 v10] arm: use devicetree to get smp_twd clock
From: Mark Langsdorf @ 2013-01-04 16:35 UTC (permalink / raw)
  To: linux-kernel, cpufreq, linux-pm, linux-arm-kernel
  Cc: Mark Langsdorf, Rob Herring
In-Reply-To: <1357317346-1120-1-git-send-email-mark.langsdorf@calxeda.com>

From: Rob Herring <rob.herring@calxeda.com>

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
---
Changes from v9
	Updated to work with 3.8 kernel.
Changes from v4, v5, v6, v7, v8
        None.
Changes from v3
        No longer setting *clk to NULL in twd_get_clock().
Changes from v2
        Turned the check for the node pointer into an if-then-else statement.
        Removed the second, redundant clk_get_rate.
Changes from v1
        None.

 arch/arm/kernel/smp_twd.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index 49f335d..dad2d81 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -239,12 +239,15 @@ static irqreturn_t twd_handler(int irq, void *dev_id)
 	return IRQ_NONE;
 }
 
-static struct clk *twd_get_clock(void)
+static struct clk *twd_get_clock(struct device_node *np)
 {
 	struct clk *clk;
 	int err;
 
-	clk = clk_get_sys("smp_twd", NULL);
+	if (np)
+		clk = of_clk_get(np, 0);
+	else
+		clk = clk_get_sys("smp_twd", NULL);
 	if (IS_ERR(clk)) {
 		pr_err("smp_twd: clock not found: %d\n", (int)PTR_ERR(clk));
 		return clk;
@@ -257,6 +260,7 @@ static struct clk *twd_get_clock(void)
 		return ERR_PTR(err);
 	}
 
+	twd_timer_rate = clk_get_rate(clk);
 	return clk;
 }
 
@@ -285,7 +289,7 @@ static int __cpuinit twd_timer_setup(struct clock_event_device *clk)
 	 * during the runtime of the system.
 	 */
 	if (!common_setup_called) {
-		twd_clk = twd_get_clock();
+		twd_clk = twd_get_clock(NULL);
 
 		/*
 		 * We use IS_ERR_OR_NULL() here, because if the clock stubs
@@ -373,6 +377,8 @@ int __init twd_local_timer_register(struct twd_local_timer *tlt)
 	if (!twd_base)
 		return -ENOMEM;
 
+	twd_clk = twd_get_clock(NULL);
+
 	return twd_local_timer_common_register();
 }
 
@@ -405,6 +411,8 @@ void __init twd_local_timer_of_register(void)
 		goto out;
 	}
 
+	twd_clk = twd_get_clock(np);
+
 	err = twd_local_timer_common_register();
 
 out:
-- 
1.8.0.2


^ permalink raw reply related

* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Orion SoC
From: Andrew Lunn @ 2013-01-04 15:35 UTC (permalink / raw)
  To: Eduardo Valentin
  Cc: linux ARM, iwamatsu, linux-pm, Thomas Petazzoni, jgunthorpe,
	Sebastian Hesselbarth, Jason Cooper
In-Reply-To: <50E6A378.90503@ti.com>

On 04/01/13 10:40, Eduardo Valentin wrote:
> Hey Andrew,
>
> On 14-12-2012 13:03, Andrew Lunn wrote:
>> From: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
>>
>> Some Orion SoC has thermal sensor.
>> This patch adds support for 88F6282 and 88F6283.
>>
>> Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
>> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
>> ---
>> .../devicetree/bindings/thermal/orion-thermal.txt | 16 +++
>> drivers/thermal/Kconfig | 7 ++
>> drivers/thermal/Makefile | 1 +
>> drivers/thermal/orion_thermal.c | 133 ++++++++++++++++++++
>> 4 files changed, 157 insertions(+)
>> create mode 100644
>> Documentation/devicetree/bindings/thermal/orion-thermal.txt
>> create mode 100644 drivers/thermal/orion_thermal.c
>>
>> diff --git
>> a/Documentation/devicetree/bindings/thermal/orion-thermal.txt
>> b/Documentation/devicetree/bindings/thermal/orion-thermal.txt
>> new file mode 100644
>> index 0000000..5ce925d
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/thermal/orion-thermal.txt
>> @@ -0,0 +1,16 @@
>> +* Orion Thermal
>> +
>> +This initial version is for Kirkwood 88F8262 & 88F6283 SoCs, however
>> +it is expected the driver will sometime in the future be expanded to
>> +also support Dove, using a different compatibility string.
>> +
>> +Required properties:
>> +- compatible : "marvell,kirkwood-thermal"
>> +- reg : Address range of the thermal registers
>> +
>> +Example:
>> +
>> + thermal@10078 {
>> + compatible = "marvell,kirkwood";
>> + reg = <0x10078 0x4>;
>> + };
>
> How do you differentiate if the SoC has the temperature sensor? On your
> patch description you are very clear saying that this supports only
> 88F8262 & 88F6283 SoCs.

Hi Eduardo

Thanks for the comments. I will address them in the next version.

We differentiate between the different SoCs by DT. Each has its own 
.dtsi file and we will put the node into only those which have the hardware.

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH v7u1 31/31] x86, 64bit, mm: hibernate use generic mapping_init
From: Rafael J. Wysocki @ 2013-01-04 11:43 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Eric W. Biederman,
	Andrew Morton, Borislav Petkov, Jan Kiszka, Jason Wessel,
	linux-kernel, Pavel Machek, linux-pm
In-Reply-To: <1357260531-11115-32-git-send-email-yinghai@kernel.org>

On Thursday, January 03, 2013 04:48:51 PM Yinghai Lu wrote:
> Make it only map range in pfn_mapped array.

Can you please explain why that should be sufficient?

Have you tested it?

> and it has kernel mapping with EXEC.

That's because it needs to execute code from one of those pages and it
doesn't know in advance which one that's going to be.

Thanks,
Rafael


> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: linux-pm@vger.kernel.org
> ---
>  arch/x86/power/hibernate_64.c |   66 ++++++++++++++---------------------------
>  1 file changed, 22 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c
> index 460f314..a0fde91 100644
> --- a/arch/x86/power/hibernate_64.c
> +++ b/arch/x86/power/hibernate_64.c
> @@ -11,6 +11,8 @@
>  #include <linux/gfp.h>
>  #include <linux/smp.h>
>  #include <linux/suspend.h>
> +
> +#include <asm/init.h>
>  #include <asm/proto.h>
>  #include <asm/page.h>
>  #include <asm/pgtable.h>
> @@ -39,41 +41,21 @@ pgd_t *temp_level4_pgt;
>  
>  void *relocated_restore_code;
>  
> -static int res_phys_pud_init(pud_t *pud, unsigned long address, unsigned long end)
> +static void *alloc_pgt_page(void *context)
>  {
> -	long i, j;
> -
> -	i = pud_index(address);
> -	pud = pud + i;
> -	for (; i < PTRS_PER_PUD; pud++, i++) {
> -		unsigned long paddr;
> -		pmd_t *pmd;
> -
> -		paddr = address + i*PUD_SIZE;
> -		if (paddr >= end)
> -			break;
> -
> -		pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
> -		if (!pmd)
> -			return -ENOMEM;
> -		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
> -		for (j = 0; j < PTRS_PER_PMD; pmd++, j++, paddr += PMD_SIZE) {
> -			unsigned long pe;
> -
> -			if (paddr >= end)
> -				break;
> -			pe = __PAGE_KERNEL_LARGE_EXEC | paddr;
> -			pe &= __supported_pte_mask;
> -			set_pmd(pmd, __pmd(pe));
> -		}
> -	}
> -	return 0;
> +	return (void *)get_safe_page(GFP_ATOMIC);
>  }
>
>  static int set_up_temporary_mappings(void)
>  {
> -	unsigned long start, end, next;
> -	int error;
> +	struct x86_mapping_info info = {
> +		.alloc_pgt_page	= alloc_pgt_page,
> +		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
> +		.kernel_mapping = true,
> +	};
> +	unsigned long mstart, mend;
> +	int result;
> +	int i;
>  
>  	temp_level4_pgt = (pgd_t *)get_safe_page(GFP_ATOMIC);
>  	if (!temp_level4_pgt)
> @@ -84,21 +66,17 @@ static int set_up_temporary_mappings(void)
>  		init_level4_pgt[pgd_index(__START_KERNEL_map)]);
>  
>  	/* Set up the direct mapping from scratch */
> -	start = (unsigned long)pfn_to_kaddr(0);
> -	end = (unsigned long)pfn_to_kaddr(max_pfn);
> -
> -	for (; start < end; start = next) {
> -		pud_t *pud = (pud_t *)get_safe_page(GFP_ATOMIC);
> -		if (!pud)
> -			return -ENOMEM;
> -		next = start + PGDIR_SIZE;
> -		if (next > end)
> -			next = end;
> -		if ((error = res_phys_pud_init(pud, __pa(start), __pa(next))))
> -			return error;
> -		set_pgd(temp_level4_pgt + pgd_index(start),
> -			mk_kernel_pgd(__pa(pud)));
> +	for (i = 0; i < nr_pfn_mapped; i++) {
> +		mstart = pfn_mapped[i].start << PAGE_SHIFT;
> +		mend   = pfn_mapped[i].end << PAGE_SHIFT;
> +
> +		result = kernel_ident_mapping_init(&info, temp_level4_pgt,
> +						   mstart, mend);
> +
> +		if (result)
> +			return result;
>  	}
> +
>  	return 0;
>  }
>  
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply
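
For readers following the quoted patch, the new loop in
set_up_temporary_mappings() can be pictured in userspace roughly as below.
This is a sketch under stated assumptions, not the kernel code: struct
pfn_range, map_range_fn, map_pfn_ranges, map_stats, count_bytes and
SKETCH_PAGE_SHIFT (4 KiB pages) are hypothetical names, and the callback
stands in for kernel_ident_mapping_init(). It shows only the shape of the
iteration: each recorded pfn range is converted to a byte range and mapped,
and the first error aborts the walk.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical mirror of one pfn_mapped[] entry: [start, end) in pfns. */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical mapping hook: returns 0 on success, negative on error. */
typedef int (*map_range_fn)(unsigned long long mstart,
			    unsigned long long mend, void *ctx);

/* Hypothetical callback that only records what it was asked to map. */
struct map_stats {
	unsigned long long bytes;
	int calls;
	int fail_on_call; /* 0 means never fail */
};

int count_bytes(unsigned long long mstart, unsigned long long mend, void *ctx)
{
	struct map_stats *stats = ctx;

	stats->calls++;
	if (stats->fail_on_call == stats->calls)
		return -12; /* stands in for -ENOMEM */
	stats->bytes += mend - mstart;
	return 0;
}

/* Map only the ranges that were recorded, stopping at the first failure. */
int map_pfn_ranges(const struct pfn_range *ranges, size_t nr,
		   map_range_fn map, void *ctx)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned long long mstart =
			(unsigned long long)ranges[i].start << SKETCH_PAGE_SHIFT;
		unsigned long long mend =
			(unsigned long long)ranges[i].end << SKETCH_PAGE_SHIFT;
		int result = map(mstart, mend, ctx);

		if (result)
			return result;
	}
	return 0;
}
```

Unlike the removed code, which walked everything from pfn 0 to max_pfn,
this form leaves any gap between recorded ranges unmapped, which is the
behavior the question above asks the author to justify.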

* Re: [PATCH 1/3] cpufreq: Manage only online cpus
From: Rafael J. Wysocki @ 2013-01-04 11:32 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-dev, nicolas.pitre, amit.kucheria, mathieu.poirier,
	linux-kernel, cpufreq, pdsw-power-team, linux-pm
In-Reply-To: <CAKohpo=ok5aH77ycMKPAdHgUkf2HcwMgik+6eZp1Du1QVYxPZQ@mail.gmail.com>

On Friday, January 04, 2013 10:44:36 AM Viresh Kumar wrote:
> On 3 January 2013 17:32, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > True, but have those bugs been introduced recently (ie. in v3.8-rc1 or later)?
> 
> Don't know... I feel they were always there, its just that nobody
> tested it that way :)

That exactly is my point. :-)

If they have always been there, I don't see much reason for hurrying with the
fixes, so I'll queue them up for v3.9.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* [PATCH v6 3/3] PM: Introduce Intel PowerClamp Driver
From: Jacob Pan @ 2013-01-04 11:12 UTC (permalink / raw)
  To: Linux PM, LKML
  Cc: Peter Zijlstra, Rafael Wysocki, Len Brown, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar, Zhang Rui, Joe Perches, Rob Landley,
	Arjan van de Ven, Paul McKenney, Jacob Pan
In-Reply-To: <1357297965-17839-1-git-send-email-jacob.jun.pan@linux.intel.com>

Intel PowerClamp driver performs synchronized idle injection across
all online CPUs. The goal is to maintain a given package level C-state
ratio.

Compared to other throttling methods that already exist in the kernel,
such as ACPI PAD (taking CPUs offline) and clock modulation, this is often
more efficient in terms of performance per watt.

Please refer to Documentation/thermal/intel_powerclamp.txt for more details.
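
As a rough illustration of how injection can stay synchronized across CPUs
(a userspace sketch, not code taken from this patch): if every per-CPU
thread rounds the current tick count up to the same interval boundary, all
threads start idling together regardless of when each one woke up. The
names injection_slot, roundup_ul and plan_injection are hypothetical, and
the formula interval = duration * 100 / (target + compensation) is an
assumption derived from the "roundup(jiffies, interval)" alignment
described in the documentation.

```c
/* Hypothetical result of planning one injection period, in ticks. */
struct injection_slot {
	unsigned long interval;    /* length of one idle + run period */
	unsigned long next_start;  /* tick at which idle should begin */
	unsigned long sleep_ticks; /* how long to sleep until then */
};

/* Round x up to the next multiple of step (step must be non-zero). */
unsigned long roundup_ul(unsigned long x, unsigned long step)
{
	return ((x + step - 1) / step) * step;
}

/*
 * Plan the next idle injection. duration_ticks is the forced idle time,
 * target_pct the requested idle ratio and comp_pct any extra compensation.
 * Returns 0 on success, -1 when the inputs cannot produce a valid period.
 */
int plan_injection(unsigned long now, unsigned int duration_ticks,
		   unsigned int target_pct, unsigned int comp_pct,
		   struct injection_slot *slot)
{
	unsigned int pct = target_pct + comp_pct;

	if (duration_ticks == 0 || pct == 0 || pct > 100)
		return -1;

	/* Idle for duration_ticks out of every interval ticks. */
	slot->interval = (unsigned long)duration_ticks * 100 / pct;
	/* Every CPU lands on the same boundary, whatever its own "now". */
	slot->next_start = roundup_ul(now, slot->interval);
	slot->sleep_ticks = slot->next_start - now;
	return 0;
}
```

With a 6-tick duration and a 25% target, two CPUs calling plan_injection()
at ticks 1003 and 1005 both get a start boundary of 1008, so their idle
windows overlap and the package has a chance to enter a deep C-state.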

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 Documentation/thermal/intel_powerclamp.txt |  307 +++++++++++
 drivers/thermal/Kconfig                    |   10 +
 drivers/thermal/Makefile                   |    2 +
 drivers/thermal/intel_powerclamp.c         |  788 ++++++++++++++++++++++++++++
 4 files changed, 1107 insertions(+)
 create mode 100644 Documentation/thermal/intel_powerclamp.txt
 create mode 100644 drivers/thermal/intel_powerclamp.c

diff --git a/Documentation/thermal/intel_powerclamp.txt b/Documentation/thermal/intel_powerclamp.txt
new file mode 100644
index 0000000..332de4a
--- /dev/null
+++ b/Documentation/thermal/intel_powerclamp.txt
@@ -0,0 +1,307 @@
+			 =======================
+			 INTEL POWERCLAMP DRIVER
+			 =======================
+By: Arjan van de Ven <arjan@linux.intel.com>
+    Jacob Pan <jacob.jun.pan@linux.intel.com>
+
+Contents:
+	(*) Introduction
+	    - Goals and Objectives
+
+	(*) Theory of Operation
+	    - Idle Injection
+	    - Calibration
+
+	(*) Performance Analysis
+	    - Effectiveness and Limitations
+	    - Power vs Performance
+	    - Scalability
+	    - Calibration
+	    - Comparison with Alternative Techniques
+
+	(*) Usage and Interfaces
+	    - Generic Thermal Layer (sysfs)
+	    - Kernel APIs (TBD)
+
+============
+INTRODUCTION
+============
+
+Consider the situation where a system’s power consumption must be
+reduced at runtime, due to power budget, thermal constraint, or noise
+level, and where active cooling is not preferred. Software managed
+passive power reduction must be performed to prevent the hardware
+actions that are designed for catastrophic scenarios.
+
+Currently, P-states, T-states (clock modulation), and CPU offlining
+are used for CPU throttling.
+
+On Intel CPUs, C-states provide effective power reduction, but so far
+they’re only used opportunistically, based on workload. With the
+development of intel_powerclamp driver, the method of synchronizing
+idle injection across all online CPU threads was introduced. The goal
+is to achieve forced and controllable C-state residency.
+
+Test/Analysis has been made in the areas of power, performance,
+scalability, and user experience. In many cases, clear advantage is
+shown over taking the CPU offline or modulating the CPU clock.
+
+
+===================
+THEORY OF OPERATION
+===================
+
+Idle Injection
+--------------
+
+On modern Intel processors (Nehalem or later), package level C-state
+residency is available in MSRs, thus also available to the kernel.
+
+These MSRs are:
+      #define MSR_PKG_C2_RESIDENCY	0x60D
+      #define MSR_PKG_C3_RESIDENCY	0x3F8
+      #define MSR_PKG_C6_RESIDENCY	0x3F9
+      #define MSR_PKG_C7_RESIDENCY	0x3FA
+
+If the kernel can also inject idle time to the system, then a
+closed-loop control system can be established that manages package
+level C-state. The intel_powerclamp driver is conceived as such a
+control system, where the target set point is a user-selected idle
+ratio (based on power reduction), and the error is the difference
+between the actual package level C-state residency ratio and the target idle
+ratio.
+
+Injection is controlled by high priority kernel threads, spawned for
+each online CPU.
+
+These kernel threads, with SCHED_FIFO class, are created to perform
+clamping actions of controlled duty ratio and duration. Each per-CPU
+thread synchronizes its idle time and duration, based on the rounding
+of jiffies, so accumulated errors can be prevented to avoid a jittery
+effect. Threads are also bound to the CPU such that they cannot be
+migrated, unless the CPU is taken offline. In this case, threads
+belonging to the offlined CPUs will be terminated immediately.
+
+Running as SCHED_FIFO at a relatively high priority also allows this
+scheme to work for both preemptible and non-preemptible kernels.
+Alignment of idle time around jiffies ensures scalability for HZ
+values. This effect can be better visualized using a Perf timechart.
+The following diagram shows the behavior of kernel thread
+kidle_inject/cpu. During idle injection, it runs monitor/mwait idle
+for a given "duration", then relinquishes the CPU to other tasks,
+until the next time interval.
+
+The NOHZ schedule tick is disabled during idle time, but interrupts
+are not masked. Tests show that the extra wakeups from scheduler tick
+have a dramatic impact on the effectiveness of the powerclamp driver
+on large scale systems (Westmere system with 80 processors).
+
+CPU0
+		  ____________          ____________
+kidle_inject/0   |   sleep    |  mwait |  sleep     |
+	_________|            |________|            |_______
+			       duration
+CPU1
+		  ____________          ____________
+kidle_inject/1   |   sleep    |  mwait |  sleep     |
+	_________|            |________|            |_______
+			      ^
+			      |
+			      |
+			      roundup(jiffies, interval)
+
+Only one CPU is allowed to collect statistics and update global
+control parameters. This CPU is referred to as the controlling CPU in
+this document. The controlling CPU is elected at runtime, with a
+policy that favors BSP, taking into account the possibility of a CPU
+hot-plug.
+
+In terms of dynamics of the idle control system, package level idle
+time is considered largely as a non-causal system where its behavior
+cannot be based on the past or current input. Therefore, the
+intel_powerclamp driver attempts to enforce the desired idle time
+instantly as given input (target idle ratio). After injection,
+powerclamp monitors the actual idle for a given time window and adjusts
+the next injection accordingly to avoid over/under correction.
+
+When used in a causal control system, such as a temperature control,
+it is up to the user of this driver to implement algorithms where
+past samples and outputs are included in the feedback. For example, a
+PID-based thermal controller can use the powerclamp driver to
+maintain a desired target temperature, based on integral and
+derivative gains of the past samples.
+
+
+
+Calibration
+-----------
+During scalability testing, it is observed that synchronized actions
+among CPUs become challenging as the number of cores grows. This is
+also true for the ability of a system to enter package level C-states.
+
+To make sure the intel_powerclamp driver scales well, online
+calibration is implemented. The goals for doing such a calibration
+are:
+
+a) determine the effective range of idle injection ratio
+b) determine the amount of compensation needed at each target ratio
+
+Compensation to each target ratio consists of two parts:
+
+        a) steady state error compensation
+	This is to offset the error occurring when the system can
+	enter idle without extra wakeups (such as external interrupts).
+
+	b) dynamic error compensation
+	When an excessive amount of wakeups occurs during idle, an
+	additional idle ratio can be added to quiet interrupts, by
+	slowing down CPU activities.
+
+A debugfs file is provided for the user to examine compensation
+progress and results, such as on a Westmere system.
+[jacob@nex01 ~]$ cat
+/sys/kernel/debug/intel_powerclamp/powerclamp_calib
+controlling cpu: 0
+pct confidence steady dynamic (compensation)
+0	0	0	0
+1	1	0	0
+2	1	1	0
+3	3	1	0
+4	3	1	0
+5	3	1	0
+6	3	1	0
+7	3	1	0
+8	3	1	0
+...
+30	3	2	0
+31	3	2	0
+32	3	1	0
+33	3	2	0
+34	3	1	0
+35	3	2	0
+36	3	1	0
+37	3	2	0
+38	3	1	0
+39	3	2	0
+40	3	3	0
+41	3	1	0
+42	3	2	0
+43	3	1	0
+44	3	1	0
+45	3	2	0
+46	3	3	0
+47	3	0	0
+48	3	2	0
+49	3	3	0
+
+Calibration occurs during runtime. No offline method is available.
+Steady state compensation is used only when confidence levels of all
+adjacent ratios have reached satisfactory level. A confidence level
+is accumulated based on clean data collected at runtime. Data
+collected during a period without extra interrupts is considered
+clean.
+
+To compensate for excessive amounts of wakeup during idle, additional
+idle time is injected when such a condition is detected. Currently,
+we have a simple algorithm to double the injection ratio. A possible
+enhancement might be to throttle the offending IRQ, such as delaying
+EOI for level triggered interrupts. But it is a challenge to be
+non-intrusive to the scheduler or the IRQ core code.
+
+
+CPU Online/Offline
+------------------
+Per-CPU kernel threads are started/stopped upon receiving
+notifications of CPU hotplug activities. The intel_powerclamp driver
+keeps track of clamping kernel threads, even after they are migrated
+to other CPUs, after a CPU offline event.
+
+
+=====================
+Performance Analysis
+=====================
+This section describes the general performance data collected on
+multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P).
+
+Effectiveness and Limitations
+-----------------------------
+The maximum range that idle injection is allowed is capped at 50
+percent. As mentioned earlier, since interrupts are allowed during
+forced idle time, excessive interrupts could result in less
+effectiveness. The extreme case would be doing a ping -f to generate
+flooded network interrupts without much CPU acknowledgement. In this
+case, little can be done from the idle injection threads. In most
+normal cases, such as scp a large file, applications can be throttled
+by the powerclamp driver, since slowing down the CPU also slows down
+network protocol processing, which in turn reduces interrupts.
+
+When control parameters change at runtime by the controlling CPU, it
+may take an additional period for the rest of the CPUs to catch up
+with the changes. During this time, idle injection is out of sync,
+thus not able to enter package C-states at the expected ratio. But
+this effect is minor, in that in most cases change to the target
+ratio is updated much less frequently than the idle injection
+frequency.
+
+Scalability
+-----------
+Tests also show a minor, but measurable, difference between the 4P/8P
+Ivy Bridge system and the 80P Westmere server under 50% idle ratio.
+More compensation is needed on Westmere for the same amount of
+target idle ratio. The compensation also increases as the idle ratio
+gets larger. The above reason constitutes the need for the
+calibration code.
+
+On the IVB 8P system, compared to an offline CPU, powerclamp can
+achieve up to 40% better performance per watt. (measured by a spin
+counter summed over per CPU counting threads spawned for all running
+CPUs).
+
+====================
+Usage and Interfaces
+====================
+The powerclamp driver is registered to the generic thermal layer as a
+cooling device. Currently, it’s not bound to any thermal zones.
+
+jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . *
+cur_state:0
+max_state:50
+type:intel_powerclamp
+
+Example usage:
+- To inject 25% idle time
+$ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state"
+
+
+If the system is not busy and has more than 25% idle time already,
+then the powerclamp driver will not start idle injection. Using Top
+will not show idle injection kernel threads.
+
+If the system is busy (spin test below) and has less than 25% natural
+idle time, powerclamp kernel threads will do idle injection, which
+appear running to the scheduler. But the overall system idle is still
+reflected. In this example, 24.1% idle is shown. This helps the
+system admin or user determine the cause of slowdown, when a
+powerclamp driver is in action.
+
+
+Tasks: 197 total,   1 running, 196 sleeping,   0 stopped,   0 zombie
+Cpu(s): 71.2%us,  4.7%sy,  0.0%ni, 24.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
+Mem:   3943228k total,  1689632k used,  2253596k free,    74960k buffers
+Swap:  4087804k total,        0k used,  4087804k free,   945336k cached
+
+  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
+ 3352 jacob     20   0  262m  644  428 S  286  0.0   0:17.16 spin
+ 3341 root     -51   0     0    0    0 D   25  0.0   0:01.62 kidle_inject/0
+ 3344 root     -51   0     0    0    0 D   25  0.0   0:01.60 kidle_inject/3
+ 3342 root     -51   0     0    0    0 D   25  0.0   0:01.61 kidle_inject/1
+ 3343 root     -51   0     0    0    0 D   25  0.0   0:01.60 kidle_inject/2
+ 2935 jacob     20   0  696m 125m  35m S    5  3.3   0:31.11 firefox
+ 1546 root      20   0  158m  20m 6640 S    3  0.5   0:26.97 Xorg
+ 2100 jacob     20   0 1223m  88m  30m S    3  2.3   0:23.68 compiz
+
+Tests have shown that by using the powerclamp driver as a cooling
+device, a PID based userspace thermal controller can manage to
+control CPU temperature effectively, when no other thermal influence
+is added. For example, an UltraBook user can compile the kernel under
+a certain temperature (below most active trip points).
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index c2c77d1..7d90ab8 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -122,4 +122,14 @@ config DB8500_CPUFREQ_COOLING
 	  bound cpufreq cooling device turns active to set CPU frequency low to
 	  cool down the CPU.
 
+config INTEL_POWERCLAMP
+	tristate "Intel PowerClamp idle injection driver"
+	depends on THERMAL
+	depends on X86
+	depends on CPU_SUP_INTEL
+	help
+	  Enable this to build the Intel PowerClamp idle injection driver. This
+	  enforces idle time, which results in more package C-state residency. The
+	  user interface is exposed via generic thermal framework.
+
 endif
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index d8da683..574f5f5 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -18,3 +18,5 @@ obj-$(CONFIG_RCAR_THERMAL)	+= rcar_thermal.o
 obj-$(CONFIG_EXYNOS_THERMAL)	+= exynos_thermal.o
 obj-$(CONFIG_DB8500_THERMAL)	+= db8500_thermal.o
 obj-$(CONFIG_DB8500_CPUFREQ_COOLING)	+= db8500_cpufreq_cooling.o
+obj-$(CONFIG_INTEL_POWERCLAMP)	+= intel_powerclamp.o
+
diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
new file mode 100644
index 0000000..314b6fc
--- /dev/null
+++ b/drivers/thermal/intel_powerclamp.c
@@ -0,0 +1,788 @@
+/*
+ * intel_powerclamp.c - package c-state idle injection
+ *
+ * Copyright (c) 2012, Intel Corporation.
+ *
+ * Authors:
+ *     Arjan van de Ven <arjan@linux.intel.com>
+ *     Jacob Pan <jacob.jun.pan@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ *
+ *	TODO:
+ *           1. better handle wakeup from external interrupts, currently a fixed
+ *              compensation is added to clamping duration when excessive amount
+ *              of wakeups are observed during idle time. the reason is that in
+ *              case of external interrupts without need for ack, clamping down
+ *              cpu in non-irq context does not reduce irq. for majority of the
+ *              cases, clamping down cpu does help reduce irq as well, we should
+ *              be able to differentiate the two cases and give a quantitative
+ *              solution for the irqs that we can control. perhaps based on
+ *              get_cpu_iowait_time_us()
+ *
+ *	     2. synchronization with other hw blocks
+ *
+ *
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
+#include <linux/cpu.h>
+#include <linux/thermal.h>
+#include <linux/slab.h>
+#include <linux/tick.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/nmi.h>
+
+#include <asm/msr.h>
+#include <asm/mwait.h>
+#include <asm/cpu_device_id.h>
+#include <asm/idle.h>
+#include <asm/hardirq.h>
+
+#define MAX_TARGET_RATIO (50U)
+/* For each undisturbed clamping period (no extra wake ups during idle time),
+ * we increment the confidence counter for the given target ratio.
+ * CONFIDENCE_OK defines the level where runtime calibration results are
+ * valid.
+ */
+#define CONFIDENCE_OK (3)
+/* Default idle injection duration, driver adjusts sleep time to meet target
+ * idle ratio. Similar to frequency modulation.
+ */
+#define DEFAULT_DURATION_JIFFIES (6)
+
+static unsigned int target_mwait;
+static struct dentry *debug_dir;
+
+/* user selected target */
+static unsigned int set_target_ratio;
+static unsigned int current_ratio;
+static bool should_skip;
+static bool reduce_irq;
+static atomic_t idle_wakeup_counter;
+static unsigned int control_cpu; /* The cpu assigned to collect stat and update
+				  * control parameters. default to BSP but BSP
+				  * can be offlined.
+				  */
+static bool clamping;
+
+
+static struct task_struct __percpu **powerclamp_thread;
+static struct thermal_cooling_device *cooling_dev;
+static unsigned long *cpu_clamping_mask;  /* bit map for tracking per cpu
+					   * clamping thread
+					   */
+
+static unsigned int duration;
+static unsigned int pkg_cstate_ratio_cur;
+static unsigned int window_size;
+
+static int duration_set(const char *arg, const struct kernel_param *kp)
+{
+	int ret = 0;
+	unsigned long new_duration;
+
+	ret = kstrtoul(arg, 10, &new_duration);
+	if (ret)
+		goto exit;
+	if (new_duration > 25 || new_duration < 6) {
+		pr_err("Out of recommended range %lu, between 6-25ms\n",
+			new_duration);
+		ret = -EINVAL;
+	}
+
+	duration = clamp(new_duration, 6ul, 25ul);
+	smp_mb();
+
+exit:
+
+	return ret;
+}
+
+static struct kernel_param_ops duration_ops = {
+	.set = duration_set,
+	.get = param_get_int,
+};
+
+
+module_param_cb(duration, &duration_ops, &duration, 0644);
+MODULE_PARM_DESC(duration, "forced idle time for each attempt in msec.");
+
+struct powerclamp_calibration_data {
+	unsigned long confidence;  /* used for calibration, basically a counter
+				    * gets incremented each time a clamping
+				    * period is completed without extra wakeups
+				    * once that counter is reached given level,
+				    * compensation is deemed usable.
+				    */
+	unsigned long steady_comp; /* steady state compensation used when
+				    * no extra wakeups occurred.
+				    */
+	unsigned long dynamic_comp; /* compensate excessive wakeup from idle
+				     * mostly from external interrupts.
+				     */
+};
+
+static struct powerclamp_calibration_data cal_data[MAX_TARGET_RATIO];
+
+static int window_size_set(const char *arg, const struct kernel_param *kp)
+{
+	int ret = 0;
+	unsigned long new_window_size;
+
+	ret = kstrtoul(arg, 10, &new_window_size);
+	if (ret)
+		goto exit_win;
+	if (new_window_size > 10 || new_window_size < 2) {
+		pr_err("Out of recommended window size %lu, between 2-10\n",
+			new_window_size);
+		ret = -EINVAL;
+	}
+
+	window_size = clamp(new_window_size, 2ul, 10ul);
+	smp_mb();
+
+exit_win:
+
+	return ret;
+}
+
+static struct kernel_param_ops window_size_ops = {
+	.set = window_size_set,
+	.get = param_get_int,
+};
+
+module_param_cb(window_size, &window_size_ops, &window_size, 0644);
+MODULE_PARM_DESC(window_size, "sliding window in number of clamping cycles\n"
+	"\tpowerclamp controls idle ratio within this window. larger\n"
+	"\twindow size results in slower response time but more smooth\n"
+	"\tclamping results. default to 2.");
+
+static void find_target_mwait(void)
+{
+	unsigned int eax, ebx, ecx, edx;
+	unsigned int highest_cstate = 0;
+	unsigned int highest_subcstate = 0;
+	int i;
+
+	if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+		return;
+
+	cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
+
+	if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
+	    !(ecx & CPUID5_ECX_INTERRUPT_BREAK))
+		return;
+
+	edx >>= MWAIT_SUBSTATE_SIZE;
+	for (i = 0; i < 7 && edx; i++, edx >>= MWAIT_SUBSTATE_SIZE) {
+		if (edx & MWAIT_SUBSTATE_MASK) {
+			highest_cstate = i;
+			highest_subcstate = edx & MWAIT_SUBSTATE_MASK;
+		}
+	}
+	target_mwait = (highest_cstate << MWAIT_SUBSTATE_SIZE) |
+		(highest_subcstate - 1);
+
+}
+
+static u64 pkg_state_counter(void)
+{
+	u64 val;
+	u64 count = 0;
+
+	static bool skip_c2;
+	static bool skip_c3;
+	static bool skip_c6;
+	static bool skip_c7;
+
+	if (!skip_c2) {
+		if (!rdmsrl_safe(MSR_PKG_C2_RESIDENCY, &val))
+			count += val;
+		else
+			skip_c2 = true;
+	}
+
+	if (!skip_c3) {
+		if (!rdmsrl_safe(MSR_PKG_C3_RESIDENCY, &val))
+			count += val;
+		else
+			skip_c3 = true;
+	}
+
+	if (!skip_c6) {
+		if (!rdmsrl_safe(MSR_PKG_C6_RESIDENCY, &val))
+			count += val;
+		else
+			skip_c6 = true;
+	}
+
+	if (!skip_c7) {
+		if (!rdmsrl_safe(MSR_PKG_C7_RESIDENCY, &val))
+			count += val;
+		else
+			skip_c7 = true;
+	}
+
+	return count;
+}
+
+static void noop_timer(unsigned long foo)
+{
+	/* empty... just the fact that we get the interrupt wakes us up */
+}
+
+static unsigned int get_compensation(int ratio)
+{
+	unsigned int comp = 0;
+
+	/* we only use compensation if all adjacent ones are good */
+	if (ratio == 1 &&
+		cal_data[ratio].confidence >= CONFIDENCE_OK &&
+		cal_data[ratio + 1].confidence >= CONFIDENCE_OK &&
+		cal_data[ratio + 2].confidence >= CONFIDENCE_OK) {
+		comp = (cal_data[ratio].steady_comp +
+			cal_data[ratio + 1].steady_comp +
+			cal_data[ratio + 2].steady_comp) / 3;
+	} else if (ratio == MAX_TARGET_RATIO - 1 &&
+		cal_data[ratio].confidence >= CONFIDENCE_OK &&
+		cal_data[ratio - 1].confidence >= CONFIDENCE_OK &&
+		cal_data[ratio - 2].confidence >= CONFIDENCE_OK) {
+		comp = (cal_data[ratio].steady_comp +
+			cal_data[ratio - 1].steady_comp +
+			cal_data[ratio - 2].steady_comp) / 3;
+	} else if (cal_data[ratio].confidence >= CONFIDENCE_OK &&
+		cal_data[ratio - 1].confidence >= CONFIDENCE_OK &&
+		cal_data[ratio + 1].confidence >= CONFIDENCE_OK) {
+		comp = (cal_data[ratio].steady_comp +
+			cal_data[ratio - 1].steady_comp +
+			cal_data[ratio + 1].steady_comp) / 3;
+	}
+
+	/* REVISIT: simple penalty of double idle injection */
+	if (reduce_irq)
+		comp = ratio;
+	/* do not exceed limit */
+	if (comp + ratio >= MAX_TARGET_RATIO)
+		comp = MAX_TARGET_RATIO - ratio - 1;
+
+	return comp;
+}
+
+static void adjust_compensation(int target_ratio, unsigned int win)
+{
+	int delta;
+	struct powerclamp_calibration_data *d = &cal_data[target_ratio];
+
+	/*
+	 * adjust compensations if confidence level has not been reached or
+	 * there are too many wakeups during the last idle injection period, we
+	 * cannot trust the data for compensation.
+	 */
+	if (d->confidence >= CONFIDENCE_OK ||
+		atomic_read(&idle_wakeup_counter) >
+		win * num_online_cpus())
+		return;
+
+	delta = set_target_ratio - current_ratio;
+	/* filter out bad data */
+	if (delta >= 0 && delta <= (1+target_ratio/10)) {
+		if (d->steady_comp)
+			d->steady_comp =
+				roundup(delta+d->steady_comp, 2)/2;
+		else
+			d->steady_comp = delta;
+		d->confidence++;
+	}
+}
+
+static bool powerclamp_adjust_controls(unsigned int target_ratio,
+				unsigned int guard, unsigned int win)
+{
+	static u64 msr_last, tsc_last;
+	u64 msr_now, tsc_now;
+
+	/* check result for the last window */
+	msr_now = pkg_state_counter();
+	rdtscll(tsc_now);
+
+	/* calculate pkg cstate vs tsc ratio */
+	if (!msr_last || !tsc_last)
+		current_ratio = 1;
+	else if (tsc_now-tsc_last)
+		current_ratio = 100*(msr_now-msr_last)/
+			(tsc_now-tsc_last);
+
+	/* update record */
+	msr_last = msr_now;
+	tsc_last = tsc_now;
+
+	adjust_compensation(target_ratio, win);
+	/*
+	 * too many external interrupts, set flag such
+	 * that we can take measure later.
+	 */
+	reduce_irq = atomic_read(&idle_wakeup_counter) >=
+		2 * win * num_online_cpus();
+
+	atomic_set(&idle_wakeup_counter, 0);
+	/* if we are above target+guard, skip */
+	return set_target_ratio + guard <= current_ratio;
+}
+
+static int clamp_thread(void *arg)
+{
+	int cpunr = (unsigned long)arg;
+	DEFINE_TIMER(wakeup_timer, noop_timer, 0, 0);
+	static const struct sched_param param = {
+		.sched_priority = MAX_USER_RT_PRIO/2,
+	};
+	unsigned int count = 0;
+	unsigned int target_ratio;
+
+	set_bit(cpunr, cpu_clamping_mask);
+	set_freezable();
+	init_timer_on_stack(&wakeup_timer);
+	sched_setscheduler(current, SCHED_FIFO, &param);
+
+	while (true == clamping && !kthread_should_stop() &&
+		cpu_online(cpunr)) {
+		int sleeptime;
+		unsigned long target_jiffies;
+		unsigned int guard;
+		unsigned int compensation = 0;
+		int interval; /* jiffies to sleep for each attempt */
+		unsigned int duration_jiffies = msecs_to_jiffies(duration);
+		unsigned int window_size_now;
+
+		try_to_freeze();
+		/*
+		 * make sure user selected ratio does not take effect until
+		 * the next round. adjust target_ratio if user has changed
+		 * target such that we can converge quickly.
+		 */
+		target_ratio = set_target_ratio;
+		guard = 1 + target_ratio/20;
+		window_size_now = window_size;
+		count++;
+
+		/*
+		 * systems may have different ability to enter package level
+		 * c-states, thus we need to compensate the injected idle ratio
+		 * to achieve the actual target reported by the HW.
+		 */
+		compensation = get_compensation(target_ratio);
+		interval = duration_jiffies*100/(target_ratio+compensation);
+
+		/* align idle time */
+		target_jiffies = roundup(jiffies, interval);
+		sleeptime = target_jiffies - jiffies;
+		if (sleeptime <= 0)
+			sleeptime = 1;
+		schedule_timeout_interruptible(sleeptime);
+		/*
+		 * only elected controlling cpu can collect stats and update
+		 * control parameters.
+		 */
+		if (cpunr == control_cpu && !(count%window_size_now)) {
+			should_skip =
+				powerclamp_adjust_controls(target_ratio,
+							guard, window_size_now);
+			smp_mb();
+		}
+
+		if (should_skip)
+			continue;
+
+		target_jiffies = jiffies + duration_jiffies;
+		mod_timer(&wakeup_timer, target_jiffies);
+		if (unlikely(local_softirq_pending()))
+			continue;
+		/*
+		 * stop tick sched during idle time, interrupts are still
+		 * allowed. thus jiffies are updated properly.
+		 */
+		preempt_disable();
+		tick_nohz_idle_enter();
+		/* mwait until target jiffies is reached */
+		while (time_before(jiffies, target_jiffies)) {
+			unsigned long ecx = 1;
+			unsigned long eax = target_mwait;
+
+			/*
+			 * REVISIT: may call enter_idle() to notify drivers who
+			 * can save power during cpu idle. same for exit_idle()
+			 */
+			local_touch_nmi();
+			stop_critical_timings();
+			__monitor((void *)&current_thread_info()->flags, 0, 0);
+			cpu_relax(); /* allow HT sibling to run */
+			__mwait(eax, ecx);
+			start_critical_timings();
+			atomic_inc(&idle_wakeup_counter);
+		}
+		tick_nohz_idle_exit();
+		preempt_enable_no_resched();
+	}
+	del_timer_sync(&wakeup_timer);
+	clear_bit(cpunr, cpu_clamping_mask);
+
+	return 0;
+}
+
+/*
+ * 1 HZ polling while clamping is active, useful for userspace
+ * to monitor actual idle ratio.
+ */
+static void poll_pkg_cstate(struct work_struct *dummy);
+static DECLARE_DELAYED_WORK(poll_pkg_cstate_work, poll_pkg_cstate);
+static void poll_pkg_cstate(struct work_struct *dummy)
+{
+	static u64 msr_last;
+	static u64 tsc_last;
+	static unsigned long jiffies_last;
+
+	u64 msr_now;
+	unsigned long jiffies_now;
+	u64 tsc_now;
+
+	msr_now = pkg_state_counter();
+	rdtscll(tsc_now);
+	jiffies_now = jiffies;
+
+	/* calculate pkg cstate vs tsc ratio */
+	if (!msr_last || !tsc_last)
+		pkg_cstate_ratio_cur = 1;
+	else {
+		if (tsc_now - tsc_last)
+			pkg_cstate_ratio_cur = 100 * (msr_now - msr_last)/
+				(tsc_now - tsc_last);
+	}
+
+	/* update record */
+	msr_last = msr_now;
+	jiffies_last = jiffies_now;
+	tsc_last = tsc_now;
+
+	if (true == clamping)
+		schedule_delayed_work(&poll_pkg_cstate_work, HZ);
+}
+
+static int start_power_clamp(void)
+{
+	unsigned long cpu;
+	struct task_struct *thread;
+
+	/* check if pkg cstate counter is completely 0, abort in this case */
+	if (!pkg_state_counter()) {
+		pr_err("pkg cstate counter not functional, abort\n");
+		return -EINVAL;
+	}
+
+	set_target_ratio = clamp(set_target_ratio, 0U, MAX_TARGET_RATIO);
+	/* prevent cpu hotplug */
+	get_online_cpus();
+
+	/* prefer BSP */
+	control_cpu = 0;
+	if (!cpu_online(control_cpu))
+		control_cpu = smp_processor_id();
+
+	clamping = true;
+	schedule_delayed_work(&poll_pkg_cstate_work, 0);
+
+	/* start one thread per online cpu */
+	for_each_online_cpu(cpu) {
+		struct task_struct **p =
+			per_cpu_ptr(powerclamp_thread, cpu);
+
+		thread = kthread_create_on_node(clamp_thread,
+						(void *) cpu,
+						cpu_to_node(cpu),
+						"kidle_inject/%ld", cpu);
+		/* bind to cpu here */
+		if (likely(!IS_ERR(thread))) {
+			kthread_bind(thread, cpu);
+			wake_up_process(thread);
+			*p = thread;
+		}
+
+	}
+	put_online_cpus();
+
+	return 0;
+}
+
+static void end_power_clamp(void)
+{
+	int i;
+	struct task_struct *thread;
+
+	clamping = false;
+	/*
+	 * make clamping visible to other cpus and give per cpu clamping threads
+	 * some time to exit; any still alive are stopped below.
+	 */
+	smp_mb();
+	msleep(20);
+	if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
+		for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
+			pr_debug("clamping thread for cpu %d alive, kill\n", i);
+			thread = *per_cpu_ptr(powerclamp_thread, i);
+			kthread_stop(thread);
+		}
+	}
+}
+
+static int powerclamp_cpu_callback(struct notifier_block *nfb,
+				unsigned long action, void *hcpu)
+{
+	unsigned long cpu = (unsigned long)hcpu;
+	struct task_struct *thread;
+	struct task_struct **percpu_thread =
+		per_cpu_ptr(powerclamp_thread, cpu);
+
+	if (false == clamping)
+		goto exit_ok;
+
+	switch (action) {
+	case CPU_ONLINE:
+		thread = kthread_create_on_node(clamp_thread,
+						(void *) cpu,
+						cpu_to_node(cpu),
+						"kidle_inject/%lu", cpu);
+		if (likely(!IS_ERR(thread))) {
+			kthread_bind(thread, cpu);
+			wake_up_process(thread);
+			*percpu_thread = thread;
+		}
+		/* prefer BSP as controlling CPU */
+		if (cpu == 0) {
+			control_cpu = 0;
+			smp_mb();
+		}
+		break;
+	case CPU_DEAD:
+		if (test_bit(cpu, cpu_clamping_mask)) {
+			pr_err("cpu %lu dead but powerclamping thread is not\n",
+				cpu);
+			kthread_stop(*percpu_thread);
+		}
+		if (cpu == control_cpu) {
+			control_cpu = smp_processor_id();
+			smp_mb();
+		}
+	}
+
+exit_ok:
+	return NOTIFY_OK;
+}
+
+static struct notifier_block powerclamp_cpu_notifier = {
+	.notifier_call = powerclamp_cpu_callback,
+};
+
+static int powerclamp_get_max_state(struct thermal_cooling_device *cdev,
+				 unsigned long *state)
+{
+	*state = MAX_TARGET_RATIO;
+
+	return 0;
+}
+
+static int powerclamp_get_cur_state(struct thermal_cooling_device *cdev,
+				 unsigned long *state)
+{
+	if (true == clamping)
+		*state = pkg_cstate_ratio_cur;
+	else
+		/* to save power, do not poll idle ratio while not clamping */
+		*state = -1; /* indicates invalid state */
+
+	return 0;
+}
+
+static int powerclamp_set_cur_state(struct thermal_cooling_device *cdev,
+				 unsigned long new_target_ratio)
+{
+	int ret = 0;
+
+	new_target_ratio = clamp(new_target_ratio, 0UL,
+				(unsigned long) (MAX_TARGET_RATIO-1));
+	if (set_target_ratio == 0 && new_target_ratio > 0) {
+		pr_info("Start idle injection to reduce power\n");
+		set_target_ratio = new_target_ratio;
+		ret = start_power_clamp();
+		goto exit_set;
+	} else	if (set_target_ratio > 0 && new_target_ratio == 0) {
+		pr_info("Stop forced idle injection\n");
+		set_target_ratio = 0;
+		end_power_clamp();
+	} else	/* adjust currently running */ {
+		set_target_ratio = new_target_ratio;
+		/* make new set_target_ratio visible to other cpus */
+		smp_mb();
+	}
+
+exit_set:
+	return ret;
+}
+
+/* bind to generic thermal layer as cooling device*/
+static struct thermal_cooling_device_ops powerclamp_cooling_ops = {
+	.get_max_state = powerclamp_get_max_state,
+	.get_cur_state = powerclamp_get_cur_state,
+	.set_cur_state = powerclamp_set_cur_state,
+};
+
+/* runs on Nehalem and later */
+static const struct x86_cpu_id intel_powerclamp_ids[] = {
+	{ X86_VENDOR_INTEL, 6, 0x1a},
+	{ X86_VENDOR_INTEL, 6, 0x1c},
+	{ X86_VENDOR_INTEL, 6, 0x1e},
+	{ X86_VENDOR_INTEL, 6, 0x1f},
+	{ X86_VENDOR_INTEL, 6, 0x25},
+	{ X86_VENDOR_INTEL, 6, 0x26},
+	{ X86_VENDOR_INTEL, 6, 0x2a},
+	{ X86_VENDOR_INTEL, 6, 0x2c},
+	{ X86_VENDOR_INTEL, 6, 0x2d},
+	{ X86_VENDOR_INTEL, 6, 0x2e},
+	{ X86_VENDOR_INTEL, 6, 0x2f},
+	{ X86_VENDOR_INTEL, 6, 0x3a},
+	{}
+};
+MODULE_DEVICE_TABLE(x86cpu, intel_powerclamp_ids);
+
+static int powerclamp_probe(void)
+{
+	if (!x86_match_cpu(intel_powerclamp_ids)) {
+		pr_err("Intel powerclamp does not run on family %d model %d\n",
+				boot_cpu_data.x86, boot_cpu_data.x86_model);
+		return -ENODEV;
+	}
+	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
+		!boot_cpu_has(X86_FEATURE_CONSTANT_TSC) ||
+		!boot_cpu_has(X86_FEATURE_MWAIT) ||
+		!boot_cpu_has(X86_FEATURE_ARAT))
+		return -ENODEV;
+
+	/* find the deepest mwait value */
+	find_target_mwait();
+
+	return 0;
+}
+
+static int powerclamp_debug_show(struct seq_file *m, void *unused)
+{
+	int i = 0;
+
+	seq_printf(m, "controlling cpu: %d\n", control_cpu);
+	seq_printf(m, "pct confidence steady dynamic (compensation)\n");
+	for (i = 0; i < MAX_TARGET_RATIO; i++) {
+		seq_printf(m, "%d\t%lu\t%lu\t%lu\n",
+			i,
+			cal_data[i].confidence,
+			cal_data[i].steady_comp,
+			cal_data[i].dynamic_comp);
+	}
+
+	return 0;
+}
+
+static int powerclamp_debug_open(struct inode *inode,
+			struct file *file)
+{
+	return single_open(file, powerclamp_debug_show, inode->i_private);
+}
+
+static const struct file_operations powerclamp_debug_fops = {
+	.open		= powerclamp_debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+	.owner		= THIS_MODULE,
+};
+
+static inline void powerclamp_create_debug_files(void)
+{
+	debug_dir = debugfs_create_dir("intel_powerclamp", NULL);
+	if (!debug_dir)
+		return;
+
+	if (!debugfs_create_file("powerclamp_calib", S_IRUGO, debug_dir,
+					cal_data, &powerclamp_debug_fops))
+		goto file_error;
+
+	return;
+
+file_error:
+	debugfs_remove_recursive(debug_dir);
+}
+
+static int powerclamp_init(void)
+{
+	int retval;
+	int bitmap_size;
+
+	bitmap_size = BITS_TO_LONGS(num_possible_cpus()) * sizeof(long);
+	cpu_clamping_mask = kzalloc(bitmap_size, GFP_KERNEL);
+	if (!cpu_clamping_mask)
+		return -ENOMEM;
+
+	/* probe cpu features and ids here */
+	retval = powerclamp_probe();
+	if (retval)
+		return retval;
+	/* set default limit, maybe adjusted during runtime based on feedback */
+	window_size = 2;
+	register_hotcpu_notifier(&powerclamp_cpu_notifier);
+	powerclamp_thread = alloc_percpu(struct task_struct *);
+	cooling_dev = thermal_cooling_device_register("intel_powerclamp", NULL,
+						&powerclamp_cooling_ops);
+	if (IS_ERR(cooling_dev))
+		return -ENODEV;
+
+	if (!duration)
+		duration = jiffies_to_msecs(DEFAULT_DURATION_JIFFIES);
+	powerclamp_create_debug_files();
+
+	return 0;
+}
+module_init(powerclamp_init);
+
+static void powerclamp_exit(void)
+{
+	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
+	end_power_clamp();
+	free_percpu(powerclamp_thread);
+	thermal_cooling_device_unregister(cooling_dev);
+	kfree(cpu_clamping_mask);
+
+	cancel_delayed_work_sync(&poll_pkg_cstate_work);
+	debugfs_remove_recursive(debug_dir);
+}
+module_exit(powerclamp_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Arjan van de Ven <arjan@linux.intel.com>");
+MODULE_AUTHOR("Jacob Pan <jacob.jun.pan@linux.intel.com>");
+MODULE_DESCRIPTION("Package Level C-state Idle Injection for Intel CPUs");
-- 
1.7.9.5
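
For readers following find_target_mwait() in the patch above, the sub-state
decoding can be tried in user space. This is a minimal sketch, not the driver
code: mwait_hint_from_edx() is a hypothetical name, the EDX value is passed in
rather than read with CPUID, and the fallback to hint 0 when no sub-state is
reported is an assumption (the posted function has no such fallback).

```c
#define MWAIT_SUBSTATE_SIZE 4
#define MWAIT_SUBSTATE_MASK 0xf

/*
 * edx is the EDX value of CPUID leaf 5; each 4-bit field holds the number
 * of sub-states of one C-state. Returns the MWAIT hint for the deepest
 * C-state that reports at least one sub-state.
 */
static unsigned int mwait_hint_from_edx(unsigned int edx)
{
	unsigned int highest_cstate = 0;
	unsigned int highest_subcstate = 0;
	int i;

	edx >>= MWAIT_SUBSTATE_SIZE;	/* skip the C0 field */
	for (i = 0; i < 7 && edx; i++, edx >>= MWAIT_SUBSTATE_SIZE) {
		if (edx & MWAIT_SUBSTATE_MASK) {
			highest_cstate = i;
			highest_subcstate = edx & MWAIT_SUBSTATE_MASK;
		}
	}

	/* Assumption: fall back to hint 0 when no sub-state is reported. */
	if (!highest_subcstate)
		return 0;

	return (highest_cstate << MWAIT_SUBSTATE_SIZE) |
		(highest_subcstate - 1);
}
```

Calling mwait_hint_from_edx() with an EDX value captured from the target
machine gives the value the driver would store in target_mwait, except in the
no-sub-state case covered by the assumed fallback.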


* [PATCH v6 0/3] PM: Intel PowerClamp driver
From: Jacob Pan @ 2013-01-04 11:12 UTC (permalink / raw)
  To: Linux PM, LKML
  Cc: Peter Zijlstra, Rafael Wysocki, Len Brown, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar, Zhang Rui, Joe Perches, Rob Landley,
	Arjan van de Ven, Paul McKenney, Jacob Pan

v6 changes:
- clamp module parameters duration and window size, reword warning
  messages when input parameters are out of range.

We have done some experiments with idle injection on Intel platforms.
The idea is to use the increasingly power efficient package level
C-states for power capping and passive thermal control.
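
As a rough illustration of the pacing arithmetic, here is a minimal sketch
(not the driver itself). The clamp thread in patch 3/3 spaces injections by
duration_jiffies*100/(target_ratio+compensation) and skips a round when the
measured ratio is already at or above the target plus a guard band. The helper
names are hypothetical, and the zero-percent guard is an assumption added so
the sketch cannot divide by zero.

```c
/* Hypothetical helper names; only the arithmetic mirrors the posted driver. */

/* Jiffies from the start of one idle injection to the start of the next. */
static unsigned int injection_interval(unsigned int duration_jiffies,
				       unsigned int target_ratio,
				       unsigned int compensation)
{
	unsigned int pct = target_ratio + compensation;

	/* Assumption: treat a 0% request as "no injection". */
	if (pct == 0)
		return 0;
	return duration_jiffies * 100 / pct;
}

/* Guard band: 1 + target_ratio/20 percentage points. */
static unsigned int guard_band(unsigned int target_ratio)
{
	return 1 + target_ratio / 20;
}

/* Nonzero when the measured package C-state ratio already meets the target. */
static int should_skip_injection(unsigned int target_ratio,
				 unsigned int current_ratio)
{
	return target_ratio + guard_band(target_ratio) <= current_ratio;
}
```

With a 6-jiffy injection and a 25% target, for example, this gives one
injection every 24 jiffies.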

Documentation is included in the patch to explain the theory of
operation, performance implication, calibration, scalability, and user
interface. Please refer to the following file for more details.

Documentation/thermal/intel_powerclamp.txt

Arjan van de Ven created the original idea and driver. I have been
refining the driver in the hope that it can be useful beyond a proof
of concept.

Jacob Pan (3):
  tick: export nohz tick idle symbols for module use
  x86/nmi: export local_touch_nmi() symbol for modules
  PM: Introduce Intel PowerClamp Driver

 Documentation/thermal/intel_powerclamp.txt |  307 +++++++++++
 arch/x86/kernel/nmi.c                      |    1 +
 drivers/thermal/Kconfig                    |   10 +
 drivers/thermal/Makefile                   |    2 +
 drivers/thermal/intel_powerclamp.c         |  788 ++++++++++++++++++++++++++++
 kernel/time/tick-sched.c                   |    2 +
 6 files changed, 1110 insertions(+)
 create mode 100644 Documentation/thermal/intel_powerclamp.txt
 create mode 100644 drivers/thermal/intel_powerclamp.c

-- 
1.7.9.5


* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Orion SoC
From: Eduardo Valentin @ 2013-01-04  9:47 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Thomas Petazzoni, Jason Cooper, iwamatsu, linux-pm,
	Sebastian Hesselbarth, jgunthorpe, linux ARM
In-Reply-To: <50E6A378.90503@ti.com>


Hello again,

On 04-01-2013 11:40, Eduardo Valentin wrote:
> Hey Andrew,
>
> On 14-12-2012 13:03, Andrew Lunn wrote:
>> From: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>

<cut>

>
> I believe you forgot to request this memory. I suggest you do:
> +    thermal_dev->base_addr = devm_request_and_ioremap(&pdev->dev,
> res->start,
> +                          res);
>

small fix:

+    thermal_dev->base_addr = devm_request_and_ioremap(&pdev->dev,
+                                                      res);



* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Orion SoC
From: Eduardo Valentin @ 2013-01-04  9:40 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: linux ARM, iwamatsu, linux-pm, Thomas Petazzoni, jgunthorpe,
	Sebastian Hesselbarth, Jason Cooper
In-Reply-To: <1355482986-885-2-git-send-email-andrew@lunn.ch>

Hey Andrew,

On 14-12-2012 13:03, Andrew Lunn wrote:
> From: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
>
> Some Orion SoC has thermal sensor.
> This patch adds support for 88F6282 and 88F6283.
>
> Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>   .../devicetree/bindings/thermal/orion-thermal.txt  |   16 +++
>   drivers/thermal/Kconfig                            |    7 ++
>   drivers/thermal/Makefile                           |    1 +
>   drivers/thermal/orion_thermal.c                    |  133 ++++++++++++++++++++
>   4 files changed, 157 insertions(+)
>   create mode 100644 Documentation/devicetree/bindings/thermal/orion-thermal.txt
>   create mode 100644 drivers/thermal/orion_thermal.c
>
> diff --git a/Documentation/devicetree/bindings/thermal/orion-thermal.txt b/Documentation/devicetree/bindings/thermal/orion-thermal.txt
> new file mode 100644
> index 0000000..5ce925d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/thermal/orion-thermal.txt
> @@ -0,0 +1,16 @@
> +* Orion Thermal
> +
> +This initial version is for Kirkwood 88F8262 & 88F6283 SoCs, however
> +it is expected the driver will sometime in the future be expanded to
> +also support Dove, using a different compatibility string.
> +
> +Required properties:
> +- compatible : "marvell,kirkwood-thermal"
> +- reg : Address range of the thermal registers
> +
> +Example:
> +
> +	thermal@10078 {
> +		compatible = "marvell,kirkwood";
> +		reg = <0x10078 0x4>;
> +	};

How do you tell whether the SoC has the temperature sensor? In your
patch description you clearly say that this supports only the
88F6282 and 88F6283 SoCs.

> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index e1cb6bd..3bba13f 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -55,3 +55,10 @@ config EXYNOS_THERMAL
>   	help
>   	  If you say yes here you get support for TMU (Thermal Managment
>   	  Unit) on SAMSUNG EXYNOS series of SoC.
> +
> +config ORION_THERMAL
> +	tristate "Temperature sensor on Marvel Orion SoCs"
> +	depends on PLAT_ORION && THERMAL
> +	help
> +	  Support for the Orion thermal sensor driver into the Linux thermal
> +	  framework. This currently supports only 88F6282 and 88F6283.
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index 885550d..2fc64aa 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -6,4 +6,5 @@ obj-$(CONFIG_THERMAL)		+= thermal_sys.o
>   obj-$(CONFIG_CPU_THERMAL)		+= cpu_cooling.o
>   obj-$(CONFIG_SPEAR_THERMAL)		+= spear_thermal.o
>   obj-$(CONFIG_RCAR_THERMAL)	+= rcar_thermal.o
> +obj-$(CONFIG_ORION_THERMAL)         	+= orion_thermal.o
>   obj-$(CONFIG_EXYNOS_THERMAL)		+= exynos_thermal.o
> diff --git a/drivers/thermal/orion_thermal.c b/drivers/thermal/orion_thermal.c
> new file mode 100644
> index 0000000..e8a2a68
> --- /dev/null
> +++ b/drivers/thermal/orion_thermal.c
> @@ -0,0 +1,133 @@
> +/*
> + * Orion thermal sensor driver
> + *
> + * Copyright (C) 2012 Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +#include <linux/device.h>
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <linux/of.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/thermal.h>
> +
> +#define THERMAL_VALID_OFFSET	9
> +#define THERMAL_VALID_MASK	0x1
> +#define THERMAL_TEMP_OFFSET	10
> +#define THERMAL_TEMP_MASK	0x1FF
> +
> +/* Orion Thermal Sensor Dev Structure */
> +struct orion_thermal_dev {
> +	void __iomem *base_addr;
> +};
> +
> +static int orion_get_temp(struct thermal_zone_device *thermal,
> +			  unsigned long *temp)
> +{
> +	unsigned long reg;
> +	struct orion_thermal_dev *thermal_dev = thermal->devdata;
> +
> +	reg = readl_relaxed(thermal_dev->base_addr);
> +
> +	/* Valid check */
> +	if (!(reg >> THERMAL_VALID_OFFSET) & THERMAL_VALID_MASK) {
> +		dev_info(&thermal->device,

This state seems to be severe enough to get a dev_err level message.

> +			 "Temperature sensor reading not valid\n");
> +		return -EIO;
> +	}
> +
> +	reg = (reg >> THERMAL_TEMP_OFFSET) & THERMAL_TEMP_MASK;
> +	/* Calculate temperature. See Table 814 in 8262 hardware manual. */
> +	*temp = ((322UL - reg) * 10000UL * 1000UL) / 13625UL;
> +
> +	return 0;
> +}
> +
> +static struct thermal_zone_device_ops ops = {
> +	.get_temp = orion_get_temp,
> +};
> +
> +static int orion_thermal_probe(struct platform_device *pdev)
> +{
> +	struct thermal_zone_device *thermal = NULL;
> +	struct orion_thermal_dev *thermal_dev;
> +	struct resource *res;
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	if (!res) {
> +		dev_err(&pdev->dev, "Failed to get platform resource\n");
> +		return -ENODEV;
> +	}
> +
> +	thermal_dev = devm_kzalloc(&pdev->dev, sizeof(*thermal_dev),
> +				   GFP_KERNEL);
> +	if (!thermal_dev) {
> +		dev_err(&pdev->dev, "kzalloc fail\n");
> +		return -ENOMEM;
> +	}
> +
> +	thermal_dev->base_addr = devm_ioremap(&pdev->dev, res->start,
> +					      resource_size(res));


I believe you forgot to request this memory. I suggest you do:
+	thermal_dev->base_addr = devm_request_and_ioremap(&pdev->dev, res->start,
+					      res);


> +	if (!thermal_dev->base_addr) {
> +		dev_err(&pdev->dev, "Failed to ioremap memory\n");
> +		return -ENOMEM;
> +	}
> +
> +	thermal = thermal_zone_device_register("orion_thermal", 0, 0,
> +				   thermal_dev, &ops, 0, 0);
> +	if (IS_ERR(thermal)) {
> +		dev_err(&pdev->dev,
> +			"Failed to register thermal zone device\n");
> +		return  PTR_ERR(thermal);
> +	}
> +
> +	platform_set_drvdata(pdev, thermal);
> +
> +	dev_info(&thermal->device,
> +		 KBUILD_MODNAME ": Thermal sensor registered\n");

Do you really need to be verbose? I suppose one can always check sysfs 
entries to see if there is a successful driver & device binding...

> +
> +	return 0;
> +}
> +
> +static int orion_thermal_exit(struct platform_device *pdev)
> +{
> +	struct thermal_zone_device *orion_thermal = platform_get_drvdata(pdev);
> +
> +	thermal_zone_device_unregister(orion_thermal);
> +	platform_set_drvdata(pdev, NULL);
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id orion_thermal_id_table[] = {
> +	{ .compatible = "marvell,kirkwood-thermal" },
> +	{}
> +};
> +MODULE_DEVICE_TABLE(of, orion_thermal_id_table);
> +
> +static struct platform_driver orion_thermal_driver = {
> +	.probe = orion_thermal_probe,
> +	.remove = orion_thermal_exit,
> +	.driver = {
> +		.name = "orion_thermal",
> +		.owner = THIS_MODULE,
> +		.of_match_table = of_match_ptr(orion_thermal_id_table),
> +	},
> +};
> +
> +module_platform_driver(orion_thermal_driver);
> +
> +MODULE_AUTHOR("Nobuhiro Iwamatsu <iwamatsu@nigauri.org>");
> +MODULE_DESCRIPTION("orion thermal driver");
> +MODULE_LICENSE("GPL");
>
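
One further observation on the valid check quoted above, offered as a sketch
rather than a tested fix: in C the unary '!' binds tighter than '&', so the
condition as posted negates the whole shifted register before masking. The
user-space snippet below compares the two readings and repeats the patch's
conversion formula. The function names are hypothetical, and which reading
matches the hardware manual is an assumption to verify against the datasheet.

```c
#include <stdint.h>

#define THERMAL_VALID_OFFSET	9
#define THERMAL_VALID_MASK	0x1
#define THERMAL_TEMP_OFFSET	10
#define THERMAL_TEMP_MASK	0x1FF

/* The check as written in the quoted patch: '!' is applied before '&'. */
static int invalid_as_posted(uint32_t reg)
{
	return !(reg >> THERMAL_VALID_OFFSET) & THERMAL_VALID_MASK;
}

/* The check with the mask applied before the negation. */
static int invalid_masked_first(uint32_t reg)
{
	return !((reg >> THERMAL_VALID_OFFSET) & THERMAL_VALID_MASK);
}

/* Same formula as the patch; the result is in millidegrees Celsius. */
static unsigned long raw_to_millicelsius(uint32_t reg)
{
	unsigned long raw = (reg >> THERMAL_TEMP_OFFSET) & THERMAL_TEMP_MASK;

	return ((322UL - raw) * 10000UL * 1000UL) / 13625UL;
}
```

For a register value with a non-zero temperature field and the valid bit
clear, the posted form reports the reading as valid while the masked form
rejects it.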



* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Orion SoC
From: Andrew Lunn @ 2013-01-04  8:15 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Nobuhiro Iwamatsu, linux ARM, linux-pm, Jason Cooper,
	Sebastian Hesselbarth, Thomas Petazzoni, jgunthorpe
In-Reply-To: <1357284783.2152.2.camel@rzhang1-mobl4>

On 04/01/13 08:33, Zhang Rui wrote:
> On Fri, 2012-12-14 at 23:11 +0100, Andrew Lunn wrote:
>> On Sat, Dec 15, 2012 at 06:54:17AM +0900, Nobuhiro Iwamatsu wrote:
>>> Hi,
>>>
>>> Thanks you for your work.
>>> Sorry, I dont hava a time at this week about this.
>>
>> Its not a problem. We have plenty of time before the next merge
>> window. I was just interested in seeing it work on my QNAP device, so
>> did some of the cleanup work.
>>
>>> On Fri, Dec 14, 2012 at 8:03 PM, Andrew Lunn<andrew@lunn.ch>  wrote:
>>>> From: Nobuhiro Iwamatsu<iwamatsu@nigauri.org>
>>>>
>>>> Some Orion SoC has thermal sensor.
>>>> This patch adds support for 88F6282 and 88F6283.
>>>>
>>>> Signed-off-by: Nobuhiro Iwamatsu<iwamatsu@nigauri.org>
>>>> Signed-off-by: Andrew Lunn<andrew@lunn.ch>
>>>> ---
>>>>   .../devicetree/bindings/thermal/orion-thermal.txt  |   16 +++
>>>>   drivers/thermal/Kconfig                            |    7 ++
>>>>   drivers/thermal/Makefile                           |    1 +
>>>>   drivers/thermal/orion_thermal.c                    |  133 ++++++++++++++++++++
>>>>   4 files changed, 157 insertions(+)
>>>>   create mode 100644 Documentation/devicetree/bindings/thermal/orion-thermal.txt
>>>>   create mode 100644 drivers/thermal/orion_thermal.c
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/thermal/orion-thermal.txt b/Documentation/devicetree/bindings/thermal/orion-thermal.txt
>>>> new file mode 100644
>>>> index 0000000..5ce925d
>>>> --- /dev/null
>>>> +++ b/Documentation/devicetree/bindings/thermal/orion-thermal.txt
>>>> @@ -0,0 +1,16 @@
>>>> +* Orion Thermal
>>>> +
>>>> +This initial version is for Kirkwood 88F8262&  88F6283 SoCs, however
>>>> +it is expected the driver will sometime in the future be expanded to
>>>> +also support Dove, using a different compatibility string.
>>>> +
>>>> +Required properties:
>>>> +- compatible : "marvell,kirkwood-thermal"
>>>> +- reg : Address range of the thermal registers
>>>> +
>>>> +Example:
>>>> +
>>>> +       thermal@10078 {
>>>> +               compatible = "marvell,kirkwood";
>>>
>>> compatible = "marvell,kirkwood-thermal"; ?
>>
>> Yep, my error.
>>
>>>> +               reg =<0x10078 0x4>;
>>>> +       };
>>>> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
>>>> index e1cb6bd..3bba13f 100644
>>>> --- a/drivers/thermal/Kconfig
>>>> +++ b/drivers/thermal/Kconfig
>>>> @@ -55,3 +55,10 @@ config EXYNOS_THERMAL
>>>>          help
>>>>            If you say yes here you get support for TMU (Thermal Managment
>>>>            Unit) on SAMSUNG EXYNOS series of SoC.
>>>> +
>>>> +config ORION_THERMAL
>>>> +       tristate "Temperature sensor on Marvel Orion SoCs"
>>>
>>> Marvel ->  Marvell
>>
>> Missed that one, thanks.
>>
>> Thanks for the Tested-by. I will add it to the next version.  I
>> started work on Dove support, so i will probably repost when i have
>> that ready for testing.
>
> sorry for the late response, can you resend the refreshed version on top
> of the thermal next tree?

Hi Rui

I'm in the process of generalizing the driver so that it works for
Kirkwood, Dove, Armada 370 and Armada XP. The combined driver is not
finished yet. Once I have something ready I will repost.

	Andrew



* Re: [PATCH 1/1] thermal: exynos: Use of_match_ptr() macro
From: Zhang Rui @ 2013-01-04  7:36 UTC (permalink / raw)
  To: Sachin Kamat; +Cc: linux-pm, patches
In-Reply-To: <1355307864-22194-1-git-send-email-sachin.kamat@linaro.org>

On Wed, 2012-12-12 at 15:54 +0530, Sachin Kamat wrote:
> This eliminates having an #ifdef returning NULL for the case
> when OF is disabled.
> 
> Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>

applied to thermal-next.

thanks,
rui

> ---
>  drivers/thermal/exynos_thermal.c |    4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/thermal/exynos_thermal.c b/drivers/thermal/exynos_thermal.c
> index 7772d16..3f6a54a 100644
> --- a/drivers/thermal/exynos_thermal.c
> +++ b/drivers/thermal/exynos_thermal.c
> @@ -800,8 +800,6 @@ static const struct of_device_id exynos_tmu_match[] = {
>  	{},
>  };
>  MODULE_DEVICE_TABLE(of, exynos_tmu_match);
> -#else
> -#define  exynos_tmu_match NULL
>  #endif
>  
>  static struct platform_device_id exynos_tmu_driver_ids[] = {
> @@ -982,7 +980,7 @@ static struct platform_driver exynos_tmu_driver = {
>  		.name   = "exynos-tmu",
>  		.owner  = THIS_MODULE,
>  		.pm     = EXYNOS_TMU_PM,
> -		.of_match_table = exynos_tmu_match,
> +		.of_match_table = of_match_ptr(exynos_tmu_match),
>  	},
>  	.probe = exynos_tmu_probe,
>  	.remove	= __devexit_p(exynos_tmu_remove),
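
For readers following the archive, the effect of the macro can be sketched
in user space. In the kernel, <linux/of.h> defines of_match_ptr(_ptr) as
(_ptr) when CONFIG_OF is set and as NULL otherwise, which is why the driver
no longer needs its own #else branch. Everything below (struct demo_driver,
demo_thermal_match, demo_match_count, and the "vendor,demo-thermal" string)
is a hypothetical stand-in for illustration, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct of_device_id. */
struct of_device_id {
	const char *compatible;
};

/* Same shape as the of_match_ptr() definition in <linux/of.h>. */
#ifdef CONFIG_OF
#define of_match_ptr(ptr) (ptr)
#else
#define of_match_ptr(ptr) NULL
#endif

/* The table only exists when OF support is compiled in. */
#ifdef CONFIG_OF
static const struct of_device_id demo_thermal_match[] = {
	{ .compatible = "vendor,demo-thermal" },
	{ NULL },	/* sentinel entry terminates the table */
};
#endif

/* Hypothetical, cut-down driver structure with only the fields used here. */
struct demo_driver {
	const char *name;
	const struct of_device_id *of_match_table;
};

/*
 * No "#else #define demo_thermal_match NULL" is needed: without CONFIG_OF
 * the macro expands to NULL and the table name is never referenced.
 */
static const struct demo_driver demo_thermal_driver = {
	.name = "demo-thermal",
	.of_match_table = of_match_ptr(demo_thermal_match),
};

/* Count the entries before the sentinel; a NULL table counts as empty. */
static size_t demo_match_count(const struct demo_driver *drv)
{
	size_t n = 0;

	assert(drv != NULL);
	if (!drv->of_match_table)
		return 0;
	while (drv->of_match_table[n].compatible)
		n++;
	return n;
}
```

Built as is, the match table pointer is NULL and demo_match_count() returns
0. Built with -DCONFIG_OF, the same initializer picks up the real table
without any change to the driver structure.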




* Re: [PATCH 1/2] thermal: db8500: Use of_match_ptr() macro in db8500_thermal.c
From: Zhang Rui @ 2013-01-04  7:36 UTC (permalink / raw)
  To: Sachin Kamat; +Cc: linux-pm, hongbo.zhang, patches, Hongbo Zhang
In-Reply-To: <1355907059-28720-1-git-send-email-sachin.kamat@linaro.org>

On Wed, 2012-12-19 at 14:20 +0530, Sachin Kamat wrote:
> This eliminates having an #ifdef returning NULL for the case
> when OF is disabled.
> 
> Cc: Hongbo Zhang <hongbo.zhang@stericsson.com>
> Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>

applied to thermal-next.

thanks,
rui

> ---
> Compile tested on linux-next.
> ---
>  drivers/thermal/db8500_thermal.c |    4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/thermal/db8500_thermal.c b/drivers/thermal/db8500_thermal.c
> index ec71ade..61ce60a 100644
> --- a/drivers/thermal/db8500_thermal.c
> +++ b/drivers/thermal/db8500_thermal.c
> @@ -508,15 +508,13 @@ static const struct of_device_id db8500_thermal_match[] = {
>  	{ .compatible = "stericsson,db8500-thermal" },
>  	{},
>  };
> -#else
> -#define db8500_thermal_match NULL
>  #endif
>  
>  static struct platform_driver db8500_thermal_driver = {
>  	.driver = {
>  		.owner = THIS_MODULE,
>  		.name = "db8500-thermal",
> -		.of_match_table = db8500_thermal_match,
> +		.of_match_table = of_match_ptr(db8500_thermal_match),
>  	},
>  	.probe = db8500_thermal_probe,
>  	.suspend = db8500_thermal_suspend,




* Re: [PATCH 2/2] thermal: db8500: Use of_match_ptr() macro in db8500_cpufreq_cooling.c
From: Zhang Rui @ 2013-01-04  7:36 UTC (permalink / raw)
  To: Sachin Kamat; +Cc: linux-pm, hongbo.zhang, patches, Hongbo Zhang
In-Reply-To: <1355907059-28720-2-git-send-email-sachin.kamat@linaro.org>

On Wed, 2012-12-19 at 14:20 +0530, Sachin Kamat wrote:
> This eliminates having an #ifdef returning NULL for the case
> when OF is disabled.
> 
> Cc: Hongbo Zhang <hongbo.zhang@stericsson.com>
> Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>

applied to thermal-next.

thanks,
rui

> ---
> Compile tested on linux-next.
> ---
>  drivers/thermal/db8500_cpufreq_cooling.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/thermal/db8500_cpufreq_cooling.c b/drivers/thermal/db8500_cpufreq_cooling.c
> index 4cf8e72..2141985 100644
> --- a/drivers/thermal/db8500_cpufreq_cooling.c
> +++ b/drivers/thermal/db8500_cpufreq_cooling.c
> @@ -21,6 +21,7 @@
>  #include <linux/cpufreq.h>
>  #include <linux/err.h>
>  #include <linux/module.h>
> +#include <linux/of.h>
>  #include <linux/platform_device.h>
>  #include <linux/slab.h>
>  
> @@ -73,15 +74,13 @@ static const struct of_device_id db8500_cpufreq_cooling_match[] = {
>  	{ .compatible = "stericsson,db8500-cpufreq-cooling" },
>  	{},
>  };
> -#else
> -#define db8500_cpufreq_cooling_match NULL
>  #endif
>  
>  static struct platform_driver db8500_cpufreq_cooling_driver = {
>  	.driver = {
>  		.owner = THIS_MODULE,
>  		.name = "db8500-cpufreq-cooling",
> -		.of_match_table = db8500_cpufreq_cooling_match,
> +		.of_match_table = of_match_ptr(db8500_cpufreq_cooling_match),
>  	},
>  	.probe = db8500_cpufreq_cooling_probe,
>  	.suspend = db8500_cpufreq_cooling_suspend,




* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Orion SoC
From: Zhang Rui @ 2013-01-04  7:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Nobuhiro Iwamatsu, linux ARM, linux-pm, Jason Cooper,
	Sebastian Hesselbarth, Thomas Petazzoni, jgunthorpe
In-Reply-To: <20121214221159.GE7717@lunn.ch>

On Fri, 2012-12-14 at 23:11 +0100, Andrew Lunn wrote:
> On Sat, Dec 15, 2012 at 06:54:17AM +0900, Nobuhiro Iwamatsu wrote:
> > Hi,
> > 
> > > Thank you for your work.
> > > Sorry, I don't have time for this this week.
> 
> It's not a problem. We have plenty of time before the next merge
> window. I was just interested in seeing it work on my QNAP device, so
> did some of the cleanup work.
> 
> > On Fri, Dec 14, 2012 at 8:03 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> > > From: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
> > >
> > > Some Orion SoC has thermal sensor.
> > > This patch adds support for 88F6282 and 88F6283.
> > >
> > > Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
> > > Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> > > ---
> > >  .../devicetree/bindings/thermal/orion-thermal.txt  |   16 +++
> > >  drivers/thermal/Kconfig                            |    7 ++
> > >  drivers/thermal/Makefile                           |    1 +
> > >  drivers/thermal/orion_thermal.c                    |  133 ++++++++++++++++++++
> > >  4 files changed, 157 insertions(+)
> > >  create mode 100644 Documentation/devicetree/bindings/thermal/orion-thermal.txt
> > >  create mode 100644 drivers/thermal/orion_thermal.c
> > >
> > > diff --git a/Documentation/devicetree/bindings/thermal/orion-thermal.txt b/Documentation/devicetree/bindings/thermal/orion-thermal.txt
> > > new file mode 100644
> > > index 0000000..5ce925d
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/thermal/orion-thermal.txt
> > > @@ -0,0 +1,16 @@
> > > +* Orion Thermal
> > > +
> > > +This initial version is for Kirkwood 88F8262 & 88F6283 SoCs, however
> > > +it is expected the driver will sometime in the future be expanded to
> > > +also support Dove, using a different compatibility string.
> > > +
> > > +Required properties:
> > > +- compatible : "marvell,kirkwood-thermal"
> > > +- reg : Address range of the thermal registers
> > > +
> > > +Example:
> > > +
> > > +       thermal@10078 {
> > > +               compatible = "marvell,kirkwood";
> > 
> > compatible = "marvell,kirkwood-thermal"; ?
> 
> Yep, my error.
> 
> > > +               reg = <0x10078 0x4>;
> > > +       };
> > > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> > > index e1cb6bd..3bba13f 100644
> > > --- a/drivers/thermal/Kconfig
> > > +++ b/drivers/thermal/Kconfig
> > > @@ -55,3 +55,10 @@ config EXYNOS_THERMAL
> > >         help
> > >           If you say yes here you get support for TMU (Thermal Managment
> > >           Unit) on SAMSUNG EXYNOS series of SoC.
> > > +
> > > +config ORION_THERMAL
> > > +       tristate "Temperature sensor on Marvel Orion SoCs"
> > 
> > Marvel -> Marvell
> 
> Missed that one, thanks.
> 
> Thanks for the Tested-by. I will add it to the next version.  I
> started work on Dove support, so I will probably repost when I have
> that ready for testing.

Sorry for the late response. Can you resend the refreshed version on top
of the thermal-next tree?

thanks,
rui



