Linux Power Management development
* Re: [PATCHv3] ARM: EXYNOS4: Support for generic I/O power domains on EXYNOS4210
From: Chanwoo Choi @ 2011-07-16  3:58 UTC
  To: Kukjin Kim
  Cc: 'Sylwester Nawrocki', 'linux-samsung-soc',
	'linux-kernel', 'Kyungmin Park',
	'MyungJoo Ham', linux-pm
In-Reply-To: <035201cc4367$2b165c80$81431580$%kim@samsung.com>

Kukjin Kim wrote:
> Chanwoo Choi wrote:
>> Kukjin Kim wrote:
> 
> (snip)
> 
>>>> @@ -183,6 +183,7 @@ config MACH_NURI
>>>>  	select EXYNOS4_SETUP_SDHCI
>>>>  	select EXYNOS4_SETUP_USB_PHY
>>>>  	select SAMSUNG_DEV_PWM
>>>> +	select PM_GENERIC_DOMAINS if PM
>>> Do you _really_ think this should be under MACH_NURI?
>>>
>> This patch apply the generic power-domain to NURI board.
> 
> I mean, the PM_GENERIC_DOMAINS feature depends on CPU not board even though this is for only NURI board.
> 
OK, I will move it from CONFIG_MACH_NURI to CONFIG_CPU_EXYNOS4210 in arch/arm/mach-exynos4/Kconfig.

> (snip)
> 
>>>> diff --git a/arch/arm/mach-exynos4/include/mach/pm-exynos4210.h
>>>> b/arch/arm/mach-exynos4/include/mach/pm-exynos4210.h
>>>> new file mode 100644
>>>> index 0000000..ab09034
>>>> --- /dev/null
>>>> +++ b/arch/arm/mach-exynos4/include/mach/pm-exynos4210.h
>>>> @@ -0,0 +1,53 @@
>>>> +/* linux/arch/arm/mach-exynos4/include/mach/pm-exynos4.h
>>> According to your file name, should be linux/arch/arm/mach-
>> exynos4/include/mach/pm-exynos4210.h ?
>>
> I said just typo...
> 
> +/* linux/arch/arm/mach-exynos4/include/mach/pm-exynos4210.h
> 
>> This patch support only power-domain of EXYNOS4210, so I named "pm-
>> exynos4210.h"
>> If EXYNOS4 series board will be released, I will modify include file name
>> from "pm-exynos4210.h" to "pm-exynos4.h" to support all EXYNOS4 series.
> 
> No need it, it's different.

My mistake. I will fix it.
> 
> (snip)
> 
>>> Would be better if you could add some common function for pd_power_up and
>> _down...
>>
>> What do you mean about common function?
>>
> I thought, pd_power_up and _down can be implemented in one function.
> 

OK, if you want the two functions (pd_power_up and pd_power_down) combined,
I will modify it.
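
For reference, a minimal sketch of what a single helper could look like.
This is not code from the patch: the register model (struct fake_pd), the
control values and pd_power() itself are hypothetical names used only for
illustration, and real hardware would need a status poll with a timeout
instead of the direct mirror used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: not the real EXYNOS4210 PMU register layout. */
#define PD_CTRL_ENABLE	0x7u
#define PD_CTRL_DISABLE	0x0u

/* Stand-in for one power domain's control and status registers. */
struct fake_pd {
	uint32_t ctrl;
	uint32_t status;
};

/* One helper handles both directions; 'enable' selects the target state. */
static int pd_power(struct fake_pd *pd, bool enable)
{
	uint32_t target = enable ? PD_CTRL_ENABLE : PD_CTRL_DISABLE;

	assert(pd != NULL);

	pd->ctrl = target;
	/* The model mirrors the control value into the status register. */
	pd->status = target;

	return (pd->status == target) ? 0 : -1;
}

/* Thin wrappers keep the two entry points a framework would call. */
static int pd_power_up(struct fake_pd *pd)
{
	return pd_power(pd, true);
}

static int pd_power_down(struct fake_pd *pd)
{
	return pd_power(pd, false);
}
```

The wrappers stay trivial, so the sequencing lives in one place and the
up and down paths cannot drift apart.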

> (snip)
> 
>>> I wonder why we need to add this for handling power domain not update current
>> pd.
>>
> See arch/arm/mach-exynos4/dev-pd.c
> 
> I mean, we _really_ need to keep duplication files for handling power domain?...

For now, you're right.
But if the Generic Power-domain Framework is applied to mainline, I think
we should apply it to all of the boards using EXYNOS4210
(arch/arm/mach-exynos4) and delete the old power-domain code
(arch/arm/mach-exynos4/dev-pd.c) to remove the duplicate file.

Thanks & Regards,
Chanwoo Choi


* Re: [PATCHv3] ARM: EXYNOS4: Support for generic I/O power domains on EXYNOS4210
From: Kukjin Kim @ 2011-07-16  3:19 UTC
  To: 'Chanwoo Choi'
  Cc: 'Sylwester Nawrocki', 'linux-samsung-soc',
	'linux-kernel', 'Kyungmin Park',
	'MyungJoo Ham', linux-pm
In-Reply-To: <4E16A171.7040509@samsung.com>

Chanwoo Choi wrote:
> 
> Kukjin Kim wrote:

(snip)

> >> @@ -183,6 +183,7 @@ config MACH_NURI
> >>  	select EXYNOS4_SETUP_SDHCI
> >>  	select EXYNOS4_SETUP_USB_PHY
> >>  	select SAMSUNG_DEV_PWM
> >> +	select PM_GENERIC_DOMAINS if PM
> >
> > Do you _really_ think this should be under MACH_NURI?
> >
> This patch apply the generic power-domain to NURI board.

I mean, the PM_GENERIC_DOMAINS feature depends on the CPU, not the board, even though this is only for the NURI board.

(snip)

> >> diff --git a/arch/arm/mach-exynos4/include/mach/pm-exynos4210.h
> >> b/arch/arm/mach-exynos4/include/mach/pm-exynos4210.h
> >> new file mode 100644
> >> index 0000000..ab09034
> >> --- /dev/null
> >> +++ b/arch/arm/mach-exynos4/include/mach/pm-exynos4210.h
> >> @@ -0,0 +1,53 @@
> >> +/* linux/arch/arm/mach-exynos4/include/mach/pm-exynos4.h
> >
> > According to your file name, should be linux/arch/arm/mach-
> exynos4/include/mach/pm-exynos4210.h ?
> 
I said just typo...

+/* linux/arch/arm/mach-exynos4/include/mach/pm-exynos4210.h

> This patch support only power-domain of EXYNOS4210, so I named "pm-
> exynos4210.h"
> If EXYNOS4 series board will be released, I will modify include file name
> from "pm-exynos4210.h" to "pm-exynos4.h" to support all EXYNOS4 series.

No need for that; it's different.

(snip)

> > Would be better if you could add some common function for pd_power_up and
> _down...
> 
> What do you mean about common function?
> 
I thought, pd_power_up and _down can be implemented in one function.

(snip)

> > I wonder why we need to add this for handling power domain not update current
> pd.
> 
See arch/arm/mach-exynos4/dev-pd.c

I mean, do we _really_ need to keep duplicate files for handling power domains?...

(snip)

Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.


* [PATCH] MIPS: Convert i8259.c to using syscore_ops (was: Re: Status of MIPS on 3.0.0-rc6 kernel)
From: Rafael J. Wysocki @ 2011-07-15 21:53 UTC
  To: Ralf Baechle
  Cc: linux-mips@linux-mips.org, Roland Vossen,
	linux-kernel@vger.kernel.org, Geert Uytterhoeven, Jonas Gorski,
	devel@linuxdriverproject.org, Linux PM mailing list
In-Reply-To: <4E2032D7.9000704@broadcom.com>

On Friday, July 15, 2011, Roland Vossen wrote:
> > Please check if the appended patch helps.
> 
> It does, I am able to build a big endian MIPS kernel now. Can you notify 
> me if you submit this patch ?

Well, it's been submitted already. :-)

Ralf, the appended patch is necessary to fix build on MIPS due to a
missing conversion to syscore_ops.  Please take it to your tree or
let me know if you want me to push it myself.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: MIPS: Convert i8259.c to using syscore_ops

The code in arch/mips/kernel/i8259.c still hasn't been converted to
using struct syscore_ops instead of a sysdev for resume and shutdown.
As a result, this code doesn't build any more after suspend, resume
and shutdown callbacks have been removed from struct sysdev_class.
Fix this problem by converting i8259.c to using syscore_ops.

Reported-and-tested-by: Roland Vossen <rvossen@broadcom.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/mips/kernel/i8259.c |   22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

Index: linux-2.6/arch/mips/kernel/i8259.c
===================================================================
--- linux-2.6.orig/arch/mips/kernel/i8259.c
+++ linux-2.6/arch/mips/kernel/i8259.c
@@ -14,7 +14,7 @@
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
 #include <linux/spinlock.h>
-#include <linux/sysdev.h>
+#include <linux/syscore_ops.h>
 #include <linux/irq.h>
 
 #include <asm/i8259.h>
@@ -215,14 +215,13 @@ spurious_8259A_irq:
 	}
 }
 
-static int i8259A_resume(struct sys_device *dev)
+static void i8259A_resume(void)
 {
 	if (i8259A_auto_eoi >= 0)
 		init_8259A(i8259A_auto_eoi);
-	return 0;
 }
 
-static int i8259A_shutdown(struct sys_device *dev)
+static void i8259A_shutdown(void)
 {
 	/* Put the i8259A into a quiescent state that
 	 * the kernel initialization code can get it
@@ -232,26 +231,17 @@ static int i8259A_shutdown(struct sys_de
 		outb(0xff, PIC_MASTER_IMR);	/* mask all of 8259A-1 */
 		outb(0xff, PIC_SLAVE_IMR);	/* mask all of 8259A-1 */
 	}
-	return 0;
 }
 
-static struct sysdev_class i8259_sysdev_class = {
-	.name = "i8259",
+static struct syscore_ops i8259_syscore_ops = {
 	.resume = i8259A_resume,
 	.shutdown = i8259A_shutdown,
 };
 
-static struct sys_device device_i8259A = {
-	.id	= 0,
-	.cls	= &i8259_sysdev_class,
-};
-
 static int __init i8259A_init_sysfs(void)
 {
-	int error = sysdev_class_register(&i8259_sysdev_class);
-	if (!error)
-		error = sysdev_register(&device_i8259A);
-	return error;
+	register_syscore_ops(&i8259_syscore_ops);
+	return 0;
 }
 
 device_initcall(i8259A_init_sysfs);
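
The shape of the conversion can be modelled in userspace. The sketch
below is only an illustration under simplifying assumptions: every name
prefixed with model_ is hypothetical, the table is a fixed array rather
than the kernel's list, and the PIC state is two plain integers. It shows
the point of the patch: the callbacks take no device argument, return
void, and are registered once in a global table.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a callback table; not the kernel structure. */
struct model_syscore_ops {
	void (*resume)(void);
	void (*shutdown)(void);
};

#define MODEL_MAX_OPS 8
static struct model_syscore_ops *model_ops[MODEL_MAX_OPS];
static size_t model_nr_ops;

static void model_register_syscore_ops(struct model_syscore_ops *ops)
{
	assert(model_nr_ops < MODEL_MAX_OPS);
	model_ops[model_nr_ops++] = ops;
}

/* Run resume callbacks in registration order. */
static void model_syscore_resume(void)
{
	for (size_t i = 0; i < model_nr_ops; i++)
		if (model_ops[i]->resume)
			model_ops[i]->resume();
}

/* Run shutdown callbacks in reverse registration order. */
static void model_syscore_shutdown(void)
{
	for (size_t i = model_nr_ops; i-- > 0; )
		if (model_ops[i]->shutdown)
			model_ops[i]->shutdown();
}

/* Stand-ins for the PIC state touched by the i8259 callbacks. */
static int pic_initialized;
static int pic_masked;

static void pic_resume(void)
{
	pic_initialized = 1;
	pic_masked = 0;
}

static void pic_shutdown(void)
{
	pic_masked = 1;
}

static struct model_syscore_ops pic_ops = {
	.resume = pic_resume,
	.shutdown = pic_shutdown,
};
```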
 


* [PATCH v3] x86 intel power: Initialize MSR_IA32_ENERGY_PERF_BIAS
From: Len Brown @ 2011-07-15 21:37 UTC
  To: H. Peter Anvin
  Cc: Andrew Morton, x86, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Linus Torvalds, linux-pm, Alan Cox, Arjan van de Ven
In-Reply-To: <alpine.LFD.2.02.1107140051020.18606@x980>

From: Len Brown <len.brown@intel.com>

Since 2.6.36 (23016bf0d25), Linux prints the existence of "epb" in /proc/cpuinfo.
Since 2.6.38 (d5532ee7b40), the x86_energy_perf_policy(8) utility has
been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS.

However, the typical BIOS fails to initialize the MSR, presumably
because this is handled by high-volume shrink-wrap operating systems...

Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8).
As a result, WSM-EP, SNB, and later hardware from Intel will run in its
default hardware power-on state (performance), which assumes that users
care for performance at all costs and not for energy efficiency.
While that is fine for performance benchmarks, the hardware's intended default
operating point is "normal" mode...

Initialize the MSR to "normal" by default during kernel boot.

x86_energy_perf_policy(8) is available to change the default after boot,
should the user have a different preference.

cc: stable@kernel.org
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
---
v3: fix #define typo in header and ',' typo in printk
shorten printk to fit in 80 columns

 arch/x86/include/asm/msr-index.h |    3 +++
 arch/x86/kernel/cpu/intel.c      |   18 ++++++++++++++++++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 43a18c7..55a11e0 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -250,6 +250,9 @@
 #define MSR_IA32_TEMPERATURE_TARGET	0x000001a2
 
 #define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
+#define ENERGY_PERF_BIAS_PERFORMANCE	0
+#define ENERGY_PERF_BIAS_NORMAL		6
+#define ENERGY_PERF_BIAS_POWERSAVE	15
 
 #define MSR_IA32_PACKAGE_THERM_STATUS		0x000001b1
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index d16c2c5..24cba78 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -448,6 +448,24 @@ static void __cpuinit init_intel(struct cpuinfo_x86 *c)
 
 	if (cpu_has(c, X86_FEATURE_VMX))
 		detect_vmx_virtcap(c);
+
+	/*
+	 * Initialize MSR_IA32_ENERGY_PERF_BIAS if BIOS did not.
+	 * x86_energy_perf_policy(8) is available to change it at run-time
+	 */
+	if (cpu_has(c, X86_FEATURE_EPB)) {
+		u64 epb;
+
+		rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+		if ((epb & 0xF) == ENERGY_PERF_BIAS_PERFORMANCE) {
+			printk_once(KERN_WARNING "ENERGY_PERF_BIAS:"
+				" Set to 'normal', was 'performance'\n"
+				"ENERGY_PERF_BIAS: View and update with"
+				" x86_energy_perf_policy(8)\n");
+			epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
+			wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+		}
+	}
 }
 
 #ifdef CONFIG_X86_32
-- 
1.7.6.134.gcf13f
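
The bit manipulation in the hunk above can be checked in isolation. The
following is a userspace sketch, not kernel code: the MSR access is
replaced by a plain 64-bit argument, and EPB_MASK and epb_apply_default()
are hypothetical names introduced here. Only the three ENERGY_PERF_BIAS_*
values are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define ENERGY_PERF_BIAS_PERFORMANCE	0
#define ENERGY_PERF_BIAS_NORMAL		6
#define ENERGY_PERF_BIAS_POWERSAVE	15

/* Hypothetical name for the low 4-bit policy field. */
#define EPB_MASK	0xFULL

/*
 * Return the value that would be written back: the low nibble is
 * replaced with "normal" only when it still holds the power-on
 * "performance" setting; every other value is left untouched.
 */
static uint64_t epb_apply_default(uint64_t epb)
{
	if ((epb & EPB_MASK) == ENERGY_PERF_BIAS_PERFORMANCE)
		epb = (epb & ~EPB_MASK) | ENERGY_PERF_BIAS_NORMAL;
	return epb;
}
```

The upper bits are preserved by the mask, and a value already chosen by
the BIOS or by the user is never overridden.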


* Re: [PATCH 08/18] cpuidle: create bootparam "cpuidle.off=1"
From: Deepthi Dharwar @ 2011-07-15 11:27 UTC
  To: Len Brown; +Cc: Len Brown, linux-pm, linux-kernel
In-Reply-To: <d0d4b0749cf60042ce56636fdb019ab496fb45ff.1301724243.git.len.brown@intel.com>


On Saturday 02 April 2011 11:52 AM, Len Brown wrote:
> From: Len Brown <len.brown@intel.com>
> 
> useful for disabling cpuidle to fall back
> to architecture-default idle loop
> 
> cpuidle drivers and governors will fail to register.
> on x86 they'll say so:
> 
> intel_idle: intel_idle yielding to (null)
> ACPI: acpi_idle yielding to (null)
> 
> Signed-off-by: Len Brown <len.brown@intel.com>
> ---
>  Documentation/kernel-parameters.txt |    3 +++
>  drivers/cpuidle/cpuidle.c           |   10 ++++++++++
>  drivers/cpuidle/cpuidle.h           |    1 +
>  drivers/cpuidle/driver.c            |    3 +++
>  drivers/cpuidle/governor.c          |    3 +++
>  5 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index f4a04c0..08e8a22 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -546,6 +546,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			/proc/<pid>/coredump_filter.
>  			See also Documentation/filesystems/proc.txt.
> 
> +	cpuidle.off=1	[CPU_IDLE]
> +			disable the cpuidle sub-system
> +
>  	cpcihp_generic=	[HW,PCI] Generic port I/O CompactPCI driver
>  			Format:
>  			<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index bf50924..faae2c3 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -28,6 +28,12 @@ LIST_HEAD(cpuidle_detected_devices);
>  static void (*pm_idle_old)(void);
> 
>  static int enabled_devices;
> +static int off __read_mostly;
> +
> +int cpuidle_disabled(void)
> +{
> +	return off;
> +}
> 
>  #if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
>  static void cpuidle_kick_cpus(void)
> @@ -427,6 +433,9 @@ static int __init cpuidle_init(void)
>  {
>  	int ret;
> 
> +	if (cpuidle_disabled())
> +		return -ENODEV;
> +
>  	pm_idle_old = pm_idle;
> 
>  	ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
> @@ -438,4 +447,5 @@ static int __init cpuidle_init(void)
>  	return 0;
>  }
> 
> +module_param(off, int, 0444);
>  core_initcall(cpuidle_init);
> diff --git a/drivers/cpuidle/cpuidle.h b/drivers/cpuidle/cpuidle.h
> index 33e50d5..38c3fd8 100644
> --- a/drivers/cpuidle/cpuidle.h
> +++ b/drivers/cpuidle/cpuidle.h
> @@ -13,6 +13,7 @@ extern struct list_head cpuidle_governors;
>  extern struct list_head cpuidle_detected_devices;
>  extern struct mutex cpuidle_lock;
>  extern spinlock_t cpuidle_driver_lock;
> +extern int cpuidle_disabled(void);
> 
>  /* idle loop */
>  extern void cpuidle_install_idle_handler(void);
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index fd1601e..3f7e3ce 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -26,6 +26,9 @@ int cpuidle_register_driver(struct cpuidle_driver *drv)
>  	if (!drv)
>  		return -EINVAL;
> 
> +	if (cpuidle_disabled())
> +		return -ENODEV;
> +
>  	spin_lock(&cpuidle_driver_lock);
>  	if (cpuidle_curr_driver) {
>  		spin_unlock(&cpuidle_driver_lock);
> diff --git a/drivers/cpuidle/governor.c b/drivers/cpuidle/governor.c
> index 724c164..ea2f8e7 100644
> --- a/drivers/cpuidle/governor.c
> +++ b/drivers/cpuidle/governor.c
> @@ -81,6 +81,9 @@ int cpuidle_register_governor(struct cpuidle_governor *gov)
>  	if (!gov || !gov->select)
>  		return -EINVAL;
> 
> +	if (cpuidle_disabled())
> +		return -ENODEV;
> +
>  	mutex_lock(&cpuidle_lock);
>  	if (__cpuidle_find_governor(gov->name) == NULL) {
>  		ret = 0;

Hi Len,
We would like to know when the patch that completely removes pm_idle
for x86, "cpuidle: stop using pm_idle", and the one that introduces the
cpuidle.off boot parameter, "cpuidle: create bootparam cpuidle.off=1",
will be pushed. Are there any dependencies that need to be addressed
before queuing them up for the next merge window? If so, can you please
let us know what they are? We need these patches to run the cpuidle
driver on pseries.
https://lkml.org/lkml/2011/6/7/375

Thanks
- Deepthi
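
For readers following along, the gate added by the quoted patch can be
sketched in userspace as below. This is a simplified model, not the
kernel code: the model_ names are hypothetical, locking is omitted, and
the -EBUSY return for an already registered driver is an assumption
about the part of cpuidle_register_driver() that the hunk does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Models the read-only "off" module parameter (cpuidle.off=1). */
static int off;

static int model_cpuidle_disabled(void)
{
	return off;
}

struct model_driver {
	const char *name;
};

static struct model_driver *curr_driver;

/* Registration fails early with -ENODEV when cpuidle is switched off. */
static int model_register_driver(struct model_driver *drv)
{
	if (!drv)
		return -EINVAL;

	if (model_cpuidle_disabled())
		return -ENODEV;

	/* Assumed behaviour: only one driver may be registered. */
	if (curr_driver)
		return -EBUSY;

	curr_driver = drv;
	return 0;
}
```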


* [PATCH v4 3/3] PM / DEVFREQ: add sysfs interface (including user tickling)
From: MyungJoo Ham @ 2011-07-15  8:11 UTC
  To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner
In-Reply-To: <1310717510-19002-1-git-send-email-myungjoo.ham@samsung.com>

1. System-wide sysfs interface /sys/power/devfreq/
- tickle_all	R: number of tickle_all executions
		W: tickle all devfreq devices
- min_interval	R: devfreq monitoring base interval in ms
- monitoring	R: shows whether devfreq monitoring is active or not

2. Device specific sysfs interface /sys/devices/.../power/devfreq_*
- tickle	R: number of tickle executions for the device
		W: tickle the device
- governor	R: name of governor
- cur_freq	R: current frequency
- max_freq	R: maximum operable frequency
- min_freq	R: minimum operable frequency
- polling_interval	R: polling interval in ms given with devfreq profile

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

--
Changed from v3
- corrected sysfs API usage
- corrected error messages
- moved sysfs entry location
- added sysfs entries

Changed from v2
- add ABI entries for devfreq sysfs interface
---
 Documentation/ABI/testing/sysfs-devices-power |   50 ++++++
 Documentation/ABI/testing/sysfs-power         |   43 +++++
 drivers/base/power/devfreq.c                  |  232 +++++++++++++++++++++++++
 include/linux/devfreq.h                       |    3 +
 4 files changed, 328 insertions(+), 0 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power
index 8ffbc25..692f845 100644
--- a/Documentation/ABI/testing/sysfs-devices-power
+++ b/Documentation/ABI/testing/sysfs-devices-power
@@ -165,3 +165,53 @@ Description:
 
 		Not all drivers support this attribute.  If it isn't supported,
 		attempts to read or write it will yield I/O errors.
+
+What:		/sys/devices/.../power/devfreq_tickle
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_tickle file allows users
+		to force the corresponding device to operate at its maximum
+		operable frequency instantaneously and temporarily. After a
+		designated duration has passed, the operating frequency returns
+		to normal. When a user reads the tickle entry, it returns
+		the number of tickle executions for the device. When a user
+		writes to the tickle entry with the tickle duration in ms,
+		the effect of device tickling is held for the designated
+		duration. Note that the duration is rounded up to a multiple
+		of DEVFREQ_INTERVAL, which is defined in devfreq.c.
+
+What:		/sys/devices/.../power/devfreq_governor
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_governor shows the name
+		of the governor used by the corresponding device.
+
+What:		/sys/devices/.../power/devfreq_cur_freq
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_cur_freq shows the current
+		frequency of the corresponding device.
+
+What:		/sys/devices/.../power/devfreq_max_freq
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_max_freq shows the
+		maximum operable frequency of the corresponding device.
+
+What:		/sys/devices/.../power/devfreq_min_freq
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_min_freq shows the
+		minimum operable frequency of the corresponding device.
+
+What:		/sys/devices/.../power/devfreq_polling_interval
+Date:		July 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/devices/.../power/devfreq_polling_interval shows the
+		requested polling interval of the corresponding device.
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index b464d12..4d8434b 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -172,3 +172,46 @@ Description:
 
 		Reading from this file will display the current value, which is
 		set to 1 MB by default.
+
+What:		/sys/power/devfreq/
+Date:		May 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/power/devfreq directory will contain files that will
+		provide a unified interface to the DEVFREQ, a generic DVFS
+		(dynamic voltage and frequency scaling) framework.
+
+What:		/sys/power/devfreq/tickle_all
+Date:		May 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/power/devfreq/tickle_all file allows user space to
+		force every device with DEVFREQ to operate at the maximum
+		frequency of the device instantaneously and temporarily. After
+		a designated delay has passed, the operating frequency returns
+		to normal. If a user reads the tickle_all entry, it returns
+		the number of tickle_all executions. When writing to the
+		tickle_all entry, the user should supply the duration of the
+		tickle in ms (the "designated delay" mentioned before). Then,
+		the effect of tickle_all will hold for the denoted duration.
+		The duration is rounded up to a multiple of the interval
+		defined by DEVFREQ_INTERVAL in /drivers/base/power/devfreq.c.
+
+What:		/sys/power/devfreq/min_interval
+Date:		May 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/power/devfreq/min_interval file shows the monitoring
+		period defined by DEVFREQ_INTERVAL in
+		/drivers/base/power/devfreq.c. The duration of device tickling
+		is rounded-up by DEVFREQ_INTERVAL.
+
+What:		/sys/power/devfreq/monitoring
+Date:		May 2011
+Contact:	MyungJoo Ham <myungjoo.ham@samsung.com>
+Description:
+		The /sys/power/devfreq/monitoring file shows whether DEVFREQ
+		is periodically monitoring. Periodic monitoring is activated
+		if there is a device that wants periodic monitoring for DVFS or
+		there is a device that is tickled (and the tickling duration is
+		not yet expired).
diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index e5a73aa..a62e757 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -40,6 +40,9 @@ static struct delayed_work devfreq_work;
 static LIST_HEAD(devfreq_list);
 static DEFINE_MUTEX(devfreq_list_lock);
 
+static struct kobject *devfreq_kobj;
+static struct attribute_group dev_attr_group;
+
 /**
  * find_device_devfreq() - find devfreq struct using device pointer
  * @dev:	device pointer used to lookup device devfreq.
@@ -151,6 +154,8 @@ static void devfreq_monitor(struct work_struct *work)
 					"devfreq is removed from the device\n",
 					error);
 
+				sysfs_remove_group(&devfreq->dev->kobj,
+						   &dev_attr_group);
 				list_del(&devfreq->node);
 				kfree(devfreq);
 
@@ -218,6 +223,8 @@ int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
 		queue_delayed_work(devfreq_wq, &devfreq_work,
 				   msecs_to_jiffies(DEVFREQ_INTERVAL));
 	}
+
+	sysfs_merge_group(&dev->kobj, &dev_attr_group);
 out:
 	mutex_unlock(&devfreq_list_lock);
 
@@ -244,6 +251,8 @@ int devfreq_remove_device(struct device *dev)
 		return -EINVAL;
 	}
 
+	sysfs_unmerge_group(&dev->kobj, &dev_attr_group);
+
 	list_del(&devfreq->node);
 
 	kfree(devfreq);
@@ -378,6 +387,215 @@ int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
 	return err;
 }
 
+static int num_tickle_all;
+
+static ssize_t tickle_all_store(struct kobject *kobj,
+				struct kobj_attribute *attr, const char *buf,
+				size_t count)
+{
+	int duration = 0;
+	struct devfreq *tmp;
+	unsigned long delay;
+
+	sscanf(buf, "%d", &duration);
+	if (duration < DEVFREQ_INTERVAL)
+		duration = DEVFREQ_INTERVAL;
+
+	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
+
+	mutex_lock(&devfreq_list_lock);
+	list_for_each_entry(tmp, &devfreq_list, node) {
+		_devfreq_tickle_device(tmp, delay);
+	}
+	mutex_unlock(&devfreq_list_lock);
+
+	num_tickle_all++;
+	return count;
+}
+
+static ssize_t tickle_all_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", num_tickle_all);
+}
+
+static ssize_t min_interval_show(struct kobject *kobj,
+				 struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", DEVFREQ_INTERVAL);
+}
+
+static ssize_t monitoring_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", polling ? 1 : 0);
+}
+
+static struct kobj_attribute tickle_all_attr = {
+	.attr = {
+		.name = "tickle_all",
+		.mode = 0644,
+	},
+	.show = tickle_all_show,
+	.store = tickle_all_store,
+};
+static struct kobj_attribute min_interval_attr = {
+	.attr = {
+		.name = "min_interval",
+		.mode = 0444,
+	},
+	.show = min_interval_show,
+};
+static struct kobj_attribute monitoring_attr = {
+	.attr = {
+		.name = "monitoring",
+		.mode = 0444,
+	},
+	.show = monitoring_show,
+};
+static struct attribute *devfreq_entries[] = {
+	&tickle_all_attr.attr,
+	&min_interval_attr.attr,
+	&monitoring_attr.attr,
+	NULL,
+};
+static struct attribute_group devfreq_attr_group = {
+	.name	= NULL,
+	.attrs	= devfreq_entries,
+};
+
+static ssize_t tickle(struct device *dev, struct device_attribute *attr,
+		      const char *buf, size_t count)
+{
+	int duration;
+	struct devfreq *df;
+	unsigned long delay;
+
+	sscanf(buf, "%d", &duration);
+	if (duration < DEVFREQ_INTERVAL)
+		duration = DEVFREQ_INTERVAL;
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Null or invalid device.\n", __func__);
+		return -EINVAL;
+	}
+
+	delay = DIV_ROUND_UP(duration, DEVFREQ_INTERVAL);
+
+	mutex_lock(&devfreq_list_lock);
+	df = find_device_devfreq(dev);
+	_devfreq_tickle_device(df, delay);
+	mutex_unlock(&devfreq_list_lock);
+
+	return count;
+}
+
+static ssize_t show_num_tickle(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+
+	if (!IS_ERR(df))
+		return sprintf(buf, "%d\n", df->num_tickle);
+
+	return PTR_ERR(df);
+}
+
+static ssize_t show_governor(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+	if (!df->governor)
+		return -EINVAL;
+
+	return sprintf(buf, "%s\n", df->governor->name);
+}
+
+static ssize_t show_freq(struct device *dev,
+			 struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+
+	return sprintf(buf, "%lu\n", df->previous_freq);
+}
+
+static ssize_t show_max_freq(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+	unsigned long freq = ULONG_MAX;
+	struct opp *opp;
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+	if (!df->dev)
+		return -EINVAL;
+
+	opp = opp_find_freq_floor(df->dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	return sprintf(buf, "%lu\n", freq);
+}
+
+static ssize_t show_min_freq(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+	unsigned long freq = 0;
+	struct opp *opp;
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+	if (!df->dev)
+		return -EINVAL;
+
+	opp = opp_find_freq_ceil(df->dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	return sprintf(buf, "%lu\n", freq);
+}
+
+static ssize_t show_polling_interval(struct device *dev,
+				     struct device_attribute *attr, char *buf)
+{
+	struct devfreq *df = find_device_devfreq(dev);
+
+	if (IS_ERR(df))
+		return PTR_ERR(df);
+	if (!df->profile)
+		return -EINVAL;
+
+	return sprintf(buf, "%d\n", df->profile->polling_ms);
+}
+
+static DEVICE_ATTR(devfreq_tickle, 0644, show_num_tickle, tickle);
+static DEVICE_ATTR(devfreq_governor, 0444, show_governor, NULL);
+static DEVICE_ATTR(devfreq_cur_freq, 0444, show_freq, NULL);
+static DEVICE_ATTR(devfreq_max_freq, 0444, show_max_freq, NULL);
+static DEVICE_ATTR(devfreq_min_freq, 0444, show_min_freq, NULL);
+static DEVICE_ATTR(devfreq_polling_interval, 0444, show_polling_interval, NULL);
+static struct attribute *dev_entries[] = {
+	&dev_attr_devfreq_tickle.attr,
+	&dev_attr_devfreq_governor.attr,
+	&dev_attr_devfreq_cur_freq.attr,
+	&dev_attr_devfreq_max_freq.attr,
+	&dev_attr_devfreq_min_freq.attr,
+	&dev_attr_devfreq_polling_interval.attr,
+	NULL,
+};
+static struct attribute_group dev_attr_group = {
+	.name	= power_group_name,
+	.attrs	= dev_entries,
+};
+
 /**
  * devfreq_init() - Initialize data structure for devfreq framework and
  *		  start polling registered devfreq devices.
@@ -389,6 +607,20 @@ static int __init devfreq_init(void)
 	polling = false;
 	devfreq_wq = create_freezable_workqueue("devfreq_wq");
 	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+
+#ifdef CONFIG_PM
+	/* Create sysfs */
+	devfreq_kobj = kobject_create_and_add("devfreq", power_kobj);
+	if (!devfreq_kobj) {
+		pr_err("Unable to create devfreq kobject.\n");
+		goto out;
+	}
+	if (sysfs_create_group(devfreq_kobj, &devfreq_attr_group)) {
+		pr_err("Unable to create devfreq sysfs entries.\n");
+		goto out;
+	}
+#endif
+out:
 	mutex_unlock(&devfreq_list_lock);
 
 	devfreq_monitor(&devfreq_work.work);
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index baa074c..f6e4e3b 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -62,6 +62,7 @@ struct devfreq_governor {
  *		at each executino of devfreq_monitor, tickle is decremented.
  *		User may tickle a device-devfreq in order to set maximum
  *		frequency instaneously with some guaranteed duration.
+ * @num_tickle	number of tickle calls.
  *
  * This structure stores the DEVFREQ information for a give device.
  */
@@ -75,6 +76,8 @@ struct devfreq {
 	unsigned long previous_freq;
 	unsigned int next_polling;
 	unsigned int tickle;
+
+	unsigned int num_tickle;
 };
 
 #if defined(CONFIG_PM_DEVFREQ)
-- 
1.7.4.1
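
The clamp-and-round step shared by tickle_all_store() and tickle() can be
illustrated on its own. The sketch below is a userspace model, not part
of the patch: the interval value of 20 ms is an assumption made only for
the example (the real DEVFREQ_INTERVAL is defined in devfreq.c and is not
shown here), and the MODEL_ macros and tickle_intervals() are
hypothetical names.

```c
#include <assert.h>

/* Assumed value for illustration only; see the note above. */
#define MODEL_DEVFREQ_INTERVAL	20	/* ms */

/* Same arithmetic as the kernel's DIV_ROUND_UP() for positive values. */
#define MODEL_DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/*
 * Convert a requested tickle duration in ms into a number of
 * monitoring intervals: requests shorter than one interval are
 * clamped, and everything else is rounded up.
 */
static unsigned long tickle_intervals(int duration_ms)
{
	if (duration_ms < MODEL_DEVFREQ_INTERVAL)
		duration_ms = MODEL_DEVFREQ_INTERVAL;

	return MODEL_DIV_ROUND_UP(duration_ms, MODEL_DEVFREQ_INTERVAL);
}
```

With this assumed interval, a write of 21 holds the tickle for two
monitoring periods, and any value below 20 (including a failed parse
that leaves 0) holds it for exactly one.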


* [PATCH v4 2/3] PM / DEVFREQ: add example governors
From: MyungJoo Ham @ 2011-07-15  8:11 UTC
  To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner
In-Reply-To: <1310717510-19002-1-git-send-email-myungjoo.ham@samsung.com>

Three CPUFREQ-like governors are provided as examples.

powersave: use the lowest frequency possible. The user (device) should
set the polling_ms as 0 because polling is useless for this governor.

performance: use the highest frequency possible. The user (device)
should set the polling_ms as 0 because polling is useless for this
governor.

simple_ondemand: simplified version of CPUFREQ's ONDEMAND governor.

When a user updates OPP entries (enable/disable/add), OPP framework
automatically notifies DEVFREQ to update operating frequency
accordingly. Thus, DEVFREQ users (device drivers) do not need to update
DEVFREQ manually with OPP entry updates or set polling_ms for powersave,
performance, or any other "static" governors.
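
The decision made by simple_ondemand can be tried out in userspace. The
sketch below mirrors the arithmetic of devfreq_simple_ondemand_func() in
this patch, but it is only a model: the status fields are passed as plain
arguments, div_u64() is replaced by ordinary 64-bit division, and
ondemand_target() and the MODEL_ macros are hypothetical names. The
threshold values 90 and 5 are the ones used in the patch.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_UPTHRESHOLD	90
#define MODEL_DOWNDIFF		5

/* Return the requested frequency; UINT_MAX means "as high as possible". */
static unsigned long ondemand_target(unsigned long busy, unsigned long total,
				     unsigned long cur_freq)
{
	unsigned long long a, b;

	/* Assume MAX rather than divide by zero. */
	if (total == 0)
		return UINT_MAX;

	/* Scale both counters down to keep the products below in range. */
	if (busy >= (1UL << 24) || total >= (1UL << 24)) {
		busy >>= 7;
		total >>= 7;
	}

	/* Busy enough: go to MAX. */
	if (busy * 100 > total * MODEL_UPTHRESHOLD)
		return UINT_MAX;

	/* Unknown current frequency: go to MAX. */
	if (cur_freq == 0)
		return UINT_MAX;

	/* Inside the hysteresis band: keep the current frequency. */
	if (busy * 100 > total * (MODEL_UPTHRESHOLD - MODEL_DOWNDIFF))
		return cur_freq;

	/* Otherwise scale the frequency with the measured load. */
	a = (unsigned long long)busy * cur_freq;
	b = a / total;
	b *= 100;
	b /= (MODEL_UPTHRESHOLD - MODEL_DOWNDIFF / 2);

	return (unsigned long)b;
}
```

For example, a device that was busy for 50 out of 100 time units at
1000 (in whatever unit the profile uses) is asked to run at 568, while
87 out of 100 falls inside the band and keeps the current frequency.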

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

---
Changes from v3:
- Bugfixes on simple-ondemand governor (divide by zero / overflow)
- Style fixes
- Give names to governors
---
 drivers/base/power/devfreq.c |   85 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/devfreq.h      |    5 ++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
index aba9768..e5a73aa 100644
--- a/drivers/base/power/devfreq.c
+++ b/drivers/base/power/devfreq.c
@@ -395,3 +395,88 @@ static int __init devfreq_init(void)
 	return 0;
 }
 late_initcall(devfreq_init);
+
+static int devfreq_powersave_func(struct devfreq *df,
+				  unsigned long *freq)
+{
+	*freq = 0; /* devfreq_do will run "ceiling" to 0 */
+	return 0;
+}
+
+struct devfreq_governor devfreq_powersave = {
+	.name = "powersave",
+	.get_target_freq = devfreq_powersave_func,
+};
+
+static int devfreq_performance_func(struct devfreq *df,
+				    unsigned long *freq)
+{
+	*freq = UINT_MAX; /* devfreq_do will run "floor" */
+	return 0;
+}
+
+struct devfreq_governor devfreq_performance = {
+	.name = "performance",
+	.get_target_freq = devfreq_performance_func,
+};
+
+/* Constants for DevFreq-Simple-Ondemand (DFSO) */
+#define DFSO_UPTHRESHOLD	(90)
+#define DFSO_DOWNDIFFERENCTIAL	(5)
+static int devfreq_simple_ondemand_func(struct devfreq *df,
+					unsigned long *freq)
+{
+	struct devfreq_dev_status stat;
+	int err = df->profile->get_dev_status(df->dev, &stat);
+	unsigned long long a, b;
+
+	if (err)
+		return err;
+
+	/* Assume MAX if it is going to be divided by zero */
+	if (stat.total_time == 0) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Prevent overflow */
+	if (stat.busy_time >= (1 << 24) || stat.total_time >= (1 << 24)) {
+		stat.busy_time >>= 7;
+		stat.total_time >>= 7;
+	}
+
+	/* Set MAX if it's busy enough */
+	if (stat.busy_time * 100 >
+	    stat.total_time * DFSO_UPTHRESHOLD) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Set MAX if we do not know the initial frequency */
+	if (stat.current_frequency == 0) {
+		*freq = UINT_MAX;
+		return 0;
+	}
+
+	/* Keep the current frequency */
+	if (stat.busy_time * 100 >
+	    stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
+		*freq = stat.current_frequency;
+		return 0;
+	}
+
+	/* Set the desired frequency based on the load */
+	a = stat.busy_time;
+	a *= stat.current_frequency;
+	b = div_u64(a, stat.total_time);
+	b *= 100;
+	b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
+	*freq = (unsigned long) b;
+
+	return 0;
+}
+
+struct devfreq_governor devfreq_simple_ondemand = {
+	.name = "simple_ondemand",
+	.get_target_freq = devfreq_simple_ondemand_func,
+};
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 7c881cc..baa074c 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -84,6 +84,11 @@ extern int devfreq_add_device(struct device *dev,
 extern int devfreq_remove_device(struct device *dev);
 extern int devfreq_update(struct device *dev);
 extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
+
+extern struct devfreq_governor devfreq_powersave;
+extern struct devfreq_governor devfreq_performance;
+extern struct devfreq_governor devfreq_simple_ondemand;
+
 #else /* !CONFIG_PM_DEVFREQ */
 static int devfreq_add_device(struct device *dev,
 			   struct devfreq_dev_profile *profile,
-- 
1.7.4.1


* [PATCH v4 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs
From: MyungJoo Ham @ 2011-07-15  8:11 UTC (permalink / raw)
  To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner
In-Reply-To: <1310717510-19002-1-git-send-email-myungjoo.ham@samsung.com>

With OPPs, a device may have multiple operable frequency and voltage
sets, and the system needs to choose one of them. In order to reduce
power consumption (by reducing frequency and voltage) without affecting
performance too much, a Dynamic Voltage and Frequency Scaling (DVFS)
scheme may be used.

This patch introduces the DVFS capability to non-CPU devices with OPPs.
DVFS is a technique whereby the frequency and supplied voltage of a
device are adjusted on-the-fly. DVFS usually sets the frequency as low
as possible under given conditions (such as QoS assurance) and adjusts
the voltage according to the chosen frequency in order to reduce power
consumption and heat dissipation.

The generic DVFS for devices, DEVFREQ, may appear quite similar to
/drivers/cpufreq.  However, CPUFREQ does not allow multiple devices to
be registered and is not suitable for multiple heterogeneous devices
with different (but simple) governors.

Normally, a DVFS mechanism controls the frequency based on the demand
for the device and then chooses the voltage based on the chosen
frequency. DEVFREQ likewise controls the frequency based on the
governor's frequency recommendation and lets OPP pick the pair of
frequency and voltage based on the recommended frequency. Then, the
chosen OPP is passed to the device driver's "target" callback.
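
The selection step described above can be sketched in userspace as
follows. In the patch, devfreq_do() calls opp_find_freq_ceil() and falls
back to opp_find_freq_floor() when that returns -ENODEV; the sketch below
imitates that with a plain sorted array instead of the OPP library. The
names pick_freq() and bus_khz are hypothetical, and the table values
(taken from the 133/266/400MHz memory bus example in the cover letter)
are assumed to be in kHz.

```c
#include <stddef.h>

/* Hypothetical stand-in for an OPP table: frequencies in kHz, ascending. */
static const unsigned long bus_khz[] = { 133000, 266000, 400000 };

/*
 * Pick the lowest table entry that is >= want ("ceil"). If the request is
 * higher than every entry, fall back to the highest entry ("floor").
 * Returns 0 for an empty table.
 */
unsigned long pick_freq(const unsigned long *table, size_t n,
			unsigned long want)
{
	size_t i;

	if (n == 0)
		return 0;

	for (i = 0; i < n; i++) {
		if (table[i] >= want)
			return table[i];
	}

	return table[n - 1];
}
```

With this table, a governor that returns 0 (powersave) ends up at 133000
and a governor that returns UINT_MAX (performance) ends up at 400000,
which matches the "ceiling" and "floor" comments in the example governors.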

Tested with the memory bus of the Exynos4-NURI board.

The test code with board support for Exynos4-NURI is at
http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

--
Thank you for your valuable comments, Rafael, Greg, Pavel, and Colin.

Changed from v3
- In kerneldoc comments, DEVFREQ has been replaced by devfreq
- Revised the mechanism for removing devfreq entries that incur errors
- Added and revised comments
- Removed unnecessary code
- Allow a governor to be given a name
- Bugfix: a tickle call may cancel an older tickle call that is still in
  effect.
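
To make the tickle bugfix concrete, here is a small userspace sketch of
the two pieces of arithmetic involved: converting a duration in
milliseconds into a number of polling intervals, and making sure a new
tickle never shortens one that is still in effect. The function names
are hypothetical; the 20 ms interval corresponds to DEVFREQ_INTERVAL in
the patch.

```c
/* Polling interval in ms; DEVFREQ_INTERVAL in the patch. */
#define INTERVAL_MS	20

/* Round a duration in ms up to a whole number of polling intervals. */
unsigned int ms_to_intervals(unsigned long ms)
{
	return (unsigned int)((ms + INTERVAL_MS - 1) / INTERVAL_MS);
}

/*
 * Combine a tickle that is still counting down with a new request.
 * The longer of the two wins, so a short tickle cannot cancel a long one.
 */
unsigned int tickle_merge(unsigned int remaining, unsigned int requested)
{
	return remaining < requested ? requested : remaining;
}
```

Under these assumptions a 50 ms tickle lasts three polling intervals, and
tickling for two intervals while five remain leaves the counter at five.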

Changed from v2
- Code style revised and cleaned up.
- Remove DEVFREQ entries that incur errors except for EAGAIN
- Bug fixed: tickle for devices without polling governors

Changes from v1(RFC)
- Rename: DVFS --> DEVFREQ
- Revised governor design
    . Governor receives the whole struct devfreq
    . Governor should gather usage information (thru get_dev_status)
      itself
- Periodic monitoring runs only when needed.
- DEVFREQ no longer deals with voltage information directly
- Removed some printks.
- Some cosmetic updates
- Use freezable_wq.
---
 drivers/base/power/Makefile  |    1 +
 drivers/base/power/devfreq.c |  397 ++++++++++++++++++++++++++++++++++++++++++
 drivers/base/power/opp.c     |    9 +
 include/linux/devfreq.h      |  111 ++++++++++++
 kernel/power/Kconfig         |   34 ++++
 5 files changed, 552 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/power/devfreq.c
 create mode 100644 include/linux/devfreq.h

diff --git a/drivers/base/power/Makefile b/drivers/base/power/Makefile
index 3647e11..20118dc 100644
--- a/drivers/base/power/Makefile
+++ b/drivers/base/power/Makefile
@@ -4,5 +4,6 @@ obj-$(CONFIG_PM_RUNTIME)	+= runtime.o
 obj-$(CONFIG_PM_TRACE_RTC)	+= trace.o
 obj-$(CONFIG_PM_OPP)	+= opp.o
 obj-$(CONFIG_HAVE_CLK)	+= clock_ops.o
+obj-$(CONFIG_PM_DEVFREQ)	+= devfreq.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
\ No newline at end of file
diff --git a/drivers/base/power/devfreq.c b/drivers/base/power/devfreq.c
new file mode 100644
index 0000000..aba9768
--- /dev/null
+++ b/drivers/base/power/devfreq.c
@@ -0,0 +1,397 @@
+/*
+ * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ *	    for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ *	MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/opp.h>
+#include <linux/devfreq.h>
+#include <linux/workqueue.h>
+#include <linux/platform_device.h>
+#include <linux/list.h>
+#include <linux/printk.h>
+
+/*
+ * devfreq polling interval in ms.
+ * It is recommended to be "jiffy_in_ms" * n, where n is an integer >= 1.
+ */
+#define DEVFREQ_INTERVAL	20
+
+/*
+ * devfreq_work periodically (given by DEVFREQ_INTERVAL) monitors every
+ * registered device.
+ */
+static bool polling;
+static struct workqueue_struct *devfreq_wq;
+static struct delayed_work devfreq_work;
+
+/* The list of all device-devfreq */
+static LIST_HEAD(devfreq_list);
+static DEFINE_MUTEX(devfreq_list_lock);
+
+/**
+ * find_device_devfreq() - find devfreq struct using device pointer
+ * @dev:	device pointer used to lookup device devfreq.
+ *
+ * Search the list of device devfreqs and return the matched device's
+ * devfreq info. devfreq_list_lock should be held by the caller.
+ */
+static struct devfreq *find_device_devfreq(struct device *dev)
+{
+	struct devfreq *tmp_devfreq;
+
+	if (unlikely(IS_ERR_OR_NULL(dev))) {
+		pr_err("%s: Invalid parameters\n", __func__);
+		return ERR_PTR(-EINVAL);
+	}
+
+	list_for_each_entry(tmp_devfreq, &devfreq_list, node) {
+		if (tmp_devfreq->dev == dev)
+			return tmp_devfreq;
+	}
+
+	return ERR_PTR(-ENODEV);
+}
+
+/**
+ * devfreq_do() - Check the usage profile of a given device and configure
+ *		frequency and voltage accordingly
+ * @devfreq:	devfreq info of the given device
+ */
+static int devfreq_do(struct devfreq *devfreq)
+{
+	struct opp *opp;
+	unsigned long freq;
+	int err;
+
+	err = devfreq->governor->get_target_freq(devfreq, &freq);
+	if (err)
+		return err;
+
+	opp = opp_find_freq_ceil(devfreq->dev, &freq);
+	if (opp == ERR_PTR(-ENODEV))
+		opp = opp_find_freq_floor(devfreq->dev, &freq);
+
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	if (devfreq->previous_freq == freq)
+		return 0;
+
+	err = devfreq->profile->target(devfreq->dev, opp);
+	if (err)
+		return err;
+
+	devfreq->previous_freq = freq;
+	return 0;
+}
+
+/**
+ * devfreq_monitor() - Periodically run devfreq_do() and support
+ *		     device devfreq tickle.
+ * @work: the work struct used to run devfreq_monitor periodically.
+ *
+ * Tickle is to force a device to operate at its maximum operable frequency
+ * for a while temporarily. Look at devfreq_tickle_device() for more
+ * information about tickle.
+ */
+static void devfreq_monitor(struct work_struct *work)
+{
+	struct devfreq *devfreq, *tmp;
+	int error;
+
+	mutex_lock(&devfreq_list_lock);
+
+	polling = false;
+
+	list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) {
+		/*
+		 * If the device is tickled and the tickle duration is left,
+		 * do not change the frequency for a while
+		 */
+		if (devfreq->tickle) {
+			polling = true;
+			devfreq->tickle--;
+
+			/*
+			 * If tickling is ending and the device is not going
+			 * to poll, force the device to poll next time so that
+			 * it can return to the original frequency.
+			 * However, as a non-polling device has 0 polling_ms,
+			 * it will not poll again later.
+			 */
+			if (devfreq->tickle == 0 && devfreq->next_polling == 0)
+				devfreq->next_polling = 1;
+
+			continue;
+		}
+
+		if (devfreq->next_polling == 0)
+			continue;
+
+		polling = true;
+
+		if (devfreq->next_polling-- == 1) {
+			error = devfreq_do(devfreq);
+
+			/* Remove a devfreq with an error. */
+			if (error && error != -EAGAIN) {
+				dev_err(devfreq->dev, "devfreq_do error(%d). "
+					"devfreq is removed from the device\n",
+					error);
+
+				list_del(&devfreq->node);
+				kfree(devfreq);
+
+				continue;
+			}
+			devfreq->next_polling = DIV_ROUND_UP(
+						devfreq->profile->polling_ms,
+						DEVFREQ_INTERVAL);
+		}
+	}
+
+	if (polling)
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+
+	mutex_unlock(&devfreq_list_lock);
+}
+
+/**
+ * devfreq_add_device() - Add devfreq feature to the device
+ * @dev:	the device to add devfreq feature.
+ * @profile:	device-specific profile to run devfreq.
+ * @governor:	the policy to choose frequency.
+ */
+int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
+		       struct devfreq_governor *governor)
+{
+	struct devfreq *new_devfreq, *devfreq;
+	int err = 0;
+
+	if (!dev || !profile || !governor) {
+		dev_err(dev, "%s: Invalid parameters.\n", __func__);
+		return -EINVAL;
+	}
+
+	mutex_lock(&devfreq_list_lock);
+
+	devfreq = find_device_devfreq(dev);
+	if (!IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Unable to create devfreq for the device. "
+			"It already has one.\n", __func__);
+		err = -EINVAL;
+		goto out;
+	}
+
+	new_devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL);
+	if (!new_devfreq) {
+		dev_err(dev, "%s: Unable to create devfreq for the device\n",
+			__func__);
+		err = -ENOMEM;
+		goto out;
+	}
+
+	new_devfreq->dev = dev;
+	new_devfreq->profile = profile;
+	new_devfreq->governor = governor;
+	new_devfreq->next_polling = DIV_ROUND_UP(profile->polling_ms,
+						 DEVFREQ_INTERVAL);
+	new_devfreq->previous_freq = profile->initial_freq;
+
+	list_add(&new_devfreq->node, &devfreq_list);
+
+	if (devfreq_wq && new_devfreq->next_polling && !polling) {
+		polling = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+out:
+	mutex_unlock(&devfreq_list_lock);
+
+	return err;
+}
+
+/**
+ * devfreq_remove_device() - Remove devfreq feature from a device.
+ * @dev:	the device to remove devfreq feature from.
+ */
+int devfreq_remove_device(struct device *dev)
+{
+	struct devfreq *devfreq;
+
+	if (!dev)
+		return -EINVAL;
+
+	mutex_lock(&devfreq_list_lock);
+	devfreq = find_device_devfreq(dev);
+	if (IS_ERR(devfreq)) {
+		dev_err(dev, "%s: Unable to find devfreq entry for the device.\n",
+			__func__);
+		mutex_unlock(&devfreq_list_lock);
+		return -EINVAL;
+	}
+
+	list_del(&devfreq->node);
+
+	kfree(devfreq);
+
+	mutex_unlock(&devfreq_list_lock);
+
+	return 0;
+}
+
+/**
+ * devfreq_update() - Notify that the device OPP has been changed.
+ * @dev:	the device whose OPP has been changed.
+ */
+int devfreq_update(struct device *dev)
+{
+	struct devfreq *devfreq;
+	int err = 0;
+
+	mutex_lock(&devfreq_list_lock);
+
+	devfreq = find_device_devfreq(dev);
+	if (IS_ERR(devfreq)) {
+		err = PTR_ERR(devfreq);
+		goto out;
+	}
+
+	/*
+	 * If the maximum frequency available is changed either by
+	 * enabling higher frequency or disabling the current
+	 * maximum frequency, we need to adjust the frequency
+	 * (tickle) again if the device is currently being tickled.
+	 */
+	if (devfreq->tickle) {
+		unsigned long freq = devfreq->profile->max_freq;
+		struct opp *opp = opp_find_freq_floor(devfreq->dev, &freq);
+
+		if (IS_ERR(opp)) {
+			err = PTR_ERR(opp);
+			goto out;
+		}
+
+		/* Max freq available is not changed */
+		if (devfreq->previous_freq == freq)
+			goto out;
+
+		/* Tickle again. Max freq available is changed */
+		err = devfreq->profile->target(devfreq->dev, opp);
+		if (!err)
+			devfreq->previous_freq = freq;
+	} else {
+		/* Reevaluate the proper frequency */
+		err = devfreq_do(devfreq);
+	}
+
+out:
+	mutex_unlock(&devfreq_list_lock);
+	return err;
+}
+
+/**
+ * _devfreq_tickle_device() - Set the operating frequency to maximum and
+ *			    keep the frequency for the designated delay.
+ * @df:		devfreq entry of the device being tickled.
+ * @delay:	duration of tickle effect as a number of polling intervals.
+ */
+static int _devfreq_tickle_device(struct devfreq *df, unsigned long delay)
+{
+	int err = 0;
+	unsigned long freq;
+	struct opp *opp;
+
+	freq = df->profile->max_freq;
+	opp = opp_find_freq_floor(df->dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+
+	if (df->previous_freq != freq) {
+		err = df->profile->target(df->dev, opp);
+		if (!err)
+			df->previous_freq = freq;
+	}
+	if (err) {
+		dev_err(df->dev, "%s: Cannot set frequency.\n", __func__);
+	} else {
+		/* Do not shorten tickle duration with a new tickle call */
+		if (df->tickle < delay)
+			df->tickle = delay;
+
+		df->num_tickle++;
+	}
+
+	if (devfreq_wq && !polling) {
+		polling = true;
+		queue_delayed_work(devfreq_wq, &devfreq_work,
+				   msecs_to_jiffies(DEVFREQ_INTERVAL));
+	}
+
+	return err;
+}
+
+/**
+ * devfreq_tickle_device() - Guarantee maximum operation speed for a while
+ *			instantaneously.
+ * @dev:	the device to be tickled.
+ * @duration_ms:	the duration of tickle effect.
+ *
+ * Tickle sets the device at the maximum frequency instantaneously and
+ * the maximum frequency is guaranteed to be used for the given duration.
+ * For faster user response time, an input event may tickle a related device
+ * so that the input event does not need to wait for devfreq to react at its
+ * normal interval.
+ *
+ * _devfreq_tickle_device() is used as a helper function for tickling.
+ */
+int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
+{
+	struct devfreq *devfreq;
+	int err = 0;
+	unsigned long delay; /* in # of DEVFREQ_INTERVAL */
+
+	mutex_lock(&devfreq_list_lock);
+	devfreq = find_device_devfreq(dev);
+	delay = DIV_ROUND_UP(duration_ms, DEVFREQ_INTERVAL);
+
+	if (IS_ERR(devfreq))
+		err = PTR_ERR(devfreq);
+	else
+		err = _devfreq_tickle_device(devfreq, delay);
+
+	mutex_unlock(&devfreq_list_lock);
+
+	return err;
+}
+
+/**
+ * devfreq_init() - Initialize data structure for devfreq framework and
+ *		  start polling registered devfreq devices.
+ */
+static int __init devfreq_init(void)
+{
+	mutex_lock(&devfreq_list_lock);
+
+	polling = false;
+	devfreq_wq = create_freezable_workqueue("devfreq_wq");
+	INIT_DELAYED_WORK_DEFERRABLE(&devfreq_work, devfreq_monitor);
+	mutex_unlock(&devfreq_list_lock);
+
+	devfreq_monitor(&devfreq_work.work);
+	return 0;
+}
+late_initcall(devfreq_init);
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index 56a6899..819c1b3 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -21,6 +21,7 @@
 #include <linux/rculist.h>
 #include <linux/rcupdate.h>
 #include <linux/opp.h>
+#include <linux/devfreq.h>
 
 /*
  * Internal data structure organization with the OPP layer library is as
@@ -428,6 +429,11 @@ int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
 	list_add_rcu(&new_opp->node, head);
 	mutex_unlock(&dev_opp_list_lock);
 
+	/*
+	 * Notify generic dvfs for the change and ignore error
+	 * because the device may not have a devfreq entry
+	 */
+	devfreq_update(dev);
 	return 0;
 }
 
@@ -512,6 +518,9 @@ unlock:
 	mutex_unlock(&dev_opp_list_lock);
 out:
 	kfree(new_opp);
+
+	/* Notify generic dvfs for the change and ignore error */
+	devfreq_update(dev);
 	return r;
 }
 
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
new file mode 100644
index 0000000..7c881cc
--- /dev/null
+++ b/include/linux/devfreq.h
@@ -0,0 +1,111 @@
+/*
+ * devfreq: Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework
+ *	    for Non-CPU Devices Based on OPP.
+ *
+ * Copyright (C) 2011 Samsung Electronics
+ *	MyungJoo Ham <myungjoo.ham@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __LINUX_DEVFREQ_H__
+#define __LINUX_DEVFREQ_H__
+
+#define DEVFREQ_NAME_LEN 16
+
+struct devfreq;
+struct devfreq_dev_status {
+	/* both since the last measure */
+	unsigned long total_time;
+	unsigned long busy_time;
+	unsigned long current_frequency;
+};
+
+struct devfreq_dev_profile {
+	unsigned long max_freq; /* may be larger than the actual value */
+	unsigned long initial_freq;
+	int polling_ms;	/* 0 for at opp change only */
+
+	int (*target)(struct device *dev, struct opp *opp);
+	int (*get_dev_status)(struct device *dev,
+			      struct devfreq_dev_status *stat);
+};
+
+/**
+ * struct devfreq_governor - DEVFREQ Policy Governor
+ * @data	Governor's internal data. The framework does not use it.
+ * @get_target_freq	Returns desired operating frequency for the device.
+ *			Basically, get_target_freq will run
+ *			devfreq_dev_profile.get_dev_status() to get the
+ *			status of the device (load = busy_time / total_time).
+ */
+struct devfreq_governor {
+	char name[DEVFREQ_NAME_LEN];
+	void *data; /* private data for get_target_freq */
+	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
+};
+
+/**
+ * struct devfreq - Device DEVFREQ structure
+ * @node	list node - contains the devices with DEVFREQ that have been
+ *		registered.
+ * @dev		device pointer
+ * @profile	device-specific devfreq profile
+ * @governor	method how to choose frequency based on the usage.
+ * @previous_freq	previously configured frequency value.
+ * @next_polling	the number of remaining "devfreq_monitor" executions to
+ *			reevaluate frequency/voltage of the device. Set by
+ *			profile's polling_ms interval.
+ * @tickle	positive if DEVFREQ-tickling is activated for the device.
+ *		At each execution of devfreq_monitor, tickle is decremented.
+ *		User may tickle a device-devfreq in order to set maximum
+ *		frequency instantaneously with some guaranteed duration.
+ *
+ * This structure stores the DEVFREQ information for a given device.
+ */
+struct devfreq {
+	struct list_head node;
+
+	struct device *dev;
+	struct devfreq_dev_profile *profile;
+	struct devfreq_governor *governor;
+
+	unsigned long previous_freq;
+	unsigned int next_polling;
+	unsigned int tickle;
+};
+
+#if defined(CONFIG_PM_DEVFREQ)
+extern int devfreq_add_device(struct device *dev,
+			   struct devfreq_dev_profile *profile,
+			   struct devfreq_governor *governor);
+extern int devfreq_remove_device(struct device *dev);
+extern int devfreq_update(struct device *dev);
+extern int devfreq_tickle_device(struct device *dev, unsigned long duration_ms);
+#else /* !CONFIG_PM_DEVFREQ */
+static int devfreq_add_device(struct device *dev,
+			   struct devfreq_dev_profile *profile,
+			   struct devfreq_governor *governor)
+{
+	return 0;
+}
+
+static int devfreq_remove_device(struct device *dev)
+{
+	return 0;
+}
+
+static int devfreq_update(struct device *dev)
+{
+	return 0;
+}
+
+static int devfreq_tickle_device(struct device *dev, unsigned long duration_ms)
+{
+	return 0;
+}
+#endif /* CONFIG_PM_DEVFREQ */
+
+#endif /* __LINUX_DEVFREQ_H__ */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 87f4d24..b7e15c8 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -227,3 +227,37 @@ config PM_OPP
 config PM_RUNTIME_CLK
 	def_bool y
 	depends on PM_RUNTIME && HAVE_CLK
+
+config ARCH_HAS_DEVFREQ
+	bool
+	depends on ARCH_HAS_OPP
+	help
+	  Denotes that the architecture supports DEVFREQ. If the architecture
+	  supports multiple OPP entries per device and the frequency of the
+	  devices with OPPs may be altered dynamically, the architecture
+	  supports DEVFREQ.
+
+config PM_DEVFREQ
+	bool "Generic Dynamic Voltage and Frequency Scaling (DVFS) Framework"
+	depends on PM_OPP && ARCH_HAS_DEVFREQ
+	help
+	  With OPP support, a device may have a list of frequencies and
+	  voltages available. DEVFREQ, a generic DVFS framework, can be
+	  registered for a device with OPP support in order to let the
+	  governor provided to DEVFREQ choose an operating frequency
+	  based on the OPP's list and the policy given with DEVFREQ.
+
+	  Each device may have its own governor and policy. DEVFREQ can
+	  reevaluate the device state periodically and/or based on the
+	  OPP list changes (each frequency/voltage pair in OPP may be
+	  disabled or enabled).
+
+	  Like some CPUs with CPUFREQ, a device may have multiple clocks.
+	  However, because the clock frequencies of a single device are
+	  determined by the single device's state, an instance of DEVFREQ
+	  is attached to a single device and returns a "representative"
+	  clock frequency from the OPP of the device, which is also attached
+	  to a device by 1-to-1. The device registering DEVFREQ takes the
+	  responsibility to "interpret" the frequency listed in OPP and
+	  to set its every clock accordingly with the "target" callback
+	  given to DEVFREQ.
-- 
1.7.4.1


* [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices
From: MyungJoo Ham @ 2011-07-15  8:11 UTC (permalink / raw)
  To: linux-pm; +Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, Thomas Gleixner

For a usage example, please look at
http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq

In the above git tree, a DVFS (dynamic voltage and frequency scaling) mechanism
is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
In the example, the LPDDR2 DRAM frequency switches among 133, 266, and 400MHz,
and the other related clocks simply follow the chosen DRAM clock.

The DEVFREQ driver for Exynos4210 memory bus is at
/arch/arm/mach-exynos4/devfreq_bus.c in the git tree.

MyungJoo Ham (3):
  PM: Introduce DEVFREQ: generic DVFS framework with device-specific
    OPPs
  PM / DEVFREQ: add example governors
  PM / DEVFREQ: add sysfs interface (including user tickling)

 Documentation/ABI/testing/sysfs-devices-power |   50 ++
 Documentation/ABI/testing/sysfs-power         |   43 ++
 drivers/base/power/Makefile                   |    1 +
 drivers/base/power/devfreq.c                  |  714 +++++++++++++++++++++++++
 drivers/base/power/opp.c                      |    9 +
 include/linux/devfreq.h                       |  119 ++++
 kernel/power/Kconfig                          |   34 ++
 7 files changed, 970 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/power/devfreq.c
 create mode 100644 include/linux/devfreq.h

-- 
1.7.4.1


* Re: [PATCH v2] x86 intel power: Initialize MSR_IA32_ENERGY_PERF_BIAS
From: Rafael J. Wysocki @ 2011-07-14 19:35 UTC (permalink / raw)
  To: Len Brown
  Cc: Andrew Morton, x86, linux-kernel, Thomas Gleixner, H. Peter Anvin,
	Ingo Molnar, Linus Torvalds, linux-pm, Alan Cox, Arjan van de Ven
In-Reply-To: <alpine.LFD.2.02.1107140051020.18606@x980>

On Thursday, July 14, 2011, Len Brown wrote:
> From: Len Brown <len.brown@intel.com>
> 
> Since 2.6.36 (23016bf0d25), Linux prints the existence of "epb" in /proc/cpuinfo.
> Since 2.6.38 (d5532ee7b40), the x86_energy_perf_policy(8) utility has
> been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS.
> 
> However, the typical BIOS fails to initialize the MSR, presumably
> because this is handled by high-volume shrink-wrap operating systems...
> 
> Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8).
> As a result, WSM-EP, SNB, and later hardware from Intel will run in its
> default hardware power-on state (performance), which assumes that users
> care for performance at all costs and not for energy efficiency.
> While that is fine for performance benchmarks, the hardware's intended default
> operating point is "normal" mode...
> 
> Initialize the MSR to "normal" by default during kernel boot.
> 
> x86_energy_perf_policy(8) is available to change the default after boot,
> should the user have a different preference.
> 
> cc: stable@kernel.org
> Signed-off-by: Len Brown <len.brown@intel.com>

Acked-by: Rafael J. Wysocki <rjw@sisk.pl>

> ---
>  arch/x86/include/asm/msr-index.h |    3 +++
>  arch/x86/kernel/cpu/intel.c      |   18 ++++++++++++++++++
>  2 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 43a18c7..91fedd9 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -250,6 +250,9 @@
>  #define MSR_IA32_TEMPERATURE_TARGET	0x000001a2
>  
>  #define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
> +#define ENERGY_PERF_BIAS_PERFORMANCE	0
> +#define ENERGY_PERF_BIAS_NORMAL		6
> +#define ENERGY_PERF_BIAS_POWERSAVE	15
>  
>  #define MSR_IA32_PACKAGE_THERM_STATUS		0x000001b1
>  
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index d16c2c5..7c1ca07 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -448,6 +448,24 @@ static void __cpuinit init_intel(struct cpuinfo_x86 *c)
>  
>  	if (cpu_has(c, X86_FEATURE_VMX))
>  		detect_vmx_virtcap(c);
> +
> +	/*
> +	 * Initialize MSR_IA32_ENERGY_PERF_BIAS if BIOS did not.
> +	 * x86_energy_perf_policy(8) is available to change it at run-time
> +	 */
> +	if (cpu_has(c, X86_FEATURE_EPB)) {
> +		u64 epb;
> +
> +		rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> +		if ((epb & 0xF) == 0) {
> +			printk_once(KERN_WARNING "x86: updated energy_perf_bias"
> +				" to 'normal' from 'performance'\n"
> +				"You can view and update epb via utility,"
> +				" such as x86_energy_perf_policy(8)\n");
> +			epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
> +			wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
> +		}
> +	}
>  }
>  
>  #ifdef CONFIG_X86_32
> 
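
The bit manipulation in the quoted hunk can be tried out in userspace
with the sketch below. It only models the arithmetic on the MSR value:
the low four bits hold the energy/performance bias, and a value of 0
("performance") is replaced by 6 ("normal") while the upper bits are
preserved. The function name epb_fixup() and the EPB_* macro names are
made up for the example, and no MSR is actually read or written.

```c
#include <stdint.h>

#define EPB_MASK	0xFULL	/* bias lives in the low four bits */
#define EPB_PERFORMANCE	0
#define EPB_NORMAL	6

/*
 * Return the value that would be written back to the MSR: if the bias
 * field is still at its power-on default (performance), switch it to
 * normal; otherwise leave the value untouched.
 */
uint64_t epb_fixup(uint64_t msr)
{
	if ((msr & EPB_MASK) == EPB_PERFORMANCE)
		return (msr & ~EPB_MASK) | EPB_NORMAL;

	return msr;
}
```

A value that the BIOS or the user has already set to something other
than 0 passes through unchanged, which is the behaviour the commit
message describes.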


* Re: [PATCH 0/3] PM / Domains / shmobile fixes
From: Rafael J. Wysocki @ 2011-07-14 19:34 UTC (permalink / raw)
  To: Magnus Damm; +Cc: Linux PM mailing list, LKML, linux-sh
In-Reply-To: <CANqRtoTLO=DqehwanGdE+rgxL_ofa5YCJjsx47doHzwZDP4Ocw@mail.gmail.com>

On Thursday, July 14, 2011, Magnus Damm wrote:
> On Thu, Jul 14, 2011 at 6:52 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > Hi,
> >
> > The following three patches fix a couple of issues in the code currently
> > in my pm-domains branch at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git pm-domains
> >
> > [1/3] - Use genpd_queue_power_off_work() for queuing up the powering off
> >        of A4LC (this avoids attempting to queue up a work item while it is
> >        pending).
> >
> > [2/3] - Make the generic PM domains code react to -EBUSY returned from a
> >        PM domain's .power_down() callback (needed for [3/3]).
> >
> > [3/3] - Return -EBUSY from the A4LC's .power_down() callback to indicate that
> >        the domain hasn't been powered down on purpose and remove the confusing
> >        (and now redundant) pm_genpd_poweron(A4LC) from pd_power_down_a3rv().
> 
> All patches above look great, thanks a lot for your help!
> 
> Acked-by: Magnus Damm <damm@opensource.se>

Thanks!

Rafael


* Re: [Q] freezing tasks vs. suspending drivers during STR
From: Alan Stern @ 2011-07-14 14:43 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: linux-pm
In-Reply-To: <Pine.LNX.4.64.1107141430460.10688@axis700.grange>

On Thu, 14 Jul 2011, Guennadi Liakhovetski wrote:

> Hi all
> 
> I ran across an issue that I cannot understand. In short: kernel threads 
> continue running even after freezing of the tasks has completed, late into 
> the device suspend. My scenario was suspending the system while running 
> dmatest. At some point the DMA driver is suspended as well, but the dmatest 
> threads continue running, so they issue further requests, which then 
> confuse the driver. In case this is a known design "decision," you do not 
> have to read on. Below I'll just provide some excerpts from kernel logs, 
> supporting my statement. Any explanation and fix suggestions would be 
> appreciated!

I don't know how dmatest was designed, but it seems clear that the 
kernel threads it uses ought to be marked freezable.  Probably nobody 
imagined running dmatest while suspending the system.  Or maybe for 
some reason it's not possible to make those threads freezable.

> Well, sure, I could hack the driver to reject any submits after suspend or 
> queue them without executing, but... Is this behaviour correct?

The driver shouldn't have to worry about rejecting submissions after it 
is suspended.  Rather, no thread should be running to make those 
submissions in the first place.

Alan Stern


* [Q] freezing tasks vs. suspending drivers during STR
From: Guennadi Liakhovetski @ 2011-07-14 12:42 UTC (permalink / raw)
  To: linux-pm

Hi all

I ran across an issue that I cannot understand. In short: kernel threads 
continue running even after freezing of the tasks has completed, late into 
the device suspend. My scenario was suspending the system while running 
dmatest. At some point the DMA driver is suspended as well, but the dmatest 
threads continue running, so they issue further requests, which then 
confuse the driver. In case this is a known design "decision," you do not 
have to read on. Below I'll just provide some excerpts from kernel logs, 
supporting my statement. Any explanation and fix suggestions would be 
appreciated!

* In the kernel log we first see processes being frozen:

[   60.414062] Freezing user space processes ... 
[   60.421875] sh-dma-engine sh-dma-engine.2: submit #60@cc87fdb4 on 3: 4ca1c388[15140] -> 4ca203b4
[   60.421875] sh-dma-engine sh-dma-engine.2: submit #61@cc87fdb4 on 3: 4ca24be8[9316] -> 4ca28928
[   60.421875] sh-dma-engine sh-dma-engine.2: submit #62@cc87fdb4 on 3: 4ca1f2ec[3332] -> 4ca22644

* The above "submit" lines come from the dmatest _kernel threads_, which 
* is the context from which all these calls are made

[   60.437500] (elapsed 0.01 seconds) done.
[   60.437500] Freezing remaining freezable tasks ...

* Some more freezable tasks...

[   60.437500] sh-dma-engine sh-dma-engine.2: submit #72@cc87fcf4 on 3: 4ca27584[236] -> 4ca2abdc
[   60.437500] sh-dma-engine sh-dma-engine.2: submit #73@cc87fcf4 on 3: 4ca24430[11668] -> 4ca28c3c
[   60.437500] sh-dma-engine sh-dma-engine.0: submit #32@cf7343d4 on 2: 4f7de588[1668] -> 4f7e1d34

...

[   60.460937] sh-dma-engine sh-dma-engine.2: submit #76@cc87fcf4 on 3: 4ca244cc[12012] -> 4ca2852c
[   60.460937] sh-dma-engine sh-dma-engine.2: submit #43@cc87f0f4 on 4: 4ca4cf90[11548] -> 4ca507f8
[   60.468750] sh-dma-engine sh-dma-engine.0: submit #50@cf668e14 on 0: 4f78cb18[9008] -> 4f7916c0
[   60.468750] (elapsed 0.03 seconds) done.
[   60.468750] Suspending console(s) (use no_console_suspend to debug)

* Ok, should be done with processes by now, right?

[   60.476562] sh-dma-engine sh-dma-engine.0: submit #51@cf668e14 on 0: 4f784b30[620] -> 4f788098
[   60.476562] sh-dma-engine sh-dma-engine.2: submit #77@cc87fcf4 on 3: 4ca24670[2712] -> 4ca2a250
[   60.476562] sh-dma-engine sh-dma-engine.2: submit #44@cc87f0f4 on 4: 4ca4cd80[5324] -> 4ca51ba4
[   60.476562] sh-dma-engine sh-dma-engine.0: submit #52@cf668e14 on 0: 4f78c380[12772] -> 4f790420
...
[   60.585937] sh-dma-engine sh-dma-engine.0: submit #70@cf668214 on 1: 4f7a8764[13720] -> 4f7ac7dc
[   60.593750] sh-dma-engine sh-dma-engine.0: submit #58@cf734434 on 2: 4f7c4978[12780] -> 4f7c82c4
[   60.593750] sh-dma-engine sh-dma-engine.0: submit #71@cf6681b4 on 1: 4f7a2074[7960] -> 4f7a47ec
[   60.593750] sh-dma-engine sh-dma-engine.0: submit #59@cf734434 on 2: 4f7c70d4[1116] -> 4f7cb630

* Ouch... they are still running.

[   60.593750] soc_camera_platform soc_camera_platform.0: platform_pm_suspend().703: soc_camera_platform
[   60.593750] snd-soc-dummy snd-soc-dummy: platform_pm_suspend().703: snd-soc-dummy
[   60.593750] alarmtimer alarmtimer: platform_pm_suspend().703: alarmtimer
[   60.593750] sh_mobile_meram sh_mobile_meram.0: platform_pm_suspend().703: sh_mobile_meram
[   60.593750] sh-mobile-hdmi sh-mobile-hdmi: platform_pm_suspend().703: sh-mobile-hdmi
[   60.593750] sh_mobile_lcdc_fb sh_mobile_lcdc_fb.1: pm_genpd_suspend()

* Above, assorted drivers are already suspending their devices

[   60.601562] sh-dma-engine sh-dma-engine.0: submit #60@cf734434 on 2: 4f7c52ec[11480] -> 4f7c9310
[   60.601562] sh-dma-engine sh-dma-engine.0: submit #61@cf734434 on 2: 4f7c5bec[7448] -> 4f7c9428
[   60.601562] sh-dma-engine sh-dma-engine.0: submit #62@cf734434 on 2: 4f7c5434[2036] -> 4f7cac20
[   60.601562] sh-dma-engine sh-dma-engine.0: submit #63@cf734434 on 2: 4f7c4190[8088] -> 4f7c9500
[   60.601562] sh-dma-engine sh-dma-engine.0: submit #64@cf734434 on 2: 4f7c58fc[2824] -> 4f7cad58

* more dmatest kthread runs

[   60.835937] sh-dma-engine sh-dma-engine.1: submit #74@cc874434 on 1: 4c8b8954[7940] -> 4c8bd460
[   60.835937] sh-dma-engine sh-dma-engine.1: submit #77@cc878154 on 5: 4c955d34[7040] -> 4c958bd0
[   60.843750] sh-dma-engine sh-dma-engine.1: submit #78@cc878154 on 5: 4c956258[4796] -> 4c959e98
[   60.843750] smsc911x smsc911x: platform_pm_suspend().703: smsc911x
[   60.843750] physmap-flash physmap-flash.0: platform_pm_suspend().703: physmap-flash
[   60.843750] platform uio_pdrv_genirq.0: pm_genpd_suspend()

* more yet, intermixed with random device suspends

[   60.843750] sh-dma-engine sh-dma-engine.2: sh_dmae_suspend(dma2chan0): 0 @ cf69d0c4
[   60.843750] sh-dma-engine sh-dma-engine.2: sh_dmae_suspend(dma2chan1): 0 @ cf2ace64
[   60.843750] sh-dma-engine sh-dma-engine.2: sh_dmae_suspend(dma2chan2): 1 @ cf2acce4
[   60.843750] sh-dma-engine sh-dma-engine.2: sh_dmae_suspend(dma2chan3): 1 @ cf2acb64
[   60.843750] sh-dma-engine sh-dma-engine.2: sh_dmae_suspend(dma2chan4): 1 @ cf2ac9e4
[   60.843750] sh-dma-engine sh-dma-engine.2: sh_dmae_suspend(dma2chan5): 1 @ cf2ac864
[   60.843750] sh-dma-engine sh-dma-engine.1: pm_genpd_suspend()
[   60.843750] sh-dma-engine sh-dma-engine.1: sh_dmae_suspend(dma1chan0): 1 @ cf69da84
[   60.843750] sh-dma-engine sh-dma-engine.1: sh_dmae_suspend(dma1chan1): 1 @ cf69d904
[   60.843750] sh-dma-engine sh-dma-engine.1: sh_dmae_suspend(dma1chan2): 1 @ cf69d784
[   60.843750] sh-dma-engine sh-dma-engine.1: sh_dmae_suspend(dma1chan3): 0 @ cf69d604
[   60.843750] sh-dma-engine sh-dma-engine.1: sh_dmae_suspend(dma1chan4): 0 @ cf69d484
[   60.843750] sh-dma-engine sh-dma-engine.1: sh_dmae_suspend(dma1chan5): 1 @ cf69d304
[   60.843750] sh-dma-engine sh-dma-engine.0: pm_genpd_suspend()
[   60.843750] sh-dma-engine sh-dma-engine.0: sh_dmae_suspend(dma0chan0): 1 @ cf707824
[   60.843750] sh-dma-engine sh-dma-engine.0: sh_dmae_suspend(dma0chan1): 0 @ cf7076a4
[   60.843750] sh-dma-engine sh-dma-engine.0: sh_dmae_suspend(dma0chan2): 1 @ cf707524
[   60.843750] sh-dma-engine sh-dma-engine.0: sh_dmae_suspend(dma0chan3): 1 @ cf7073a4
[   60.843750] sh-dma-engine sh-dma-engine.0: sh_dmae_suspend(dma0chan4): 1 @ cf69de44
[   60.843750] sh-dma-engine sh-dma-engine.0: sh_dmae_suspend(dma0chan5): 1 @ cf69dcc4

* shdma suspended, good

[   60.843750] PM: suspend of devices complete after 248.086 msecs

[   60.851562] sh-dma-engine sh-dma-engine.1: submit #79@cc878154 on 5: 4c95739c[180] -> 4c95805c
[   60.851562] sh-dma-engine sh-dma-engine.1: submit #72@cc8756f4 on 2: 4c91481c[4456] -> 4c91a6dc
[   60.851562] sh-dma-engine sh-dma-engine.1: submit #75@cc874434 on 1: 4c8ac7d8[8576] -> 4c8b07bc
[   60.851562] sh-dma-engine sh-dma-engine.1: submit #59@cc878cf4 on 4: 4c9407d0[2628] -> 4c945dd0
[   60.851562] sh-dma-engine sh-dma-engine.2: submit #40@cc87caf4 on 2: 4c9ca0b0[5200] -> 4c9cdbf0
[   60.851562] sh-dma-engine sh-dma-engine.2: submit #41@cc87c9d4 on 2: 4c9b9b28[5804] -> 4c9bde50
[   60.851562] sh-dma-engine sh-dma-engine.2: submit #42@cc87ca34 on 2: 4c9b07d4[11964] -> 4c9b4390
[   60.851562] sh-dma-engine sh-dma-engine.1: submit #47@cc877a34 on 3: 4c8fcf4c[3784] -> 4c9023a8
[   60.851562] sh-dma-engine sh-dma-engine.2: submit #43@cc87ca94 on 2: 4c9c16c8[9576] -> 4c9c5568
[   60.851562] sh-dma-engine sh-dma-engine.2: submit #120@cc87fcf4 on 3: 4ca2c878[11636] -> 4ca30da4

[   60.859375] sh-dma-engine sh-dma-engine.2: submit #121@cc87fdb4 on 3: 4ca1e2dc[3520] -> 4ca22394
[   60.859375] sh-dma-engine sh-dma-engine.2: submit #122@cc87fe14 on 3: 4c9ab7c8[584] -> 4c9aed98

* Heeeeeelp... still running......

[   60.859375] PM: late suspend of devices complete after 16.277 msecs

* Good night
* Good morning

[   60.859375] sh_tmu sh_tmu.0: used for periodic clock events
[   60.859375] sh-sci sh-sci.0: pm_genpd_resume_noirq()

[   60.859375] PM: early resume of devices complete after 1.416 msecs

[   60.859375] sh-dma-engine sh-dma-engine.1: sh_dmae_resume(dma1chan1): cf69d904
[   60.859375] sh-dma-engine sh-dma-engine.1: sh_dmae_resume(dma1chan2): cf69d784
[   60.859375] sh-dma-engine sh-dma-engine.1: sh_dmae_resume(dma1chan3): cf69d604
[   60.859375] sh-dma-engine sh-dma-engine.1: sh_dmae_resume(dma1chan5): cf69d304
[   60.859375] sh-dma-engine sh-dma-engine.2: pm_genpd_resume(): 0
[   60.859375] sh-dma-engine sh-dma-engine.2: sh_dmae_resume(dma2chan0): cf69d0c4
[   60.859375] sh-dma-engine sh-dma-engine.2: sh_dmae_resume(dma2chan2): cf2acce4
[   60.859375] sh-dma-engine sh-dma-engine.2: sh_dmae_resume(dma2chan3): cf2acb64

* shdma resumed

[   60.859375] PM: resume of devices complete after 2.624 msecs

* threads woken up...

[   60.867187] sh-dma-engine sh-dma-engine.0: submit #65@cf668e14 on 0: 4f7963cc[1904] -> 4f79b0fc
[   60.867187] sh-dma-engine sh-dma-engine.0: submit #72@cf668154 on 1: 4f7b6824[5448] -> 4f7b8674
[   60.867187] sh-dma-engine sh-dma-engine.2: submit #49@cc87f0f4 on 4: 4ca689d8[10648] -> 4ca6c4ec
[   60.867187] sh-dma-engine sh-dma-engine.2: submit #50@cc87f0f4 on 4: 4ca38e94[2036] -> 4ca3d970
[   60.867187] sh-dma-engine sh-dma-engine.0: submit #66@cf668e14 on 0: 4f78d82c[4672] -> 4f791aec
[   60.867187] sh-dma-engine sh-dma-engine.2: submit #51@cc87f0f4 on 4: 4ca4c430[11020] -> 4ca50bdc
[   60.867187] sh-dma-engine sh-dma-engine.0: submit #67@cf668e14 on 0: 4f7848a8[11372] -> 4f7886e8
[   60.867187] sh-dma-engine sh-dma-engine.0: submit #59@cf2a2814 on 3: 4c808f10[7244] -> 4c80d098
[   60.867187] sh-dma-engine sh-dma-engine.0: submit #60@cf2a2814 on 3: 4f7e9684[1428] -> 4f7ecb64
[   60.867187] sh-dma-engine sh-dma-engine.0: submit #61@cf2a2814 on 3: 4c801074[3308] -> 4c8058e8
[   60.867187] sh-dma-engine sh-dma-engine.2: submit #123@cc87fd54 on 3: 4ca24fe0[1372] -> 4ca2b8a8

* What am I missing?

Well, sure, I could hack the driver to reject any submits after suspend or 
queue them without executing, but... Is this behaviour correct?
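
To make the "reject any submits after suspend" hack concrete, here is a
minimal user-space sketch of the idea. It is only a model: demo_chan,
demo_suspend(), demo_resume() and demo_submit() are made-up names for
illustration, not the shdma driver's structures or the dmaengine API, and a
real driver would also need locking around the flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a DMA channel; not the real shdma structure. */
struct demo_chan {
	bool suspended;		/* set by the (modelled) suspend callback */
	int submitted;		/* number of requests accepted so far */
};

/* Modelled suspend/resume callbacks: they only flip the flag. */
static void demo_suspend(struct demo_chan *c)
{
	c->suspended = true;
}

static void demo_resume(struct demo_chan *c)
{
	c->suspended = false;
}

/*
 * Modelled submit path: returns a positive cookie on success and
 * -EBUSY while the channel is suspended.
 */
static int demo_submit(struct demo_chan *c)
{
	if (c->suspended)
		return -EBUSY;

	return ++c->submitted;
}
```

With this model a submit issued between demo_suspend() and demo_resume()
fails with -EBUSY instead of reaching the hardware, which papers over the
symptom but does not explain why the threads are still running.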

Thanks
Guennadi
---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

* [PATCH v2] x86 intel power: Initialize MSR_IA32_ENERGY_PERF_BIAS
From: Len Brown @ 2011-07-14  4:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, x86, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Linus Torvalds, linux-pm, Alan Cox, Arjan van de Ven
In-Reply-To: <4E1E106E.4060408@zytor.com>

From: Len Brown <len.brown@intel.com>

Since 2.6.36 (23016bf0d25), Linux prints the existence of "epb" in /proc/cpuinfo.
Since 2.6.38 (d5532ee7b40), the x86_energy_perf_policy(8) utility has
been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS.

However, the typical BIOS fails to initialize the MSR, presumably
because this is handled by high-volume shrink-wrap operating systems...

Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8).
As a result, WSM-EP, SNB, and later hardware from Intel will run in its
default hardware power-on state (performance), which assumes that users
care for performance at all costs and not for energy efficiency.
While that is fine for performance benchmarks, the hardware's intended default
operating point is "normal" mode...

Initialize the MSR to "normal" by default during kernel boot.

x86_energy_perf_policy(8) is available to change the default after boot,
should the user have a different preference.

cc: stable@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 arch/x86/include/asm/msr-index.h |    3 +++
 arch/x86/kernel/cpu/intel.c      |   18 ++++++++++++++++++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 43a18c7..91fedd9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -250,6 +250,9 @@
 #define MSR_IA32_TEMPERATURE_TARGET	0x000001a2
 
 #define MSR_IA32_ENERGY_PERF_BIAS	0x000001b0
+#define ENERGY_PERF_BIAS_PERFORMANCE	0
+#define ENERGY_PERF_BIAS_NORMAL		6
+#define ENERGY_PERF_BIAS_POWERSAVE	15
 
 #define MSR_IA32_PACKAGE_THERM_STATUS		0x000001b1
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index d16c2c5..7c1ca07 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -448,6 +448,24 @@ static void __cpuinit init_intel(struct cpuinfo_x86 *c)
 
 	if (cpu_has(c, X86_FEATURE_VMX))
 		detect_vmx_virtcap(c);
+
+	/*
+	 * Initialize MSR_IA32_ENERGY_PERF_BIAS if BIOS did not.
+	 * x86_energy_perf_policy(8) is available to change it at run-time
+	 */
+	if (cpu_has(c, X86_FEATURE_EPB)) {
+		u64 epb;
+
+		rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+		if ((epb & 0xF) == 0) {
+			printk_once(KERN_WARNING "x86: updated energy_perf_bias"
+				" to 'normal' from 'performance'\n"
+				"You can view and update epb via utility,"
+				" such as x86_energy_perf_policy(8)\n");
+			epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
+			wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
+		}
+	}
 }
 
 #ifdef CONFIG_X86_32
-- 
1.7.6.134.gcf13f

* Re: [PATCH 0/3] PM / Domains / shmobile fixes
From: Magnus Damm @ 2011-07-14  2:10 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, linux-sh
In-Reply-To: <201107132352.59801.rjw@sisk.pl>

On Thu, Jul 14, 2011 at 6:52 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> Hi,
>
> The following three patches fix a couple of issues in the code currently
> in my pm-domains branch at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git pm-domains
>
> [1/3] - Use genpd_queue_power_off_work() for queuing up the powering off
>        of A4LC (this avoids attempting to queue up a work item while it is
>        pending).
>
> [2/3] - Make the generic PM domains code react to -EBUSY returned from a
>        PM domain's .power_down() callback (needed for [3/3]).
>
> [3/3] - Return -EBUSY from the A4LC's .power_down() callback to indicate that
>        the domain hasn't been powered down on purpose and remove the confusing
>        (and now redundant) pm_genpd_poweron(A4LC) from pd_power_down_a3rv().

All patches above look great, thanks a lot for your help!

Acked-by: Magnus Damm <damm@opensource.se>

* [PATCH 3/3] ARM / shmobile: Return -EBUSY from A4LC power off if A3RV is active
From: Rafael J. Wysocki @ 2011-07-13 21:56 UTC (permalink / raw)
  To: Linux PM mailing list; +Cc: LKML, linux-sh
In-Reply-To: <201107132352.59801.rjw@sisk.pl>

From: Rafael J. Wysocki <rjw@sisk.pl>

Since the A4LC should only be powered off if the A3RV is off, make
the A4LC's power down routine return -EBUSY if A3RV is not off to
indicate to the core that it doesn't want to power off the domain in
that case.  This will cause the core to regard A4LC as active, so
the pm_genpd_poweron() in pd_power_down_a3rv() is not necessary any
more.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/arm/mach-shmobile/pm-sh7372.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6/arch/arm/mach-shmobile/pm-sh7372.c
===================================================================
--- linux-2.6.orig/arch/arm/mach-shmobile/pm-sh7372.c
+++ linux-2.6/arch/arm/mach-shmobile/pm-sh7372.c
@@ -106,7 +106,6 @@ static int pd_power_down_a3rv(struct gen
 	int ret = pd_power_down(genpd);
 
 	/* try to power down A4LC after A3RV is requested off */
-	pm_genpd_poweron(&sh7372_a4lc.genpd);
 	genpd_queue_power_off_work(&sh7372_a4lc.genpd);
 
 	return ret;
@@ -118,7 +117,7 @@ static int pd_power_down_a4lc(struct gen
 	if (!(__raw_readl(PSTR) & (1 << sh7372_a3rv.bit_shift)))
 		return pd_power_down(genpd);
 
-	return 0;
+	return -EBUSY;
 }
 
 static bool pd_active_wakeup(struct device *dev)

* [PATCH 2/3] PM / Domains: Take .power_off() error code into account
From: Rafael J. Wysocki @ 2011-07-13 21:55 UTC (permalink / raw)
  To: Linux PM mailing list; +Cc: LKML, linux-sh
In-Reply-To: <201107132352.59801.rjw@sisk.pl>

From: Rafael J. Wysocki <rjw@sisk.pl>

Currently pm_genpd_poweroff() discards error codes returned by
the PM domain's .power_off() callback, because it's safer to always
regard the domain as inaccessible to drivers after a failing
.power_off().  Still, there are situations in which the low-level
code may want to indicate that it doesn't want to power off the
domain, so allow it to do that by returning -EBUSY from .power_off().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/power/domain.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/base/power/domain.c
===================================================================
--- linux-2.6.orig/drivers/base/power/domain.c
+++ linux-2.6/drivers/base/power/domain.c
@@ -312,8 +312,16 @@ static int pm_genpd_poweroff(struct gene
 		}
 	}
 
-	if (genpd->power_off)
-		genpd->power_off(genpd);
+	if (genpd->power_off) {
+		ret = genpd->power_off(genpd);
+		if (ret == -EBUSY) {
+			genpd_set_active(genpd);
+			if (parent)
+				genpd_release_lock(parent);
+
+			goto out;
+		}
+	}
 
 	genpd->status = GPD_STATE_POWER_OFF;
 

* [PATCH 1/3] ARM / shmobile: Use genpd_queue_power_off_work()
From: Rafael J. Wysocki @ 2011-07-13 21:54 UTC (permalink / raw)
  To: Linux PM mailing list; +Cc: LKML, linux-sh
In-Reply-To: <201107132352.59801.rjw@sisk.pl>

From: Rafael J. Wysocki <rjw@sisk.pl>

Make pd_power_down_a3rv() use genpd_queue_power_off_work() to queue
up the powering off of the A4LC domain to avoid queuing it up when
it is pending.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 arch/arm/mach-shmobile/pm-sh7372.c |    2 +-
 drivers/base/power/domain.c        |    2 +-
 include/linux/pm_domain.h          |    2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/arm/mach-shmobile/pm-sh7372.c
===================================================================
--- linux-2.6.orig/arch/arm/mach-shmobile/pm-sh7372.c
+++ linux-2.6/arch/arm/mach-shmobile/pm-sh7372.c
@@ -107,7 +107,7 @@ static int pd_power_down_a3rv(struct gen
 
 	/* try to power down A4LC after A3RV is requested off */
 	pm_genpd_poweron(&sh7372_a4lc.genpd);
-	queue_work(pm_wq, &sh7372_a4lc.genpd.power_off_work);
+	genpd_queue_power_off_work(&sh7372_a4lc.genpd);
 
 	return ret;
 }
Index: linux-2.6/drivers/base/power/domain.c
===================================================================
--- linux-2.6.orig/drivers/base/power/domain.c
+++ linux-2.6/drivers/base/power/domain.c
@@ -222,7 +222,7 @@ static bool genpd_abort_poweroff(struct
  * Queue up the execution of pm_genpd_poweroff() unless it's already been done
  * before.
  */
-static void genpd_queue_power_off_work(struct generic_pm_domain *genpd)
+void genpd_queue_power_off_work(struct generic_pm_domain *genpd)
 {
 	if (!work_pending(&genpd->power_off_work))
 		queue_work(pm_wq, &genpd->power_off_work);
Index: linux-2.6/include/linux/pm_domain.h
===================================================================
--- linux-2.6.orig/include/linux/pm_domain.h
+++ linux-2.6/include/linux/pm_domain.h
@@ -73,6 +73,7 @@ extern void pm_genpd_init(struct generic
 			  struct dev_power_governor *gov, bool is_off);
 extern int pm_genpd_poweron(struct generic_pm_domain *genpd);
 extern void pm_genpd_poweroff_unused(void);
+extern void genpd_queue_power_off_work(struct generic_pm_domain *genpd);
 #else
 static inline int pm_genpd_add_device(struct generic_pm_domain *genpd,
 				      struct device *dev)
@@ -101,6 +102,7 @@ static inline int pm_genpd_poweron(struc
 	return -ENOSYS;
 }
 static inline void pm_genpd_poweroff_unused(void) {}
+static inline void genpd_queue_power_off_work(struct generic_pm_domain *gpd) {}
 #endif
 
 #endif /* _LINUX_PM_DOMAIN_H */

* [PATCH 0/3] PM / Domains / shmobile fixes
From: Rafael J. Wysocki @ 2011-07-13 21:52 UTC (permalink / raw)
  To: Linux PM mailing list; +Cc: LKML, linux-sh

Hi,

The following three patches fix a couple of issues in the code currently
in my pm-domains branch at:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git pm-domains

[1/3] - Use genpd_queue_power_off_work() for queuing up the powering off
        of A4LC (this avoids attempting to queue up a work item while it is
        pending).

[2/3] - Make the generic PM domains code react to -EBUSY returned from a
        PM domain's .power_off() callback (needed for [3/3]).

[3/3] - Return -EBUSY from the A4LC's .power_down() callback to indicate that
        the domain hasn't been powered down on purpose and remove the confusing
        (and now redundant) pm_genpd_poweron(A4LC) from pd_power_down_a3rv().
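
How the three changes fit together can be shown with a small user-space
model. This is a sketch under stated assumptions, not the genpd code: every
demo_* name is hypothetical, the work queue is reduced to a single pending
flag, and locking and parent handling are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_ACTIVE, DEMO_POWER_OFF };

/* Hypothetical, heavily simplified stand-in for a PM domain. */
struct demo_domain {
	enum demo_state status;
	bool work_pending;			/* models work_pending() */
	int (*power_off)(struct demo_domain *d);
	struct demo_domain *must_be_off_first;	/* e.g. A3RV for A4LC */
};

/* [1/3]: queue the power-off work only if it is not already pending. */
static bool demo_queue_power_off_work(struct demo_domain *d)
{
	if (d->work_pending)
		return false;

	d->work_pending = true;
	return true;
}

/* [3/3]: the low-level callback refuses while the other domain is on. */
static int demo_power_off_a4lc(struct demo_domain *d)
{
	if (d->must_be_off_first &&
	    d->must_be_off_first->status != DEMO_POWER_OFF)
		return -EBUSY;

	return 0;
}

/* [2/3]: the core keeps the domain active if the callback says -EBUSY. */
static int demo_poweroff(struct demo_domain *d)
{
	d->work_pending = false;

	if (d->power_off && d->power_off(d) == -EBUSY) {
		d->status = DEMO_ACTIVE;
		return -EBUSY;
	}

	/* Any other result is still treated as "powered off". */
	d->status = DEMO_POWER_OFF;
	return 0;
}
```

In this model, powering off the A4LC stand-in while the A3RV stand-in is
still active returns -EBUSY and leaves it marked active; once the A3RV
stand-in is off, the same call succeeds.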

Thanks,
Rafael

* Re: [PATCH] x86 intel power: Initialize MSR_IA32_ENERGY_PERF_BIAS
From: H. Peter Anvin @ 2011-07-13 21:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, x86, linux-kernel, linux-pm, Ingo Molnar,
	Arjan van de Ven, Thomas Gleixner, Alan Cox
In-Reply-To: <CA+55aFxBkgwBbnqwO7DuizsWiVcULYtPzqr-mYrT6aHmXV0zZw@mail.gmail.com>

On 07/13/2011 01:49 PM, Linus Torvalds wrote:
> Ack. Let's just do this. Ingo?
> 
>             Linus

Ingo is travelling this week, but this seems to have converged.

	-hpa

* Re: [PATCH] x86 intel power: Initialize MSR_IA32_ENERGY_PERF_BIAS
From: Len Brown @ 2011-07-13 20:51 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, Linus Torvalds, x86, linux-kernel, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar, linux-pm, stable, Alan Cox,
	Arjan van de Ven
In-Reply-To: <20110420131442.GA29418@atrey.karlin.mff.cuni.cz>

> Ok. So... what "serious bug" does this fix? You really need to use cc:
> stable less. Tweaking performance/power ratio is _not_ serious bug.

While the performance difference may not be significant,
the energy difference may be.

Some people think it is serious when Linux has worse out-of-box
energy efficiency than Windows on the same hardware.

Some people think that it is serious when their Linux distribution
has worse energy efficiency than competing Linux distributions.

Greg has given me a hard time for not cc'ing stable _enough_.
I guess the folks that answer the stable mail get to decide...

cheers,
-Len

* Re: [PATCH] x86 intel power: Initialize MSR_IA32_ENERGY_PERF_BIAS
From: Linus Torvalds @ 2011-07-13 20:49 UTC (permalink / raw)
  To: Len Brown
  Cc: Andrew Morton, x86, linux-kernel, Thomas Gleixner, H. Peter Anvin,
	Ingo Molnar, linux-pm, Alan Cox, Arjan van de Ven
In-Reply-To: <alpine.LFD.2.02.1107131640280.9546@x980>

Ack. Let's just do this. Ingo?

            Linus

On Wed, Jul 13, 2011 at 1:44 PM, Len Brown <lenb@kernel.org> wrote:
>
>> So how about informing users, how about making it non-silent? An informative
>> printk that also mentions the power configuration tool, etc. This solves the
>> concerns i mentioned.
>
> Something like this?
>
>                rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
>                if ((epb & 0xF) == 0) {
>                        printk_once(KERN_WARNING "x86: updated energy_perf_bias"
>                                " to 'normal' from 'performance'\n"
>                                "You can view and update epb via utility,"
>                                " such as x86_energy_perf_policy(8)\n");
>                        epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
>                        wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
>                }
>
>

* Re: [PATCH] x86 intel power: Initialize MSR_IA32_ENERGY_PERF_BIAS
From: Len Brown @ 2011-07-13 20:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, x86, linux-kernel, H. Peter Anvin, linux-pm,
	Linus Torvalds, Thomas Gleixner, Alan Cox, Arjan van de Ven
In-Reply-To: <20110415101712.GB28007@elte.hu>


> So how about informing users, how about making it non-silent? An informative 
> printk that also mentions the power configuration tool, etc. This solves the 
> concerns i mentioned.

Something like this?

                rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
                if ((epb & 0xF) == 0) {
                        printk_once(KERN_WARNING "x86: updated energy_perf_bias"
                                " to 'normal' from 'performance'\n"
                                "You can view and update epb via utility,"
                                " such as x86_energy_perf_policy(8)\n");
                        epb = (epb & ~0xF) | ENERGY_PERF_BIAS_NORMAL;
                        wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
                }
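
The masking in the snippet above only touches the low four bits of the
register value. A user-space sketch of just that arithmetic follows; the
demo_* names are made up for illustration, there is no MSR access here, and
the value 6 for "normal" is taken from the ENERGY_PERF_BIAS_NORMAL define in
the v2 patch.

```c
#include <stdint.h>

/* Illustrative constants; 0 and 6 mirror the defines in the v2 patch. */
#define DEMO_EPB_PERFORMANCE	0
#define DEMO_EPB_NORMAL		6
#define DEMO_EPB_MASK		0xFULL

/*
 * Given the value read from the register, return the value that would be
 * written back: the low nibble becomes "normal" only if it was left at
 * "performance"; all other bits and all other settings are preserved.
 */
static uint64_t demo_epb_fixup(uint64_t epb)
{
	if ((epb & DEMO_EPB_MASK) == DEMO_EPB_PERFORMANCE)
		epb = (epb & ~DEMO_EPB_MASK) | DEMO_EPB_NORMAL;

	return epb;
}
```

So a value whose low nibble is 0 comes back with a low nibble of 6, while a
value that was already set to something else (say 4 or 15) is returned
unchanged.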

* Re: [PATCH] mrst_pmu: driver for Intel Moorestown Power Management Unit
From: Alan Cox @ 2011-07-13  9:51 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-pm, linux-kernel
In-Reply-To: <alpine.LFD.2.02.1107122328180.4502@x980>

> +static struct mrst_device *pci_id_2_mrst_dev(u16 pci_dev_num)
> +{
> +	int index;
> +
> +	if ((pci_dev_num >= 0x0800) && (pci_dev_num <= 0x815))
> +		index = pci_dev_num - 0x800;
> +	else if (pci_dev_num == 0x084F)
> +		index = 22;
> +	else if (pci_dev_num == 0x4102)
> +		index = 23;
> +	else if (pci_dev_num == 0x4110)
> +		index = 24;
> +	else
> +		BUG();
> +
> +	BUG_ON(pci_dev_num != mrst_devs[index].pci_dev_num);

That strikes me as needlessly unfriendly; you could warn/return NULL and
propagate a WARN_ONCE back to the user.
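
As a rough user-space sketch of the "return NULL instead of BUG()" shape:
the table contents and all demo_* names below are hypothetical, and a plain
linear search replaces the index arithmetic of the quoted function to keep
the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device entry; not the driver's struct mrst_device. */
struct demo_dev {
	uint16_t pci_dev_num;
};

/* Made-up table contents, for illustration only. */
static struct demo_dev demo_devs[] = {
	{ 0x0800 }, { 0x0801 }, { 0x084F }, { 0x4102 },
};

#define DEMO_NDEVS (sizeof(demo_devs) / sizeof(demo_devs[0]))

/*
 * Look up a device by PCI device number.  An unknown ID yields NULL, so
 * the caller can warn once and fail the operation rather than taking the
 * whole machine down.
 */
static struct demo_dev *demo_lookup(uint16_t pci_dev_num)
{
	size_t i;

	for (i = 0; i < DEMO_NDEVS; i++)
		if (demo_devs[i].pci_dev_num == pci_dev_num)
			return &demo_devs[i];

	return NULL;
}
```

Callers then check the result for NULL and return an error of their own.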

> +static int __init scu_fw_check(void)
> +{
> +	int ret;
> +	u32 fw_version;
> +
> +	sfi_table_parse("OEMB", NULL, NULL, pmu_sfi_parse_oem);
> +
> +	if (ia_major < 0x6005 || ia_minor < 0x1525) {
> +		WARN(1, "mrst_pmu: IA FW version too old\n");
> +		return -1;
> +	}
> +
> +	ret = intel_scu_ipc_command(IPCMSG_FW_REVISION, 0, NULL, 0,
> +					&fw_version, 1);
> +
> +	if (ret) {
> +		WARN(1, "mrst_pmu: IPC FW version? %d\n", ret);
> +	} else {
> +		int scu_major = (fw_version >> 8) & 0xFF;
> +		int scu_minor = (fw_version >> 0) & 0xFF;
> +
> +		printk(KERN_INFO "mrst_pmu: firmware v%x\n", fw_version);
> +
> +		if ((scu_major >= 0xC0) && (scu_minor >= 0x49)) {
> +			printk(KERN_INFO "mrst_pmu: enabling S0i3\n");
> +			mrst_pmu_s0i3_enable = true;
> +		} else {
> +			WARN(1, "mrst_pmu: S0i3 disabled, old firmware %X.%X",
> +					scu_major, scu_minor);
> +		}
> +	}
> +	return 0;
> +}
> +late_initcall(scu_fw_check);

NAK. I pointed out this problem with the driver to you way back: this code
always gets run, even on machines that are not in fact Moorestown, which
are then going to crash.

You need to check that the platform is Moorestown (not just the CPU,
because of Oaktrail).

* Re: Runtime PM discussion notes
From: Rafael J. Wysocki @ 2011-07-13  9:04 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Mark Brown, linux-pm, linux-omap
In-Reply-To: <alpine.DEB.2.00.1107130105530.30229@utopia.booyaka.com>

On Wednesday, July 13, 2011, Paul Walmsley wrote:
> (cc'ing Len)
> 
> Hi Mark,
> 
> On Mon, 11 Jul 2011, Mark Brown wrote:
> 
> > The interesting bits are things like being able to kill lots of the SoC
> > core supplies when the RAM is in retention mode - the CPU needs to go
> > through its shutdown procedures.
> 
> This is indeed possible on OMAP3+ chips with TWL4030+ PMICs.  Probably 
> other PMICs also.  TI calls it "off-mode."  The N900 shipped with this 
> feature enabled.  Not sure how many other similar products did.
> 
> This can be enabled in mainline, but not all of the mainline drivers have 
> context save/restore code merged yet, so in mainline it only works with a 
> subset of drivers.
> 
> > Actually, it just occurred to me that if we're waiting for a system
> > timer and can hand that off to a suitable timer in the PMIC then we can
> > do a suspend to RAM for the deep idle state from the hardware point of
> > view.
> 
> Yep.  At LinuxCon Cambridge two years ago, we had a discussion about 
> whether it would be possible to enter ACPI S-states from CPUIdle (or some 
> idle governor) on Intel chips.  If I remember correctly, the conclusion 
> was that ACPI always disables the screen/backlight, so it would only be 
> useful for situations where that was acceptable.

The reason you can't enter ACPI S-states from CPUidle is that you need
to go out of the idle loop to execute some ACPI-specific stuff, which is
not specific to Intel chips, but applies to ACPI in general.

So entering ACPI S-states from idle is a no-no and I don't think that will
change in the foreseeable future.

> To the best of my (limited) knowledge, that's the only case I know of 
> where there's a hardware limitation that prevents dynamic idle from 
> reaching the same low power state as system suspend.  If someone has hard 
> details of a similar example, it would be great to know about it.

Google G1 had this problem IIRC, but I don't have any details.

Thanks,
Rafael
