* [v3 PATCH 0/4]: CPUIDLE/POWER: Introducing cpuidle infrastructure to POWER
@ 2009-08-27 11:49 Arun R Bharadwaj
2009-08-27 11:51 ` [PATCH 1/4]: CPUIDLE/POWER: Enable cpuidle for pSeries Arun R Bharadwaj
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Arun R Bharadwaj @ 2009-08-27 11:49 UTC (permalink / raw)
To: oel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
Dipankar Sarma, Balbir Singh, Gautham R Shenoy,
Pallipadi, Venkatesh, Arun R Bharadwaj
Cc: linuxppc-dev, linux-kernel
Hi,
Changes from previous iteration:
--------------------------------
* Remove the EXPORT_SYMBOL(pm_idle) from
arch/powerpc/platform/pseries/processor_idle.c and introduce a
generic cpuidle_pm_idle in cpuidle.c which was earlier assuming pm_idle
to be the default idle routine. (As suggested by Peter and Ben).
* Move the cpu_idle_wait function from arch/powerpc/platforms/pseries/setup.c
to arch/powerpc/kernel/idle.c which would prevent breaking the build of
other platforms. (As suggested by Ben).
---------------------------------------
"Cpuidle" is a CPU Power Management infrastrusture which helps manage
idle CPUs in a clean and efficient manner. The architecture can register
its driver (in this case, pseries_idle driver) so that it subscribes for
cpuidle feature. Cpuidle has a set of governors (ladder and menu),
which will decide the best idle state to be chosen for the current situation,
based on heuristics, and calculates the expected residency time
for the current idle state. So based on this, the cpu is put into
the right idle state.
Currently, cpuidle infrasture is exploited by ACPI to choose between
the available ACPI C-states. This patch-set is aimed at enabling
cpuidle for powerpc and provides a sample implementation for pseries.
Currently, in the pseries_dedicated_idle_sleep(), the processor would
poll for a time period, which is called the snooze, and only then it
is ceded, which would put the processor in nap state. Cpuidle aims at
separating this into 2 different idle states. Based on the expected
residency time predicted by the cpuidle governor, the idle state is
chosen directly. So, choosing to enter the nap state directly based on
the decision made by cpuidle would avoid unnecessary snoozing before
entering nap.
This patch-set tries to achieve the above objective by introducing a
pseries processor idle driver called pseries_idle_driver in
arch/powerpc/platform/pseries/processor_idle.c, which implements the
idle loop which would replace the pseries_dedicated_idle_sleep()
when cpuidle is enabled.
Experiment conducted:
----------------------
The following experiment was conducted on a completely idle JS22 blade,
to prove that using cpuidle infrastructure, the amount of nap time increases.
Nap and snooze times were sampled for all the cpus.
For a window of 1000 samples, When cpuidle was enabled,
the total nap time was of the order of a few seconds (5-10s), whereas
the total snooze time was of the order of a few milliseconds(10-30 ms).
When cpuidle infrastructure was disabled and the regular
pseries_dedicated_idle_sleep() idle loop was used, the snooze time itself
was of the order of hundreds of milliseconds. (100 - 500 ms).
This is clearly due to unnecessary snoozing before napping even on a
completely idle system.
The previous post in this area can be found at
http://lkml.org/lkml/2009/8/26/233
Patches included in this set:
------------------------------
PATCH 1/4 - Enable cpuidle for pSeries.
PATCH 2/4 - Introduce architecture independent cpuidle_pm_idle in
drivers/cpuidle/cpuidle.c
PATCH 3/4 - Register for cpuidle_pm_idle in drivers/acpi/processor_idle.c
and arch/arm/mach-kirkwood/cpuidle.c
PATCH 4/4 - Implement Pseries Processor Idle idle module
--arun
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/4]: CPUIDLE/POWER: Enable cpuidle for pSeries.
2009-08-27 11:49 [v3 PATCH 0/4]: CPUIDLE/POWER: Introducing cpuidle infrastructure to POWER Arun R Bharadwaj
@ 2009-08-27 11:51 ` Arun R Bharadwaj
2009-08-27 11:53 ` [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c Arun R Bharadwaj
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Arun R Bharadwaj @ 2009-08-27 11:51 UTC (permalink / raw)
To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
Dipankar Sarma, Balbir Singh, Gautham R Shenoy,
Pallipadi, Venkatesh, Arun Bharadwaj
Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-08-27 17:19:08]:
This patch enables the cpuidle option in Kconfig for pSeries.
Currently cpuidle infrastructure is enabled only for x86 and ARM.
This code is almost completely borrowed from x86 to enable
cpuidle for pSeries.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
arch/powerpc/Kconfig | 17 +++++++++++++++++
arch/powerpc/include/asm/system.h | 2 ++
arch/powerpc/kernel/idle.c | 19 +++++++++++++++++++
3 files changed, 38 insertions(+)
Index: linux.trees.git/arch/powerpc/Kconfig
===================================================================
--- linux.trees.git.orig/arch/powerpc/Kconfig
+++ linux.trees.git/arch/powerpc/Kconfig
@@ -88,6 +88,9 @@ config ARCH_HAS_ILOG2_U64
bool
default y if 64BIT
+config ARCH_HAS_CPU_IDLE_WAIT
+ def_bool y
+
config GENERIC_HWEIGHT
bool
default y
@@ -243,6 +246,20 @@ source "kernel/Kconfig.freezer"
source "arch/powerpc/sysdev/Kconfig"
source "arch/powerpc/platforms/Kconfig"
+menu "Power management options"
+
+source "drivers/cpuidle/Kconfig"
+
+config PSERIES_PROCESSOR_IDLE
+ bool "Idle Power Management Support for pSeries"
+ depends on PPC_PSERIES && CPU_IDLE
+ default y
+ help
+ Idle Power Management Support for pSeries. This hooks onto cpuidle
+ infrastructure to help in idle cpu power management.
+
+endmenu
+
menu "Kernel options"
config HIGHMEM
Index: linux.trees.git/arch/powerpc/include/asm/system.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/system.h
+++ linux.trees.git/arch/powerpc/include/asm/system.h
@@ -546,5 +546,7 @@ extern void account_system_vtime(struct
extern struct dentry *powerpc_debugfs_root;
+void cpu_idle_wait(void);
+
#endif /* __KERNEL__ */
#endif /* _ASM_POWERPC_SYSTEM_H */
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -102,6 +102,25 @@ void cpu_idle(void)
}
}
+static void do_nothing(void *unused)
+{
+}
+
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
+ * ppc_md.power_save and update to new value.
+ * Required while changing ppc_md.power_save handler on SMP systems.
+ * Caller must have changed ppc_md.power_save to the new value before the call.
+ */
+void cpu_idle_wait(void)
+{
+ /* Ensure that new value of ppc_md.power_save is set */
+ smp_mb();
+ /* kick all the CPUs so that they exit out of ppc_md.power_save */
+ smp_call_function(do_nothing, NULL, 1);
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
int powersave_nap;
#ifdef CONFIG_SYSCTL
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-27 11:49 [v3 PATCH 0/4]: CPUIDLE/POWER: Introducing cpuidle infrastructure to POWER Arun R Bharadwaj
2009-08-27 11:51 ` [PATCH 1/4]: CPUIDLE/POWER: Enable cpuidle for pSeries Arun R Bharadwaj
@ 2009-08-27 11:53 ` Arun R Bharadwaj
2009-08-27 12:53 ` Peter Zijlstra
2009-08-27 11:55 ` [PATCH 3/4]: ACPI/ARM: Register for cpuidle_pm_idle in drivers/acpi/processor_idle.c and arch/arm/mach-kirkwood/cpuidle.c Arun R Bharadwaj
2009-08-27 11:57 ` [PATCH 4/4]: CPUIDLE/POWER: Implement Pseries Processor Idle module Arun R Bharadwaj
3 siblings, 1 reply; 15+ messages in thread
From: Arun R Bharadwaj @ 2009-08-27 11:53 UTC (permalink / raw)
To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
Dipankar Sarma, Balbir Singh, Gautham R Shenoy,
Pallipadi, Venkatesh, Arun Bharadwaj
Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-08-27 17:19:08]:
Cpuidle infrastructure assumes pm_idle as the default idle routine.
But, ppc_md.power_save is the default idle callback in case of pSeries.
So, create a more generic, architecture independent cpuidle_pm_idle
function pointer in driver/cpuidle/cpuidle.c and allow the idle routines
of architectures to be set to cpuidle_pm_idle.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
drivers/cpuidle/cpuidle.c | 12 +++++++-----
include/linux/cpuidle.h | 7 +++++++
2 files changed, 14 insertions(+), 5 deletions(-)
Index: linux.trees.git/drivers/cpuidle/cpuidle.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
+++ linux.trees.git/drivers/cpuidle/cpuidle.c
@@ -25,6 +25,7 @@ DEFINE_PER_CPU(struct cpuidle_device *,
DEFINE_MUTEX(cpuidle_lock);
LIST_HEAD(cpuidle_detected_devices);
static void (*pm_idle_old)(void);
+void (*cpuidle_pm_idle)(void);
static int enabled_devices;
@@ -98,10 +99,10 @@ static void cpuidle_idle_call(void)
*/
void cpuidle_install_idle_handler(void)
{
- if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
+ if (enabled_devices && (cpuidle_pm_idle != cpuidle_idle_call)) {
/* Make sure all changes finished before we switch to new idle */
smp_wmb();
- pm_idle = cpuidle_idle_call;
+ cpuidle_pm_idle = cpuidle_idle_call;
}
}
@@ -110,8 +111,9 @@ void cpuidle_install_idle_handler(void)
*/
void cpuidle_uninstall_idle_handler(void)
{
- if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
- pm_idle = pm_idle_old;
+ if (enabled_devices && pm_idle_old &&
+ (cpuidle_pm_idle != pm_idle_old)) {
+ cpuidle_pm_idle = pm_idle_old;
cpuidle_kick_cpus();
}
}
@@ -382,7 +384,7 @@ static int __init cpuidle_init(void)
{
int ret;
- pm_idle_old = pm_idle;
+ pm_idle_old = cpuidle_pm_idle;
ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
if (ret)
Index: linux.trees.git/include/linux/cpuidle.h
===================================================================
--- linux.trees.git.orig/include/linux/cpuidle.h
+++ linux.trees.git/include/linux/cpuidle.h
@@ -188,4 +188,11 @@ static inline void cpuidle_unregister_go
#define CPUIDLE_DRIVER_STATE_START 0
#endif
+/*
+ * Idle callback used by cpuidle to call the cpuidle_idle_call().
+ * Platform drivers can use this to register to cpuidle's idle loop.
+ */
+
+extern void (*cpuidle_pm_idle)(void);
+
#endif /* _LINUX_CPUIDLE_H */
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/4]: ACPI/ARM: Register for cpuidle_pm_idle in drivers/acpi/processor_idle.c and arch/arm/mach-kirkwood/cpuidle.c
2009-08-27 11:49 [v3 PATCH 0/4]: CPUIDLE/POWER: Introducing cpuidle infrastructure to POWER Arun R Bharadwaj
2009-08-27 11:51 ` [PATCH 1/4]: CPUIDLE/POWER: Enable cpuidle for pSeries Arun R Bharadwaj
2009-08-27 11:53 ` [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c Arun R Bharadwaj
@ 2009-08-27 11:55 ` Arun R Bharadwaj
2009-08-27 11:57 ` [PATCH 4/4]: CPUIDLE/POWER: Implement Pseries Processor Idle module Arun R Bharadwaj
3 siblings, 0 replies; 15+ messages in thread
From: Arun R Bharadwaj @ 2009-08-27 11:55 UTC (permalink / raw)
To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
Dipankar Sarma, Balbir Singh, Gautham R Shenoy,
Pallipadi, Venkatesh, Arun Bharadwaj
Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-08-27 17:19:08]:
Set the idle routine to cpuidle_pm_idle after registering cpuidle
devices. Earlier pm_idle was assumed as the defualt idle loop by
cpuidle infrastructure. This is changed to an architecture independent
cpuidle_pm_idle.
There are 2 instances which are using cpuidle infrastructure currently.
This patch makes the change in both the places.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
arch/arm/mach-kirkwood/cpuidle.c | 6 ++++++
drivers/acpi/processor_idle.c | 5 +++++
2 files changed, 11 insertions(+)
Index: linux.trees.git/arch/arm/mach-kirkwood/cpuidle.c
===================================================================
--- linux.trees.git.orig/arch/arm/mach-kirkwood/cpuidle.c
+++ linux.trees.git/arch/arm/mach-kirkwood/cpuidle.c
@@ -90,6 +90,12 @@ static int kirkwood_init_cpuidle(void)
printk(KERN_ERR "kirkwood_init_cpuidle: Failed registering\n");
return -EIO;
}
+
+ if (pm_idle != cpuidle_pm_idle) {
+ printk(KERN_INFO "using cpuidle idle loop\n");
+ pm_idle = cpuidle_pm_idle;
+ }
+
return 0;
}
Index: linux.trees.git/drivers/acpi/processor_idle.c
===================================================================
--- linux.trees.git.orig/drivers/acpi/processor_idle.c
+++ linux.trees.git/drivers/acpi/processor_idle.c
@@ -1216,6 +1216,11 @@ int __cpuinit acpi_processor_power_init(
printk(" C%d[C%d]", i,
pr->power.states[i].type);
printk(")\n");
+
+ if (pm_idle != cpuidle_pm_idle) {
+ printk(KERN_INFO "using cpuidle idle loop\n");
+ pm_idle = cpuidle_pm_idle;
+ }
}
/* 'power' [R] */
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 4/4]: CPUIDLE/POWER: Implement Pseries Processor Idle module
2009-08-27 11:49 [v3 PATCH 0/4]: CPUIDLE/POWER: Introducing cpuidle infrastructure to POWER Arun R Bharadwaj
` (2 preceding siblings ...)
2009-08-27 11:55 ` [PATCH 3/4]: ACPI/ARM: Register for cpuidle_pm_idle in drivers/acpi/processor_idle.c and arch/arm/mach-kirkwood/cpuidle.c Arun R Bharadwaj
@ 2009-08-27 11:57 ` Arun R Bharadwaj
3 siblings, 0 replies; 15+ messages in thread
From: Arun R Bharadwaj @ 2009-08-27 11:57 UTC (permalink / raw)
To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
Dipankar Sarma, Balbir Singh, Gautham R Shenoy,
Pallipadi, Venkatesh, Arun Bharadwaj
Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-08-27 17:19:08]:
This patch creates arch/powerpc/platforms/pseries/processor_idle.c,
which implements the cpuidle infrastructure for pseries.
It implements a pseries_cpuidle_loop() which would be the main idle loop
called from cpu_idle(). It makes decision of entering either snooze or nap
state based on the decision taken by the cpuidle governor.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 178 ++++++++++++++++++++++++
arch/powerpc/platforms/pseries/pseries.h | 14 +
arch/powerpc/platforms/pseries/setup.c | 3
4 files changed, 193 insertions(+), 3 deletions(-)
Index: linux.trees.git/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/Makefile
+++ linux.trees.git/arch/powerpc/platforms/pseries/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_HCALL_STATS) += hvCall_inst
obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
obj-$(CONFIG_CMM) += cmm.o
obj-$(CONFIG_DTL) += dtl.o
+obj-$(CONFIG_PSERIES_PROCESSOR_IDLE) += processor_idle.o
Index: linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/pseries.h
+++ linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
@@ -10,6 +10,8 @@
#ifndef _PSERIES_PSERIES_H
#define _PSERIES_PSERIES_H
+#include <linux/cpuidle.h>
+
extern void __init fw_feature_init(const char *hypertas, unsigned long len);
struct pt_regs;
@@ -40,4 +42,16 @@ extern unsigned long rtas_poweron_auto;
extern void find_udbg_vterm(void);
+DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
+
+#ifdef CONFIG_PSERIES_PROCESSOR_IDLE
+struct pseries_processor_power {
+ struct cpuidle_device dev;
+ int count;
+ int id;
+};
+
+extern struct cpuidle_driver pseries_idle_driver;
+#endif
+
#endif /* _PSERIES_PSERIES_H */
Index: linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
===================================================================
--- /dev/null
+++ linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
@@ -0,0 +1,178 @@
+/*
+ * processor_idle - idle state cpuidle driver.
+ * Adapted from drivers/acpi/processor_idle.c
+ *
+ * Arun R Bharadwaj <arun@linux.vnet.ibm.com>
+ *
+ * Copyright (C) 2009 IBM Corporation.
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/cpuidle.h>
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <asm/machdep.h>
+
+#include "plpar_wrappers.h"
+#include "pseries.h"
+
+MODULE_AUTHOR("Arun R Bharadwaj");
+MODULE_DESCRIPTION("pSeries Idle State Driver");
+MODULE_LICENSE("GPL");
+
+struct cpuidle_driver pseries_idle_driver = {
+ .name = "pseries_idle",
+ .owner = THIS_MODULE,
+};
+
+DEFINE_PER_CPU(struct pseries_processor_power, power);
+
+#define IDLE_STATE_COUNT 2
+
+static int pseries_idle_init(struct pseries_processor_power *power)
+{
+ return cpuidle_register_device(&power->dev);
+}
+
+static void snooze(void)
+{
+ local_irq_enable();
+ set_thread_flag(TIF_POLLING_NRFLAG);
+ while (!need_resched()) {
+ ppc64_runlatch_off();
+ HMT_low();
+ HMT_very_low();
+ }
+ HMT_medium();
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb();
+ local_irq_disable();
+}
+
+static void nap(void)
+{
+ ppc64_runlatch_off();
+ HMT_medium();
+ cede_processor();
+}
+
+static int pseries_cpuidle_loop(struct cpuidle_device *dev,
+ struct cpuidle_state *st)
+{
+ ktime_t t1, t2;
+ s64 diff;
+ int ret;
+ unsigned long in_purr, out_purr;
+
+ get_lppaca()->idle = 1;
+ get_lppaca()->donate_dedicated_cpu = 1;
+ in_purr = mfspr(SPRN_PURR);
+
+ t1 = ktime_get();
+
+ if (strcmp(st->desc, "snooze") == 0)
+ snooze();
+ else
+ nap();
+
+ t2 = ktime_get();
+ diff = ktime_to_us(ktime_sub(t2, t1));
+ if (diff > INT_MAX)
+ diff = INT_MAX;
+
+ ret = (int) diff;
+
+ out_purr = mfspr(SPRN_PURR);
+ get_lppaca()->wait_state_cycles += out_purr - in_purr;
+ get_lppaca()->donate_dedicated_cpu = 0;
+ get_lppaca()->idle = 0;
+
+ return ret;
+}
+
+static int pseries_setup_cpuidle(struct pseries_processor_power *power)
+{
+ int i;
+ struct cpuidle_state *state;
+ struct cpuidle_device *dev = &power->dev;
+
+ dev->cpu = power->id;
+
+ dev->enabled = 0;
+ for (i = 0; i < IDLE_STATE_COUNT; i++) {
+ state = &dev->states[i];
+
+ snprintf(state->name, CPUIDLE_NAME_LEN, "IDLE%d", i);
+ state->enter = pseries_cpuidle_loop;
+
+ switch (i) {
+ case 0:
+ strncpy(state->desc, "snooze", CPUIDLE_DESC_LEN);
+ state->exit_latency = 0;
+ state->target_residency = 0;
+ break;
+
+ case 1:
+ strncpy(state->desc, "nap", CPUIDLE_DESC_LEN);
+ state->exit_latency = 1;
+ state->target_residency =
+ __get_cpu_var(smt_snooze_delay);
+ break;
+ }
+ }
+
+ power->dev.state_count = i;
+ return 0;
+}
+
+static int pseries_get_power_info(struct pseries_processor_power *power,
+ int cpu)
+{
+ power->id = cpu;
+ power->count = IDLE_STATE_COUNT;
+ return 0;
+}
+
+static int __init pseries_processor_idle_init(void)
+{
+ int cpu;
+ int result = cpuidle_register_driver(&pseries_idle_driver);
+
+ if (result < 0)
+ return result;
+
+ printk(KERN_DEBUG "pSeries idle driver registered\n");
+
+ for_each_online_cpu(cpu) {
+ pseries_get_power_info(&per_cpu(power, cpu), cpu);
+ pseries_setup_cpuidle(&per_cpu(power, cpu));
+ pseries_idle_init(&per_cpu(power, cpu));
+ }
+
+ printk(KERN_DEBUG "Using cpuidle idle loop\n");
+ ppc_md.power_save = cpuidle_pm_idle;
+ return 0;
+}
+
+late_initcall(pseries_processor_idle_init);
Index: linux.trees.git/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/setup.c
+++ linux.trees.git/arch/powerpc/platforms/pseries/setup.c
@@ -500,9 +500,6 @@ static int __init pSeries_probe(void)
return 1;
}
-
-DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
-
static void pseries_dedicated_idle_sleep(void)
{
unsigned int cpu = smp_processor_id();
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-27 11:53 ` [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c Arun R Bharadwaj
@ 2009-08-27 12:53 ` Peter Zijlstra
2009-08-27 21:28 ` Benjamin Herrenschmidt
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Peter Zijlstra @ 2009-08-27 12:53 UTC (permalink / raw)
To: arun
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, Ingo Molnar, linuxppc-dev, Balbir Singh
On Thu, 2009-08-27 at 17:23 +0530, Arun R Bharadwaj wrote:
> * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-08-27 17:19:08]:
>
> Cpuidle infrastructure assumes pm_idle as the default idle routine.
> But, ppc_md.power_save is the default idle callback in case of pSeries.
>
> So, create a more generic, architecture independent cpuidle_pm_idle
> function pointer in driver/cpuidle/cpuidle.c and allow the idle routines
> of architectures to be set to cpuidle_pm_idle.
>
> Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
> ---
> drivers/cpuidle/cpuidle.c | 12 +++++++-----
> include/linux/cpuidle.h | 7 +++++++
> 2 files changed, 14 insertions(+), 5 deletions(-)
>
> Index: linux.trees.git/drivers/cpuidle/cpuidle.c
> ===================================================================
> --- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
> +++ linux.trees.git/drivers/cpuidle/cpuidle.c
> @@ -25,6 +25,7 @@ DEFINE_PER_CPU(struct cpuidle_device *,
> DEFINE_MUTEX(cpuidle_lock);
> LIST_HEAD(cpuidle_detected_devices);
> static void (*pm_idle_old)(void);
> +void (*cpuidle_pm_idle)(void);
>
> static int enabled_devices;
>
> @@ -98,10 +99,10 @@ static void cpuidle_idle_call(void)
> */
> void cpuidle_install_idle_handler(void)
> {
> - if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
> + if (enabled_devices && (cpuidle_pm_idle != cpuidle_idle_call)) {
> /* Make sure all changes finished before we switch to new idle */
> smp_wmb();
> - pm_idle = cpuidle_idle_call;
> + cpuidle_pm_idle = cpuidle_idle_call;
> }
> }
>
> @@ -110,8 +111,9 @@ void cpuidle_install_idle_handler(void)
> */
> void cpuidle_uninstall_idle_handler(void)
> {
> - if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
> - pm_idle = pm_idle_old;
> + if (enabled_devices && pm_idle_old &&
> + (cpuidle_pm_idle != pm_idle_old)) {
> + cpuidle_pm_idle = pm_idle_old;
> cpuidle_kick_cpus();
> }
> }
> @@ -382,7 +384,7 @@ static int __init cpuidle_init(void)
> {
> int ret;
>
> - pm_idle_old = pm_idle;
> + pm_idle_old = cpuidle_pm_idle;
>
> ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
> if (ret)
> Index: linux.trees.git/include/linux/cpuidle.h
> ===================================================================
> --- linux.trees.git.orig/include/linux/cpuidle.h
> +++ linux.trees.git/include/linux/cpuidle.h
> @@ -188,4 +188,11 @@ static inline void cpuidle_unregister_go
> #define CPUIDLE_DRIVER_STATE_START 0
> #endif
>
> +/*
> + * Idle callback used by cpuidle to call the cpuidle_idle_call().
> + * Platform drivers can use this to register to cpuidle's idle loop.
> + */
> +
> +extern void (*cpuidle_pm_idle)(void);
> +
> #endif /* _LINUX_CPUIDLE_H */
I'm not quite seeing how this makes anything any better. Not we have 3
function pointers, where 1 should suffice.
/me wonders what's wrong with something like:
struct idle_func_desc {
int power;
int latency;
void (*idle)(void);
struct list_head list;
};
static void spin_idle(void)
{
for (;;)
cpu_relax();
}
static idle_func_desc default_idle_func = {
power = 0, /* doesn't safe any power */
latency = INT_MAX, /* has max latency */
idle = spin_idle,
list = INIT_LIST_HEAD(default_idle_func.list),
};
void (*idle_func)(void);
static struct list_head idle_func_list;
static void pick_idle_func(void)
{
struct idle_func_desc *desc, *idle = &default_idle_desc;
list_for_each_entry(desc, &idle_func_list, list) {
if (desc->power < idle->power)
continue;
if (desc->latency > target_latency);
continue;
idle = desc;
}
pm_idle = idle->idle;
}
void register_idle_func(struct idle_func_desc *desc)
{
WARN_ON_ONCE(!list_empty(&desc->list));
list_add_tail(&idle_func_list, &desc->list);
pick_idle_func();
}
void unregister_idle_func(struct idle_func_desc *desc)
{
WARN_ON_ONCE(list_empty(&desc->list));
list_del_init(&desc->list);
if (idle_func == desc->idle)
pick_idle_func();
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-27 12:53 ` Peter Zijlstra
@ 2009-08-27 21:28 ` Benjamin Herrenschmidt
2009-08-28 4:49 ` Arun R Bharadwaj
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2009-08-27 21:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, arun, Ingo Molnar, linuxppc-dev, Balbir Singh
On Thu, 2009-08-27 at 14:53 +0200, Peter Zijlstra wrote:
> I'm not quite seeing how this makes anything any better. Not we have 3
> function pointers, where 1 should suffice.
There's also the question of us having different "idle" vs.
"power_save", the former being the entire idle loop, the later being the
part that does put the processor into power.
At what level are we trying to change the loop here ?
There are some requirements of things to do in our idle loop that really
don't have their place in generic drivers/* code.
Ben.
> /me wonders what's wrong with something like:
>
> struct idle_func_desc {
> int power;
> int latency;
> void (*idle)(void);
> struct list_head list;
> };
>
> static void spin_idle(void)
> {
> for (;;)
> cpu_relax();
> }
>
> static idle_func_desc default_idle_func = {
> power = 0, /* doesn't safe any power */
> latency = INT_MAX, /* has max latency */
> idle = spin_idle,
> list = INIT_LIST_HEAD(default_idle_func.list),
> };
>
> void (*idle_func)(void);
> static struct list_head idle_func_list;
>
> static void pick_idle_func(void)
> {
> struct idle_func_desc *desc, *idle = &default_idle_desc;
>
> list_for_each_entry(desc, &idle_func_list, list) {
> if (desc->power < idle->power)
> continue;
> if (desc->latency > target_latency);
> continue;
> idle = desc;
> }
>
> pm_idle = idle->idle;
> }
>
> void register_idle_func(struct idle_func_desc *desc)
> {
> WARN_ON_ONCE(!list_empty(&desc->list));
>
> list_add_tail(&idle_func_list, &desc->list);
> pick_idle_func();
> }
>
> void unregister_idle_func(struct idle_func_desc *desc)
> {
> WARN_ON_ONCE(list_empty(&desc->list));
>
> list_del_init(&desc->list);
> if (idle_func == desc->idle)
> pick_idle_func();
> }
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-27 12:53 ` Peter Zijlstra
2009-08-27 21:28 ` Benjamin Herrenschmidt
@ 2009-08-28 4:49 ` Arun R Bharadwaj
2009-08-28 6:40 ` Peter Zijlstra
2009-08-28 6:14 ` Arun R Bharadwaj
2009-08-28 6:43 ` Arun R Bharadwaj
3 siblings, 1 reply; 15+ messages in thread
From: Arun R Bharadwaj @ 2009-08-28 4:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, Arun Bharadwaj, Ingo Molnar, linuxppc-dev,
Balbir Singh
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2009-08-27 14:53:27]:
> On Thu, 2009-08-27 at 17:23 +0530, Arun R Bharadwaj wrote:
> > * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-08-27 17:19:08]:
> >
> > Cpuidle infrastructure assumes pm_idle as the default idle routine.
> > But, ppc_md.power_save is the default idle callback in case of pSeries.
> >
> > So, create a more generic, architecture independent cpuidle_pm_idle
> > function pointer in driver/cpuidle/cpuidle.c and allow the idle routines
> > of architectures to be set to cpuidle_pm_idle.
> >
> > Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
> > ---
> > drivers/cpuidle/cpuidle.c | 12 +++++++-----
> > include/linux/cpuidle.h | 7 +++++++
> > 2 files changed, 14 insertions(+), 5 deletions(-)
> >
> > Index: linux.trees.git/drivers/cpuidle/cpuidle.c
> > ===================================================================
> > --- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
> > +++ linux.trees.git/drivers/cpuidle/cpuidle.c
> > @@ -25,6 +25,7 @@ DEFINE_PER_CPU(struct cpuidle_device *,
> > DEFINE_MUTEX(cpuidle_lock);
> > LIST_HEAD(cpuidle_detected_devices);
> > static void (*pm_idle_old)(void);
> > +void (*cpuidle_pm_idle)(void);
> >
> > static int enabled_devices;
> >
> > @@ -98,10 +99,10 @@ static void cpuidle_idle_call(void)
> > */
> > void cpuidle_install_idle_handler(void)
> > {
> > - if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
> > + if (enabled_devices && (cpuidle_pm_idle != cpuidle_idle_call)) {
> > /* Make sure all changes finished before we switch to new idle */
> > smp_wmb();
> > - pm_idle = cpuidle_idle_call;
> > + cpuidle_pm_idle = cpuidle_idle_call;
> > }
> > }
> >
> > @@ -110,8 +111,9 @@ void cpuidle_install_idle_handler(void)
> > */
> > void cpuidle_uninstall_idle_handler(void)
> > {
> > - if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
> > - pm_idle = pm_idle_old;
> > + if (enabled_devices && pm_idle_old &&
> > + (cpuidle_pm_idle != pm_idle_old)) {
> > + cpuidle_pm_idle = pm_idle_old;
> > cpuidle_kick_cpus();
> > }
> > }
> > @@ -382,7 +384,7 @@ static int __init cpuidle_init(void)
> > {
> > int ret;
> >
> > - pm_idle_old = pm_idle;
> > + pm_idle_old = cpuidle_pm_idle;
> >
> > ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
> > if (ret)
> > Index: linux.trees.git/include/linux/cpuidle.h
> > ===================================================================
> > --- linux.trees.git.orig/include/linux/cpuidle.h
> > +++ linux.trees.git/include/linux/cpuidle.h
> > @@ -188,4 +188,11 @@ static inline void cpuidle_unregister_go
> > #define CPUIDLE_DRIVER_STATE_START 0
> > #endif
> >
> > +/*
> > + * Idle callback used by cpuidle to call the cpuidle_idle_call().
> > + * Platform drivers can use this to register to cpuidle's idle loop.
> > + */
> > +
> > +extern void (*cpuidle_pm_idle)(void);
> > +
> > #endif /* _LINUX_CPUIDLE_H */
>
>
> I'm not quite seeing how this makes anything any better. Not we have 3
> function pointers, where 1 should suffice.
>
Not really. We already do have pm_idle in case of x86 and
ppc_md.power_save in case of POWER. So here I'm only introducing
cpuidle_pm_idle which can be used by doing a
ppc_md.power_save = cpuidle_pm_idle;
> /me wonders what's wrong with something like:
>
> struct idle_func_desc {
> int power;
> int latency;
> void (*idle)(void);
> struct list_head list;
> };
>
> static void spin_idle(void)
> {
> for (;;)
> cpu_relax();
> }
>
> static idle_func_desc default_idle_func = {
> power = 0, /* doesn't safe any power */
> latency = INT_MAX, /* has max latency */
> idle = spin_idle,
> list = INIT_LIST_HEAD(default_idle_func.list),
> };
>
> void (*idle_func)(void);
> static struct list_head idle_func_list;
>
> static void pick_idle_func(void)
> {
> struct idle_func_desc *desc, *idle = &default_idle_desc;
>
> list_for_each_entry(desc, &idle_func_list, list) {
> if (desc->power < idle->power)
> continue;
> if (desc->latency > target_latency);
> continue;
> idle = desc;
> }
>
> pm_idle = idle->idle;
> }
>
This only does the job of picking the right idle loop for current
latency and power requirement. This is already done in ladder/menu
governors under the routines menu_select()/ladder_select().
I'm not sure whats the purpose of it here.
Here we are only concerned about the main idle loop, which is
pm_idle/ppc_md.power_save. After setting the main idle loop to
cpuidle_pm_idle, that would call cpuidle_idle_call() which would do
the job of picking the right low level idle loop based on latency and
other requirements.
> void register_idle_func(struct idle_func_desc *desc)
> {
> WARN_ON_ONCE(!list_empty(&desc->list));
>
> list_add_tail(&idle_func_list, &desc->list);
> pick_idle_func();
> }
>
> void unregister_idle_func(struct idle_func_desc *desc)
> {
> WARN_ON_ONCE(list_empty(&desc->list));
>
> list_del_init(&desc->list);
> if (idle_func == desc->idle)
> pick_idle_func();
> }
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-27 12:53 ` Peter Zijlstra
2009-08-27 21:28 ` Benjamin Herrenschmidt
2009-08-28 4:49 ` Arun R Bharadwaj
@ 2009-08-28 6:14 ` Arun R Bharadwaj
2009-08-28 6:48 ` Peter Zijlstra
2009-08-28 6:43 ` Arun R Bharadwaj
3 siblings, 1 reply; 15+ messages in thread
From: Arun R Bharadwaj @ 2009-08-28 6:14 UTC (permalink / raw)
To: Peter Zijlstra, Benjamin Herrenschmidt
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, Arun Bharadwaj, Ingo Molnar, linuxppc-dev,
Balbir Singh
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2009-08-27 14:53:27]:
Hi Peter, Ben,
I've put the whole thing in a sort of a block diagram. Hope it
explains things more clearly.
----------------
| CPUIDLE | (Select idle states like
| GOVERNORS | C1, C1e, C6 etc in case
| (Menu/Ladder)| x86 & nap, snooze in
| | case of POWER - based on
---------------- latency & power req)
^
|
|
|
|
|
---------- ----------------- -------------
| | | | | PSERIES |
| ACPI |------------------> | CPUIDLE | <--------------| IDLE |
| | | | | |
---------- ----------------- -------------
Main idle routine- pm_idle() Main idle routine-
ppc_md.power_save()
pm_idle = cpuidle_pm_idle; ppc_md.power_save =
(start using cpuidle's idle cpuidle_pm_idle();
loop, which internally calls
governor to select the right
state to go into).
Relavent code snippet from drivers/cpuidle/cpuidle.c
-------------------------------------
static void cpuidle_idle_call(void)
{
............
............
/* Call the menu_select() to select the idle state to enter. */
next_state = cpuidle_curr_governor->select(dev);
............
............
/*
* Enter the idle state previously selected. target_state->enter
* would call pseries_cpuidle_loop() which selects nap/snooze
* /
dev->last_residency = target_state->enter(dev, target_state);
}
void cpuidle_install_idle_handler(void)
{
.........
.........
cpuidle_pm_idle = cpuidle_idle_call;
}
--arun
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-28 4:49 ` Arun R Bharadwaj
@ 2009-08-28 6:40 ` Peter Zijlstra
0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2009-08-28 6:40 UTC (permalink / raw)
To: arun
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, Ingo Molnar, linuxppc-dev, Balbir Singh
On Fri, 2009-08-28 at 10:19 +0530, Arun R Bharadwaj wrote:
>
>
> This only does the job of picking the right idle loop for current
> latency and power requirement. This is already done in ladder/menu
> governors under the routines menu_select()/ladder_select().
> I'm not sure whats the purpose of it here.
I can't seem to find ladder_select() but menu_select() doesn't manage
pm_idle and its not clear what it does manage.
> Here we are only concerned about the main idle loop, which is
> pm_idle/ppc_md.power_save. After setting the main idle loop to
> cpuidle_pm_idle, that would call cpuidle_idle_call() which would do
> the job of picking the right low level idle loop based on latency and
> other requirements.
It also gets pm_idle unexported and avoids anybody directly tinkering
with the function pointer, _that_ is the whole goal.
pm_idle is it exists today, and the whole cpuidle_{un,}install*() is
utter crap. It relies on unmanaged access to this function pointer.
/me stop looking at drivers/cpuidle/, convoluted mess that is, shame on
you for wanting to have anything to do with it.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-27 12:53 ` Peter Zijlstra
` (2 preceding siblings ...)
2009-08-28 6:14 ` Arun R Bharadwaj
@ 2009-08-28 6:43 ` Arun R Bharadwaj
3 siblings, 0 replies; 15+ messages in thread
From: Arun R Bharadwaj @ 2009-08-28 6:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, Arun Bharadwaj, Ingo Molnar, linuxppc-dev,
Balbir Singh
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2009-08-27 14:53:27]:
> On Thu, 2009-08-27 at 17:23 +0530, Arun R Bharadwaj wrote:
> > * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-08-27 17:19:08]:
> >
> > Cpuidle infrastructure assumes pm_idle as the default idle routine.
> > But, ppc_md.power_save is the default idle callback in case of pSeries.
> >
> > So, create a more generic, architecture independent cpuidle_pm_idle
> > function pointer in driver/cpuidle/cpuidle.c and allow the idle routines
> > of architectures to be set to cpuidle_pm_idle.
> >
> > Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
> > ---
> > drivers/cpuidle/cpuidle.c | 12 +++++++-----
> > include/linux/cpuidle.h | 7 +++++++
> > 2 files changed, 14 insertions(+), 5 deletions(-)
> >
> > Index: linux.trees.git/drivers/cpuidle/cpuidle.c
> > ===================================================================
> > --- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
> > +++ linux.trees.git/drivers/cpuidle/cpuidle.c
> > @@ -25,6 +25,7 @@ DEFINE_PER_CPU(struct cpuidle_device *,
> > DEFINE_MUTEX(cpuidle_lock);
> > LIST_HEAD(cpuidle_detected_devices);
> > static void (*pm_idle_old)(void);
> > +void (*cpuidle_pm_idle)(void);
> >
> > static int enabled_devices;
> >
> > @@ -98,10 +99,10 @@ static void cpuidle_idle_call(void)
> > */
> > void cpuidle_install_idle_handler(void)
> > {
> > - if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
> > + if (enabled_devices && (cpuidle_pm_idle != cpuidle_idle_call)) {
> > /* Make sure all changes finished before we switch to new idle */
> > smp_wmb();
> > - pm_idle = cpuidle_idle_call;
> > + cpuidle_pm_idle = cpuidle_idle_call;
> > }
> > }
> >
> > @@ -110,8 +111,9 @@ void cpuidle_install_idle_handler(void)
> > */
> > void cpuidle_uninstall_idle_handler(void)
> > {
> > - if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
> > - pm_idle = pm_idle_old;
> > + if (enabled_devices && pm_idle_old &&
> > + (cpuidle_pm_idle != pm_idle_old)) {
> > + cpuidle_pm_idle = pm_idle_old;
> > cpuidle_kick_cpus();
> > }
> > }
> > @@ -382,7 +384,7 @@ static int __init cpuidle_init(void)
> > {
> > int ret;
> >
> > - pm_idle_old = pm_idle;
> > + pm_idle_old = cpuidle_pm_idle;
> >
> > ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
> > if (ret)
> > Index: linux.trees.git/include/linux/cpuidle.h
> > ===================================================================
> > --- linux.trees.git.orig/include/linux/cpuidle.h
> > +++ linux.trees.git/include/linux/cpuidle.h
> > @@ -188,4 +188,11 @@ static inline void cpuidle_unregister_go
> > #define CPUIDLE_DRIVER_STATE_START 0
> > #endif
> >
> > +/*
> > + * Idle callback used by cpuidle to call the cpuidle_idle_call().
> > + * Platform drivers can use this to register to cpuidle's idle loop.
> > + */
> > +
> > +extern void (*cpuidle_pm_idle)(void);
> > +
> > #endif /* _LINUX_CPUIDLE_H */
>
>
> I'm not quite seeing how this makes anything any better. Not we have 3
> function pointers, where 1 should suffice.
>
Or, can we have something like:
(if exporting a function is ok, instead of exporting a function
pointer).
in drivers/cpuidle/cpuidle.c
void (*return_cpuidle_handler(void))(void)
{
return cpuidle_pm_idle;
}
EXPORT_SYMBOL(return_cpuidle_handler);
and from pseries/processor_idle.c,
ppc_md.power_save = return_cpuidle_handler;
--arun
> /me wonders what's wrong with something like:
>
> struct idle_func_desc {
> int power;
> int latency;
> void (*idle)(void);
> struct list_head list;
> };
>
> static void spin_idle(void)
> {
> for (;;)
> cpu_relax();
> }
>
> static idle_func_desc default_idle_func = {
> power = 0, /* doesn't safe any power */
> latency = INT_MAX, /* has max latency */
> idle = spin_idle,
> list = INIT_LIST_HEAD(default_idle_func.list),
> };
>
> void (*idle_func)(void);
> static struct list_head idle_func_list;
>
> static void pick_idle_func(void)
> {
> struct idle_func_desc *desc, *idle = &default_idle_desc;
>
> list_for_each_entry(desc, &idle_func_list, list) {
> if (desc->power < idle->power)
> continue;
> if (desc->latency > target_latency);
> continue;
> idle = desc;
> }
>
> pm_idle = idle->idle;
> }
>
> void register_idle_func(struct idle_func_desc *desc)
> {
> WARN_ON_ONCE(!list_empty(&desc->list));
>
> list_add_tail(&idle_func_list, &desc->list);
> pick_idle_func();
> }
>
> void unregister_idle_func(struct idle_func_desc *desc)
> {
> WARN_ON_ONCE(list_empty(&desc->list));
>
> list_del_init(&desc->list);
> if (idle_func == desc->idle)
> pick_idle_func();
> }
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-28 6:14 ` Arun R Bharadwaj
@ 2009-08-28 6:48 ` Peter Zijlstra
2009-08-28 6:59 ` Balbir Singh
2009-08-28 7:01 ` Peter Zijlstra
0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2009-08-28 6:48 UTC (permalink / raw)
To: arun
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, Ingo Molnar, linuxppc-dev, Balbir Singh
On Fri, 2009-08-28 at 11:44 +0530, Arun R Bharadwaj wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> [2009-08-27 14:53:27]:
>
> Hi Peter, Ben,
>
> I've put the whole thing in a sort of a block diagram. Hope it
> explains things more clearly.
>
>
>
>
>
>
> ----------------
> | CPUIDLE | (Select idle states like
> | GOVERNORS | C1, C1e, C6 etc in case
> | (Menu/Ladder)| x86 & nap, snooze in
> | | case of POWER - based on
> ---------------- latency & power req)
> ^
> |
> |
> |
> |
> |
> ---------- ----------------- -------------
> | | | | | PSERIES |
> | ACPI |------------------> | CPUIDLE | <--------------| IDLE |
> | | | | | |
> ---------- ----------------- -------------
>
> Main idle routine- pm_idle() Main idle routine-
> ppc_md.power_save()
>
> pm_idle = cpuidle_pm_idle; ppc_md.power_save =
> (start using cpuidle's idle cpuidle_pm_idle();
> loop, which internally calls
> governor to select the right
> state to go into).
>
>
> Relavent code snippet from drivers/cpuidle/cpuidle.c
> -------------------------------------
>
> static void cpuidle_idle_call(void)
> {
> ............
> ............
>
> /* Call the menu_select() to select the idle state to enter. */
> next_state = cpuidle_curr_governor->select(dev);
>
> ............
> ............
>
> /*
> * Enter the idle state previously selected. target_state->enter
> * would call pseries_cpuidle_loop() which selects nap/snooze
> * /
> dev->last_residency = target_state->enter(dev, target_state);
> }
>
> void cpuidle_install_idle_handler(void)
> {
> .........
> .........
> cpuidle_pm_idle = cpuidle_idle_call;
> }
All I'm seeing here is a frigging mess.
How on earths can something called: cpuidle_install_idle_handler() have
a void argument, _WHAT_ handler is it going to install?
So somehow you added to the ACPI mess by now having 3 wild function
pointers, that's _NOT_ progress.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-28 6:48 ` Peter Zijlstra
@ 2009-08-28 6:59 ` Balbir Singh
2009-08-28 7:01 ` Peter Zijlstra
1 sibling, 0 replies; 15+ messages in thread
From: Balbir Singh @ 2009-08-28 6:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, arun, Ingo Molnar, linuxppc-dev
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2009-08-28 08:48:05]:
> On Fri, 2009-08-28 at 11:44 +0530, Arun R Bharadwaj wrote:
> > * Peter Zijlstra <a.p.zijlstra@chello.nl> [2009-08-27 14:53:27]:
> >
> > Hi Peter, Ben,
> >
> > I've put the whole thing in a sort of a block diagram. Hope it
> > explains things more clearly.
> >
> >
> >
> >
> >
> >
> > ----------------
> > | CPUIDLE | (Select idle states like
> > | GOVERNORS | C1, C1e, C6 etc in case
> > | (Menu/Ladder)| x86 & nap, snooze in
> > | | case of POWER - based on
> > ---------------- latency & power req)
> > ^
> > |
> > |
> > |
> > |
> > |
> > ---------- ----------------- -------------
> > | | | | | PSERIES |
> > | ACPI |------------------> | CPUIDLE | <--------------| IDLE |
> > | | | | | |
> > ---------- ----------------- -------------
> >
> > Main idle routine- pm_idle() Main idle routine-
> > ppc_md.power_save()
> >
> > pm_idle = cpuidle_pm_idle; ppc_md.power_save =
> > (start using cpuidle's idle cpuidle_pm_idle();
> > loop, which internally calls
> > governor to select the right
> > state to go into).
> >
> >
> > Relavent code snippet from drivers/cpuidle/cpuidle.c
> > -------------------------------------
> >
> > static void cpuidle_idle_call(void)
> > {
> > ............
> > ............
> >
> > /* Call the menu_select() to select the idle state to enter. */
> > next_state = cpuidle_curr_governor->select(dev);
> >
> > ............
> > ............
> >
> > /*
> > * Enter the idle state previously selected. target_state->enter
> > * would call pseries_cpuidle_loop() which selects nap/snooze
> > * /
> > dev->last_residency = target_state->enter(dev, target_state);
> > }
> >
> > void cpuidle_install_idle_handler(void)
> > {
> > .........
> > .........
> > cpuidle_pm_idle = cpuidle_idle_call;
> > }
>
> All I'm seeing here is a frigging mess.
>
> How on earths can something called: cpuidle_install_idle_handler() have
> a void argument, _WHAT_ handler is it going to install?
>
Peter, I think that is a typo, we need a function pointer like your
snippet of code showed
> So somehow you added to the ACPI mess by now having 3 wild function
> pointers, that's _NOT_ progress.
>
The goal, IIUC is to integrate three modules
1. CPUIdle - for idle CPU management
2. Architecture specific code for idle detection
3. CPUIdle governor
Again, if IIUC
The architecture specific idle code would call an idle loop that would
call into the CPUIdle framework, which inturn would depend on the
governor selected.
Passing void* is a mess, we need function pointer registration
and a framework so that we can query what is registered.
--
Balbir
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-28 6:48 ` Peter Zijlstra
2009-08-28 6:59 ` Balbir Singh
@ 2009-08-28 7:01 ` Peter Zijlstra
2009-08-28 8:19 ` Vaidyanathan Srinivasan
1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2009-08-28 7:01 UTC (permalink / raw)
To: arun
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, Ingo Molnar, linuxppc-dev, Balbir Singh
On Fri, 2009-08-28 at 08:48 +0200, Peter Zijlstra wrote:
>
> > void cpuidle_install_idle_handler(void)
> > {
> > .........
> > .........
> > cpuidle_pm_idle = cpuidle_idle_call;
> > }
>
> All I'm seeing here is a frigging mess.
>
> How on earths can something called: cpuidle_install_idle_handler() have
> a void argument, _WHAT_ handler is it going to install?
Argh, now I see, it installs itself as the platform idle handler.
so cpuidle_install_idle_handler() pokes at the unmanaged pm_idle pointer
to make cpuidle take control.
On module load it does:
pm_idle_old = pm_idle;
then in the actual idle loop it does:
if (!dev || !dev->enabled) {
if (pm_idle_old)
pm_idle_old();
who is to say that the pointer stored at module init time is still
around at that time?
So cpuidle recognised the pm_idle stuff was a flaky, but instead of
fixing it, they build a whole new layer on top of it. Brilliant.
/me goes mark this whole thread read, I've got enough things to do.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c
2009-08-28 7:01 ` Peter Zijlstra
@ 2009-08-28 8:19 ` Vaidyanathan Srinivasan
0 siblings, 0 replies; 15+ messages in thread
From: Vaidyanathan Srinivasan @ 2009-08-28 8:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Gautham R Shenoy, Pallipadi, Venkatesh, linux-kernel,
Paul Mackerras, arun, Ingo Molnar, linuxppc-dev, Balbir Singh
* Peter Zijlstra <peterz@infradead.org> [2009-08-28 09:01:12]:
> On Fri, 2009-08-28 at 08:48 +0200, Peter Zijlstra wrote:
> >
> > > void cpuidle_install_idle_handler(void)
> > > {
> > > .........
> > > .........
> > > cpuidle_pm_idle = cpuidle_idle_call;
> > > }
> >
> > All I'm seeing here is a frigging mess.
> >
> > How on earths can something called: cpuidle_install_idle_handler() have
> > a void argument, _WHAT_ handler is it going to install?
>
> Argh, now I see, it installs itself as the platform idle handler.
>
> so cpuidle_install_idle_handler() pokes at the unmanaged pm_idle pointer
> to make cpuidle take control.
>
> On module load it does:
>
> pm_idle_old = pm_idle;
>
> then in the actual idle loop it does:
>
> if (!dev || !dev->enabled) {
> if (pm_idle_old)
> pm_idle_old();
>
> who is to say that the pointer stored at module init time is still
> around at that time?
>
> So cpuidle recognised the pm_idle stuff was a flaky, but instead of
> fixing it, they build a whole new layer on top of it. Brilliant.
>
> /me goes mark this whole thread read, I've got enough things to do.
Hi Peter,
I understand that you are frustrated with the mess. We are willing
to clean up the pm_idle pointer at the moment before the cpuidle
framework is exploited my more archs.
At this moment we need your suggestions on what interface should we
call 'clean' and safe.
cpuidle.c and the arch independent cpuidle subsystem is not a module
and its cpuidle_idle_call() routine is valid and can be safely called
from arch dependent process.c
The fragile part is how cpuidle_idle_call() is hooked onto arch
specific cpu_idle() function at runtime. x86 has the pm_idle pointer
exported while powerpc has ppc_md.power_save pointer being called.
At cpuidle init time we can override the platform idle function, but
that will mean we are including arch specific code in cpuidle.c
Do you think having an exported function in cpuidle.c to give us the
correct pointer to arch code be better than the current situation:
in drivers/cpuidle/cpuidle.c
void (*get_cpuidle_handler(void))(void)
{
return cpuidle_pm_idle;
}
EXPORT_SYMBOL(get_cpuidle_handler);
and from pseries/processor_idle.c,
ppc_md.power_save = get_cpuidle_handler();
--Vaidy
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2009-08-28 8:20 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-27 11:49 [v3 PATCH 0/4]: CPUIDLE/POWER: Introducing cpuidle infrastructure to POWER Arun R Bharadwaj
2009-08-27 11:51 ` [PATCH 1/4]: CPUIDLE/POWER: Enable cpuidle for pSeries Arun R Bharadwaj
2009-08-27 11:53 ` [PATCH 2/4]: CPUIDLE: Introduce architecture independent cpuidle_pm_idle in drivers/cpuidle/cpuidle.c Arun R Bharadwaj
2009-08-27 12:53 ` Peter Zijlstra
2009-08-27 21:28 ` Benjamin Herrenschmidt
2009-08-28 4:49 ` Arun R Bharadwaj
2009-08-28 6:40 ` Peter Zijlstra
2009-08-28 6:14 ` Arun R Bharadwaj
2009-08-28 6:48 ` Peter Zijlstra
2009-08-28 6:59 ` Balbir Singh
2009-08-28 7:01 ` Peter Zijlstra
2009-08-28 8:19 ` Vaidyanathan Srinivasan
2009-08-28 6:43 ` Arun R Bharadwaj
2009-08-27 11:55 ` [PATCH 3/4]: ACPI/ARM: Register for cpuidle_pm_idle in drivers/acpi/processor_idle.c and arch/arm/mach-kirkwood/cpuidle.c Arun R Bharadwaj
2009-08-27 11:57 ` [PATCH 4/4]: CPUIDLE/POWER: Implement Pseries Processor Idle module Arun R Bharadwaj
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).