* [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.
@ 2009-09-22  5:33 Arun R Bharadwaj
  2009-09-22  5:35 ` [v5 RFC PATCH 1/7]: cpuidle: cleanup drivers/cpuidle/cpuidle.c Arun R Bharadwaj
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Arun R Bharadwaj @ 2009-09-22  5:33 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
	Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Gautham R Shenoy, Arun Bharadwaj
  Cc: linuxppc-dev, linux-kernel
Hi,
******** This is an RFC, not for inclusion **********
This patchset introduces cpuidle infrastructure to POWER, prototyping
for pSeries, and also does a major refactoring of current x86 idle
power management and a cleanup of cpuidle infrastructure.
My earlier iterations can be found at:
v4 --> http://lkml.org/lkml/2009/9/1/133
v3 --> http://lkml.org/lkml/2009/8/27/124
v2 --> http://lkml.org/lkml/2009/8/26/233
v1 --> http://lkml.org/lkml/2009/8/19/150
Major Changes in this iteration:
------------------------------------------
* Refactoring x86 idle power management code
        Remove all instances of pm_idle and make cpuidle_idle_call as
        _the_ idle routine in x86. So, cpuidle_idle_call will be
        called from the main idle loop, cpu_idle instead of the
        function pointer pm_idle. Also, pm_idle was used by apm module
        and xen. Change those instances such that they register to
        cpuidle instead.
* Cleanup drivers/cpuidle/cpuidle.c
        Currently, the cpuidle implementation has weakness in the
        framework where an exported pm_idle function pointer is
        manipulated by various subsystem. The proposed framework has
        a registration mechanism to cleanly add and remove new idle
        routines from different subsystems.
* Implement cpuidle for pSeries
        Implement the processor_idle module for pseries, which
        registers idle loops to cpuidle and also cleanup
        arch/powerpc/platforms/pseries/setup.c and remove the
        redundant pseries_dedicated/shared_idle_sleep which is
        implemented in processor_idle.c
        Also, remove all instances of ppc_md.power_save, for the same
        reason as that given for pm_idle.
TODO:
---------------------------------------------
* Currently, the list based approach that I'm using here is not
  working in a clean way on a few x86 platforms which have multiple
  sleep states, leading to kernel panics. So working on resolving
  that.
* ppc_md.power_save has been replaced by cpuidle_idle_call only for
  pseries. So this needs to be done for all POWER platforms so that
  ppc_md.power_save is completely removed.
Patches included in this series:
---------------------------------------------
1/7 - cleanup drivers/cpuidle/cpuidle.c
2/7 - implement a list based approach to register a set of idle
      routines.
3/7 - refactor x86 idle power management code and remove all instances
      of pm_idle.
4/7 - enable cpuidle for POWER.
5/7 - remove dedicate/shared idle loops, which will be moved to
      arch/powerpc/platforms/pseries/processor_idle.c
6/7 - add a default_idle idle loop for POWER.
7/7 - implement pSeries processor idle module.
Any comments on the design is welcome.
--arun
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [v5 RFC PATCH 1/7]: cpuidle: cleanup drivers/cpuidle/cpuidle.c
  2009-09-22  5:33 [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Arun R Bharadwaj
@ 2009-09-22  5:35 ` Arun R Bharadwaj
  2009-09-22  5:36 ` [v5 RFC PATCH 2/7]: cpuidle: implement a list based approach to register a set of idle routines Arun R Bharadwaj
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Arun R Bharadwaj @ 2009-09-22  5:35 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
	Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Gautham R Shenoy, Arun Bharadwaj
  Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22 11:03:14]:
This patch cleans up drivers/cpuidle/cpuidle.c
Earlier cpuidle assumed pm_idle as the default idle loop. Break that
assumption and make it more generic. cpuidle_idle_call() which is the
main idle loop of cpuidle is to be called by architectures which have
registered to cpuidle.
Remove routines cpuidle_install/uninstall_idle_handler() and
cpuidle_kick_cpus() which are not needed anymore.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle.c  |   59 ++++++---------------------------------------
 drivers/cpuidle/cpuidle.h  |    1 
 drivers/cpuidle/governor.c |    3 --
 include/linux/cpuidle.h    |    3 ++
 4 files changed, 11 insertions(+), 55 deletions(-)
Index: linux.trees.git/drivers/cpuidle/cpuidle.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
+++ linux.trees.git/drivers/cpuidle/cpuidle.c
@@ -24,20 +24,6 @@ DEFINE_PER_CPU(struct cpuidle_device *, 
 
 DEFINE_MUTEX(cpuidle_lock);
 LIST_HEAD(cpuidle_detected_devices);
-static void (*pm_idle_old)(void);
-
-static int enabled_devices;
-
-#if defined(CONFIG_ARCH_HAS_CPU_IDLE_WAIT)
-static void cpuidle_kick_cpus(void)
-{
-	cpu_idle_wait();
-}
-#elif defined(CONFIG_SMP)
-# error "Arch needs cpu_idle_wait() equivalent here"
-#else /* !CONFIG_ARCH_HAS_CPU_IDLE_WAIT && !CONFIG_SMP */
-static void cpuidle_kick_cpus(void) {}
-#endif
 
 static int __cpuidle_register_device(struct cpuidle_device *dev);
 
@@ -46,7 +32,7 @@ static int __cpuidle_register_device(str
  *
  * NOTE: no locks or semaphores should be used here
  */
-static void cpuidle_idle_call(void)
+void cpuidle_idle_call(void)
 {
 	struct cpuidle_device *dev = __get_cpu_var(cpuidle_devices);
 	struct cpuidle_state *target_state;
@@ -54,13 +40,10 @@ static void cpuidle_idle_call(void)
 
 	/* check if the device is ready */
 	if (!dev || !dev->enabled) {
-		if (pm_idle_old)
-			pm_idle_old();
-		else
 #if defined(CONFIG_ARCH_HAS_DEFAULT_IDLE)
-			default_idle();
+		default_idle();
 #else
-			local_irq_enable();
+		local_irq_enable();
 #endif
 		return;
 	}
@@ -74,7 +57,11 @@ static void cpuidle_idle_call(void)
 	hrtimer_peek_ahead_timers();
 #endif
 	/* ask the governor for the next state */
-	next_state = cpuidle_curr_governor->select(dev);
+	if (dev->state_count > 1)
+		next_state = cpuidle_curr_governor->select(dev);
+	else
+		next_state = 0;
+
 	if (need_resched())
 		return;
 	target_state = &dev->states[next_state];
@@ -94,35 +81,11 @@ static void cpuidle_idle_call(void)
 }
 
 /**
- * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
- */
-void cpuidle_install_idle_handler(void)
-{
-	if (enabled_devices && (pm_idle != cpuidle_idle_call)) {
-		/* Make sure all changes finished before we switch to new idle */
-		smp_wmb();
-		pm_idle = cpuidle_idle_call;
-	}
-}
-
-/**
- * cpuidle_uninstall_idle_handler - uninstalls the cpuidle idle loop handler
- */
-void cpuidle_uninstall_idle_handler(void)
-{
-	if (enabled_devices && pm_idle_old && (pm_idle != pm_idle_old)) {
-		pm_idle = pm_idle_old;
-		cpuidle_kick_cpus();
-	}
-}
-
-/**
  * cpuidle_pause_and_lock - temporarily disables CPUIDLE
  */
 void cpuidle_pause_and_lock(void)
 {
 	mutex_lock(&cpuidle_lock);
-	cpuidle_uninstall_idle_handler();
 }
 
 EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock);
@@ -132,7 +95,6 @@ EXPORT_SYMBOL_GPL(cpuidle_pause_and_lock
  */
 void cpuidle_resume_and_unlock(void)
 {
-	cpuidle_install_idle_handler();
 	mutex_unlock(&cpuidle_lock);
 }
 
@@ -180,7 +142,6 @@ int cpuidle_enable_device(struct cpuidle
 
 	dev->enabled = 1;
 
-	enabled_devices++;
 	return 0;
 
 fail_sysfs:
@@ -211,7 +172,6 @@ void cpuidle_disable_device(struct cpuid
 		cpuidle_curr_governor->disable(dev);
 
 	cpuidle_remove_state_sysfs(dev);
-	enabled_devices--;
 }
 
 EXPORT_SYMBOL_GPL(cpuidle_disable_device);
@@ -303,7 +263,6 @@ int cpuidle_register_device(struct cpuid
 	}
 
 	cpuidle_enable_device(dev);
-	cpuidle_install_idle_handler();
 
 	mutex_unlock(&cpuidle_lock);
 
@@ -382,8 +341,6 @@ static int __init cpuidle_init(void)
 {
 	int ret;
 
-	pm_idle_old = pm_idle;
-
 	ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
 	if (ret)
 		return ret;
Index: linux.trees.git/drivers/cpuidle/governor.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/governor.c
+++ linux.trees.git/drivers/cpuidle/governor.c
@@ -48,8 +48,6 @@ int cpuidle_switch_governor(struct cpuid
 	if (gov == cpuidle_curr_governor)
 		return 0;
 
-	cpuidle_uninstall_idle_handler();
-
 	if (cpuidle_curr_governor) {
 		list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
 			cpuidle_disable_device(dev);
@@ -63,7 +61,6 @@ int cpuidle_switch_governor(struct cpuid
 			return -EINVAL;
 		list_for_each_entry(dev, &cpuidle_detected_devices, device_list)
 			cpuidle_enable_device(dev);
-		cpuidle_install_idle_handler();
 		printk(KERN_INFO "cpuidle: using governor %s\n", gov->name);
 	}
 
Index: linux.trees.git/include/linux/cpuidle.h
===================================================================
--- linux.trees.git.orig/include/linux/cpuidle.h
+++ linux.trees.git/include/linux/cpuidle.h
@@ -112,6 +112,9 @@ static inline int cpuidle_get_last_resid
 	return dev->last_residency;
 }
 
+extern void cpuidle_idle_call(void);
+extern struct cpuidle_driver *cpuidle_curr_driver;
+
 
 /****************************
  * CPUIDLE DRIVER INTERFACE *
Index: linux.trees.git/drivers/cpuidle/cpuidle.h
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.h
+++ linux.trees.git/drivers/cpuidle/cpuidle.h
@@ -9,7 +9,6 @@
 
 /* For internal use only */
 extern struct cpuidle_governor *cpuidle_curr_governor;
-extern struct cpuidle_driver *cpuidle_curr_driver;
 extern struct list_head cpuidle_governors;
 extern struct list_head cpuidle_detected_devices;
 extern struct mutex cpuidle_lock;
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [v5 RFC PATCH 2/7]: cpuidle: implement a list based approach to register a set of idle routines.
  2009-09-22  5:33 [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Arun R Bharadwaj
  2009-09-22  5:35 ` [v5 RFC PATCH 1/7]: cpuidle: cleanup drivers/cpuidle/cpuidle.c Arun R Bharadwaj
@ 2009-09-22  5:36 ` Arun R Bharadwaj
  2009-09-22  5:37 ` [v5 RFC PATCH 3/7]: x86: refactor x86 idle power management code and remove all instances of pm_idle Arun R Bharadwaj
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Arun R Bharadwaj @ 2009-09-22  5:36 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
	Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Gautham R Shenoy, Arun Bharadwaj
  Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22 11:03:14]:
Implement a list based registering mechanism for architectures which
have multiple sets of idle routines which are to be registered.
Currently, in x86 it is done by merely setting pm_idle = idle_routine
and managing this pm_idle pointer is messy.
To give an example of how this mechanism works:
In x86, initially, idle routine is selected from the set of poll/mwait/
c1e/default idle loops. So the selected idle loop is registered in cpuidle
as one idle state cpuidle devices. Once ACPI comes up, it registers
another set of idle states on top of this state. Again, suppose a module
registers another set of idle loops, it is added to this list.
This provides a clean way of registering and unregistering idle state
routines.
In the current implementation, pm_idle is set as the current idle routine
being used and the old idle routine has to be maintained and when a module
registers/unregisters an idle routine, confusion arises.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
 drivers/cpuidle/cpuidle.c |   50 ++++++++++++++++++++++++++++++++++++++++++----
 include/linux/cpuidle.h   |    1 
 2 files changed, 47 insertions(+), 4 deletions(-)
Index: linux.trees.git/drivers/cpuidle/cpuidle.c
===================================================================
--- linux.trees.git.orig/drivers/cpuidle/cpuidle.c
+++ linux.trees.git/drivers/cpuidle/cpuidle.c
@@ -21,6 +21,7 @@
 #include "cpuidle.h"
 
 DEFINE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
+DEFINE_PER_CPU(struct list_head, cpuidle_devices_list);
 
 DEFINE_MUTEX(cpuidle_lock);
 LIST_HEAD(cpuidle_detected_devices);
@@ -100,6 +101,44 @@ void cpuidle_resume_and_unlock(void)
 
 EXPORT_SYMBOL_GPL(cpuidle_resume_and_unlock);
 
+int cpuidle_add_to_list(struct cpuidle_device *dev)
+{
+	int ret, cpu = dev->cpu;
+	struct cpuidle_device *old_dev;
+
+	if (!list_empty(&per_cpu(cpuidle_devices_list, cpu))) {
+		old_dev = list_first_entry(&per_cpu(cpuidle_devices_list, cpu),
+				struct cpuidle_device, percpu_list[cpu]);
+		cpuidle_remove_state_sysfs(old_dev);
+	}
+
+	list_add(&dev->percpu_list[cpu], &per_cpu(cpuidle_devices_list, cpu));
+	ret = cpuidle_add_state_sysfs(dev);
+	return ret;
+}
+
+void cpuidle_remove_from_list(struct cpuidle_device *dev)
+{
+	struct cpuidle_device *temp_dev;
+	struct list_head *pos;
+	int ret, cpu = dev->cpu;
+
+	list_for_each(pos, &per_cpu(cpuidle_devices_list, cpu)) {
+		temp_dev = container_of(pos, struct cpuidle_device,
+					percpu_list[cpu]);
+		if (dev == temp_dev) {
+			list_del(&temp_dev->percpu_list[cpu]);
+			cpuidle_remove_state_sysfs(temp_dev);
+		}
+	}
+
+	if (!list_empty(&per_cpu(cpuidle_devices_list, cpu))) {
+		temp_dev = list_first_entry(&per_cpu(cpuidle_devices_list, cpu),
+				struct cpuidle_device, percpu_list[cpu]);
+				ret = cpuidle_add_state_sysfs(temp_dev);
+	}
+}
+
 /**
  * cpuidle_enable_device - enables idle PM for a CPU
  * @dev: the CPU
@@ -124,7 +163,7 @@ int cpuidle_enable_device(struct cpuidle
 			return ret;
 	}
 
-	if ((ret = cpuidle_add_state_sysfs(dev)))
+	if ((cpuidle_add_to_list(dev)))
 		return ret;
 
 	if (cpuidle_curr_governor->enable &&
@@ -145,7 +184,7 @@ int cpuidle_enable_device(struct cpuidle
 	return 0;
 
 fail_sysfs:
-	cpuidle_remove_state_sysfs(dev);
+	cpuidle_remove_from_list(dev);
 
 	return ret;
 }
@@ -171,7 +210,7 @@ void cpuidle_disable_device(struct cpuid
 	if (cpuidle_curr_governor->disable)
 		cpuidle_curr_governor->disable(dev);
 
-	cpuidle_remove_state_sysfs(dev);
+	cpuidle_remove_from_list(dev);
 }
 
 EXPORT_SYMBOL_GPL(cpuidle_disable_device);
@@ -339,12 +378,15 @@ static inline void latency_notifier_init
  */
 static int __init cpuidle_init(void)
 {
-	int ret;
+	int ret, cpu;
 
 	ret = cpuidle_add_class_sysfs(&cpu_sysdev_class);
 	if (ret)
 		return ret;
 
+	for_each_possible_cpu(cpu)
+		INIT_LIST_HEAD(&per_cpu(cpuidle_devices_list, cpu));
+
 	latency_notifier_init(&cpuidle_latency_notifier);
 
 	return 0;
Index: linux.trees.git/include/linux/cpuidle.h
===================================================================
--- linux.trees.git.orig/include/linux/cpuidle.h
+++ linux.trees.git/include/linux/cpuidle.h
@@ -93,6 +93,7 @@ struct cpuidle_device {
 	struct cpuidle_state	*last_state;
 
 	struct list_head 	device_list;
+	struct list_head	percpu_list[NR_CPUS];
 	struct kobject		kobj;
 	struct completion	kobj_unregister;
 	void			*governor_data;
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [v5 RFC PATCH 3/7]: x86: refactor x86 idle power management code and remove all instances of pm_idle.
  2009-09-22  5:33 [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Arun R Bharadwaj
  2009-09-22  5:35 ` [v5 RFC PATCH 1/7]: cpuidle: cleanup drivers/cpuidle/cpuidle.c Arun R Bharadwaj
  2009-09-22  5:36 ` [v5 RFC PATCH 2/7]: cpuidle: implement a list based approach to register a set of idle routines Arun R Bharadwaj
@ 2009-09-22  5:37 ` Arun R Bharadwaj
  2009-09-22  5:38 ` [v5 RFC PATCH 4/7]: POWER: enable cpuidle for POWER Arun R Bharadwaj
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Arun R Bharadwaj @ 2009-09-22  5:37 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
	Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Gautham R Shenoy, Arun Bharadwaj
  Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22 11:03:14]:
This patch cleans up x86 of all instances of pm_idle.
pm_idle which was earlier called from cpu_idle() idle loop
is replaced by cpuidle_idle_call.
x86 also registers to cpuidle when the idle routine is selected,
by populating the cpuidle_device data structure for each cpu.
This is replicated for apm module and for xen, which also used pm_idle.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
 arch/x86/kernel/apm_32.c     |   37 +++++++++++++++++++++--
 arch/x86/kernel/process.c    |   69 ++++++++++++++++++++++++++++++++++---------
 arch/x86/kernel/process_32.c |    3 +
 arch/x86/kernel/process_64.c |    3 +
 arch/x86/xen/setup.c         |   22 +++++++++++++
 5 files changed, 114 insertions(+), 20 deletions(-)
Index: linux.trees.git/arch/x86/kernel/process.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process.c
+++ linux.trees.git/arch/x86/kernel/process.c
@@ -9,6 +9,8 @@
 #include <linux/pm.h>
 #include <linux/clockchips.h>
 #include <linux/random.h>
+#include <linux/cpuidle.h>
+
 #include <trace/power.h>
 #include <asm/system.h>
 #include <asm/apic.h>
@@ -247,12 +249,6 @@ int sys_vfork(struct pt_regs *regs)
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-/*
- * Powermanagement idle function, if any..
- */
-void (*pm_idle)(void);
-EXPORT_SYMBOL(pm_idle);
-
 #ifdef CONFIG_X86_32
 /*
  * This halt magic was a workaround for ancient floppy DMA
@@ -531,15 +527,58 @@ static void c1e_idle(void)
 		default_idle();
 }
 
+static void (*local_idle)(void);
+DEFINE_PER_CPU(struct cpuidle_device, idle_devices);
+
+struct cpuidle_driver cpuidle_default_driver = {
+	.name =         "cpuidle_default",
+};
+
+static int local_idle_loop(struct cpuidle_device *dev, struct cpuidle_state *st)
+{
+	ktime_t t1, t2;
+	s64 diff;
+	int ret;
+
+	t1 = ktime_get();
+	local_idle();
+	t2 = ktime_get();
+
+	diff = ktime_to_us(ktime_sub(t2, t1));
+	if (diff > INT_MAX)
+		diff = INT_MAX;
+	ret = (int) diff;
+
+	return ret;
+}
+static int __cpuinit setup_cpuidle_simple(void)
+{
+	struct cpuidle_device *dev;
+	int cpu;
+
+	if (!cpuidle_curr_driver)
+		cpuidle_register_driver(&cpuidle_default_driver);
+
+	for_each_online_cpu(cpu) {
+		dev = &per_cpu(idle_devices, cpu);
+		dev->cpu = cpu;
+		dev->states[0].enter = local_idle_loop;
+		dev->state_count = 1;
+		cpuidle_register_device(dev);
+	}
+	return 0;
+}
+late_initcall(setup_cpuidle_simple);
+
 void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
-	if (pm_idle == poll_idle && smp_num_siblings > 1) {
+	if (local_idle == poll_idle && smp_num_siblings > 1) {
 		printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
 			" performance may degrade.\n");
 	}
 #endif
-	if (pm_idle)
+	if (local_idle)
 		return;
 
 	if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
@@ -547,18 +586,20 @@ void __cpuinit select_idle_routine(const
 		 * One CPU supports mwait => All CPUs supports mwait
 		 */
 		printk(KERN_INFO "using mwait in idle threads.\n");
-		pm_idle = mwait_idle;
+		local_idle = mwait_idle;
 	} else if (check_c1e_idle(c)) {
 		printk(KERN_INFO "using C1E aware idle routine\n");
-		pm_idle = c1e_idle;
+		local_idle = c1e_idle;
 	} else
-		pm_idle = default_idle;
+		local_idle = default_idle;
+
+	return;
 }
 
 void __init init_c1e_mask(void)
 {
 	/* If we're using c1e_idle, we need to allocate c1e_mask. */
-	if (pm_idle == c1e_idle) {
+	if (local_idle == c1e_idle) {
 		alloc_cpumask_var(&c1e_mask, GFP_KERNEL);
 		cpumask_clear(c1e_mask);
 	}
@@ -571,7 +612,7 @@ static int __init idle_setup(char *str)
 
 	if (!strcmp(str, "poll")) {
 		printk("using polling idle threads.\n");
-		pm_idle = poll_idle;
+		local_idle = poll_idle;
 	} else if (!strcmp(str, "mwait"))
 		force_mwait = 1;
 	else if (!strcmp(str, "halt")) {
@@ -582,7 +623,7 @@ static int __init idle_setup(char *str)
 		 * To continue to load the CPU idle driver, don't touch
 		 * the boot_option_idle_override.
 		 */
-		pm_idle = default_idle;
+		local_idle = default_idle;
 		idle_halt = 1;
 		return 0;
 	} else if (!strcmp(str, "nomwait")) {
Index: linux.trees.git/arch/x86/kernel/process_32.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process_32.c
+++ linux.trees.git/arch/x86/kernel/process_32.c
@@ -40,6 +40,7 @@
 #include <linux/uaccess.h>
 #include <linux/io.h>
 #include <linux/kdebug.h>
+#include <linux/cpuidle.h>
 
 #include <asm/pgtable.h>
 #include <asm/system.h>
@@ -113,7 +114,7 @@ void cpu_idle(void)
 			local_irq_disable();
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
-			pm_idle();
+			cpuidle_idle_call();
 			start_critical_timings();
 		}
 		tick_nohz_restart_sched_tick();
Index: linux.trees.git/arch/x86/kernel/process_64.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/process_64.c
+++ linux.trees.git/arch/x86/kernel/process_64.c
@@ -39,6 +39,7 @@
 #include <linux/io.h>
 #include <linux/ftrace.h>
 #include <linux/dmi.h>
+#include <linux/cpuidle.h>
 
 #include <asm/pgtable.h>
 #include <asm/system.h>
@@ -142,7 +143,7 @@ void cpu_idle(void)
 			enter_idle();
 			/* Don't trace irqs off for idle */
 			stop_critical_timings();
-			pm_idle();
+			cpuidle_idle_call();
 			start_critical_timings();
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
Index: linux.trees.git/arch/x86/kernel/apm_32.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/apm_32.c
+++ linux.trees.git/arch/x86/kernel/apm_32.c
@@ -2257,6 +2257,38 @@ static struct dmi_system_id __initdata a
 	{ }
 };
 
+DEFINE_PER_CPU(struct cpuidle_device, apm_idle_devices);
+
+struct cpuidle_driver cpuidle_apm_driver = {
+	.name =         "cpuidle_apm",
+};
+
+void __cpuinit setup_cpuidle_apm(void)
+{
+	struct cpuidle_device *dev;
+
+	if (!cpuidle_curr_driver)
+		cpuidle_register_driver(&cpuidle_apm_driver);
+
+	dev = &per_cpu(apm_idle_devices, smp_processor_id());
+	dev->cpu = smp_processor_id();
+	dev->states[0].enter = apm_cpu_idle;
+	dev->state_count = 1;
+	cpuidle_register_device(dev);
+}
+
+void exit_cpuidle_apm(void)
+{
+	struct cpuidle_device *dev;
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		dev = &per_cpu(apm_idle_devices, cpu);
+		cpuidle_unregister_device(dev);
+	}
+}
+
+
 /*
  * Just start the APM thread. We do NOT want to do APM BIOS
  * calls from anything but the APM thread, if for no other reason
@@ -2394,8 +2426,7 @@ static int __init apm_init(void)
 	if (HZ != 100)
 		idle_period = (idle_period * HZ) / 100;
 	if (idle_threshold < 100) {
-		original_pm_idle = pm_idle;
-		pm_idle  = apm_cpu_idle;
+		setup_cpuidle_apm();
 		set_pm_idle = 1;
 	}
 
@@ -2407,7 +2438,7 @@ static void __exit apm_exit(void)
 	int error;
 
 	if (set_pm_idle) {
-		pm_idle = original_pm_idle;
+		exit_cpuidle_apm();
 		/*
 		 * We are about to unload the current idle thread pm callback
 		 * (pm_idle), Wait for all processors to update cached/local
Index: linux.trees.git/arch/x86/xen/setup.c
===================================================================
--- linux.trees.git.orig/arch/x86/xen/setup.c
+++ linux.trees.git/arch/x86/xen/setup.c
@@ -8,6 +8,7 @@
 #include <linux/sched.h>
 #include <linux/mm.h>
 #include <linux/pm.h>
+#include <linux/cpuidle.h>
 
 #include <asm/elf.h>
 #include <asm/vdso.h>
@@ -151,6 +152,25 @@ void __cpuinit xen_enable_syscall(void)
 #endif /* CONFIG_X86_64 */
 }
 
+DEFINE_PER_CPU(struct cpuidle_device, idle_devices);
+struct cpuidle_driver cpuidle_xen_driver = {
+	.name =         "cpuidle_xen",
+};
+
+void __cpuinit setup_cpuidle_xen(void)
+{
+	struct cpuidle_device *dev;
+
+	if (!cpuidle_curr_driver)
+		cpuidle_register_driver(&cpuidle_xen_driver);
+
+	dev = &per_cpu(idle_devices, smp_processor_id());
+	dev->cpu = smp_processor_id();
+	dev->states[0].enter = xen_idle;
+	dev->state_count = 1;
+	cpuidle_register_device(dev);
+}
+
 void __init xen_arch_setup(void)
 {
 	struct physdev_set_iopl set_iopl;
@@ -186,7 +206,7 @@ void __init xen_arch_setup(void)
 	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
 	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);
 
-	pm_idle = xen_idle;
+	setup_cpuidle_xen();
 
 	paravirt_disable_iospace();
 
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [v5 RFC PATCH 4/7]: POWER: enable cpuidle for POWER.
  2009-09-22  5:33 [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Arun R Bharadwaj
                   ` (2 preceding siblings ...)
  2009-09-22  5:37 ` [v5 RFC PATCH 3/7]: x86: refactor x86 idle power management code and remove all instances of pm_idle Arun R Bharadwaj
@ 2009-09-22  5:38 ` Arun R Bharadwaj
  2009-09-22  5:39 ` [v5 RFC PATCH 5/7]: POWER/pSeries: remove dedicate/shared idle loops, which will be moved to arch/powerpc/platforms/pseries/processor_idle.c Arun R Bharadwaj
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Arun R Bharadwaj @ 2009-09-22  5:38 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
	Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Gautham R Shenoy, Arun Bharadwaj
  Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22 11:03:14]:
This patch enables the cpuidle option in Kconfig for pSeries.
Currently cpuidle infrastructure is enabled only for x86 and ARM.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
 arch/powerpc/Kconfig |   14 ++++++++++++++
 1 file changed, 14 insertions(+)
Index: linux.trees.git/arch/powerpc/Kconfig
===================================================================
--- linux.trees.git.orig/arch/powerpc/Kconfig
+++ linux.trees.git/arch/powerpc/Kconfig
@@ -243,6 +243,20 @@ source "kernel/Kconfig.freezer"
 source "arch/powerpc/sysdev/Kconfig"
 source "arch/powerpc/platforms/Kconfig"
 
+menu "Power management options"
+
+source "drivers/cpuidle/Kconfig"
+
+config PSERIES_PROCESSOR_IDLE
+	bool "Idle Power Management Support for pSeries"
+	depends on PPC_PSERIES && CPU_IDLE
+	default y
+	help
+	  Idle Power Management Support for pSeries. This hooks onto cpuidle
+	  infrastructure to help in idle cpu power management.
+
+endmenu
+
 menu "Kernel options"
 
 config HIGHMEM
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [v5 RFC PATCH 5/7]: POWER/pSeries: remove dedicate/shared idle loops, which will be moved to arch/powerpc/platforms/pseries/processor_idle.c
  2009-09-22  5:33 [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Arun R Bharadwaj
                   ` (3 preceding siblings ...)
  2009-09-22  5:38 ` [v5 RFC PATCH 4/7]: POWER: enable cpuidle for POWER Arun R Bharadwaj
@ 2009-09-22  5:39 ` Arun R Bharadwaj
  2009-09-22  5:40 ` [v5 RFC PATCH 6/7]: POWER: add a default_idle idle loop for POWER Arun R Bharadwaj
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Arun R Bharadwaj @ 2009-09-22  5:39 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
	Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Gautham R Shenoy, Arun Bharadwaj
  Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22 11:03:14]:
This patch removes the routines, pseries_shared_idle_sleep and
pseries_dedicated_idle_sleep, since this is implemented as a part
of arch/powerpc/platform/pseries/processor_idle.c
Also, similar to x86, call cpuidle_idle_call from cpu_idle() idle
loop instead of ppc_md.power_save.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/idle.c             |   50 +++++++-----------
 arch/powerpc/platforms/pseries/setup.c |   89 ---------------------------------
 2 files changed, 22 insertions(+), 117 deletions(-)
Index: linux.trees.git/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/setup.c
+++ linux.trees.git/arch/powerpc/platforms/pseries/setup.c
@@ -75,9 +75,6 @@ EXPORT_SYMBOL(CMO_PageSize);
 
 int fwnmi_active;  /* TRUE if an FWNMI handler is present */
 
-static void pseries_shared_idle_sleep(void);
-static void pseries_dedicated_idle_sleep(void);
-
 static struct device_node *pSeries_mpic_node;
 
 static void pSeries_show_cpuinfo(struct seq_file *m)
@@ -301,18 +298,8 @@ static void __init pSeries_setup_arch(vo
 	pSeries_nvram_init();
 
 	/* Choose an idle loop */
-	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+	if (firmware_has_feature(FW_FEATURE_SPLPAR))
 		vpa_init(boot_cpuid);
-		if (get_lppaca()->shared_proc) {
-			printk(KERN_DEBUG "Using shared processor idle loop\n");
-			ppc_md.power_save = pseries_shared_idle_sleep;
-		} else {
-			printk(KERN_DEBUG "Using dedicated idle loop\n");
-			ppc_md.power_save = pseries_dedicated_idle_sleep;
-		}
-	} else {
-		printk(KERN_DEBUG "Using default idle loop\n");
-	}
 
 	if (firmware_has_feature(FW_FEATURE_LPAR))
 		ppc_md.enable_pmcs = pseries_lpar_enable_pmcs;
@@ -500,80 +487,6 @@ static int __init pSeries_probe(void)
 	return 1;
 }
 
-
-DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
-
-static void pseries_dedicated_idle_sleep(void)
-{ 
-	unsigned int cpu = smp_processor_id();
-	unsigned long start_snooze;
-	unsigned long in_purr, out_purr;
-
-	/*
-	 * Indicate to the HV that we are idle. Now would be
-	 * a good time to find other work to dispatch.
-	 */
-	get_lppaca()->idle = 1;
-	get_lppaca()->donate_dedicated_cpu = 1;
-	in_purr = mfspr(SPRN_PURR);
-
-	/*
-	 * We come in with interrupts disabled, and need_resched()
-	 * has been checked recently.  If we should poll for a little
-	 * while, do so.
-	 */
-	if (__get_cpu_var(smt_snooze_delay)) {
-		start_snooze = get_tb() +
-			__get_cpu_var(smt_snooze_delay) * tb_ticks_per_usec;
-		local_irq_enable();
-		set_thread_flag(TIF_POLLING_NRFLAG);
-
-		while (get_tb() < start_snooze) {
-			if (need_resched() || cpu_is_offline(cpu))
-				goto out;
-			ppc64_runlatch_off();
-			HMT_low();
-			HMT_very_low();
-		}
-
-		HMT_medium();
-		clear_thread_flag(TIF_POLLING_NRFLAG);
-		smp_mb();
-		local_irq_disable();
-		if (need_resched() || cpu_is_offline(cpu))
-			goto out;
-	}
-
-	cede_processor();
-
-out:
-	HMT_medium();
-	out_purr = mfspr(SPRN_PURR);
-	get_lppaca()->wait_state_cycles += out_purr - in_purr;
-	get_lppaca()->donate_dedicated_cpu = 0;
-	get_lppaca()->idle = 0;
-}
-
-static void pseries_shared_idle_sleep(void)
-{
-	/*
-	 * Indicate to the HV that we are idle. Now would be
-	 * a good time to find other work to dispatch.
-	 */
-	get_lppaca()->idle = 1;
-
-	/*
-	 * Yield the processor to the hypervisor.  We return if
-	 * an external interrupt occurs (which are driven prior
-	 * to returning here) or if a prod occurs from another
-	 * processor. When returning here, external interrupts
-	 * are enabled.
-	 */
-	cede_processor();
-
-	get_lppaca()->idle = 0;
-}
-
 static int pSeries_pci_probe_mode(struct pci_bus *bus)
 {
 	if (firmware_has_feature(FW_FEATURE_LPAR))
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -25,6 +25,7 @@
 #include <linux/cpu.h>
 #include <linux/sysctl.h>
 #include <linux/tick.h>
+#include <linux/cpuidle.h>
 
 #include <asm/system.h>
 #include <asm/processor.h>
@@ -60,35 +61,26 @@ void cpu_idle(void)
 		while (!need_resched() && !cpu_should_die()) {
 			ppc64_runlatch_off();
 
-			if (ppc_md.power_save) {
-				clear_thread_flag(TIF_POLLING_NRFLAG);
-				/*
-				 * smp_mb is so clearing of TIF_POLLING_NRFLAG
-				 * is ordered w.r.t. need_resched() test.
-				 */
-				smp_mb();
-				local_irq_disable();
-
-				/* Don't trace irqs off for idle */
-				stop_critical_timings();
-
-				/* check again after disabling irqs */
-				if (!need_resched() && !cpu_should_die())
-					ppc_md.power_save();
-
-				start_critical_timings();
-
-				local_irq_enable();
-				set_thread_flag(TIF_POLLING_NRFLAG);
-
-			} else {
-				/*
-				 * Go into low thread priority and possibly
-				 * low power mode.
-				 */
-				HMT_low();
-				HMT_very_low();
-			}
+			clear_thread_flag(TIF_POLLING_NRFLAG);
+			/*
+			 * smp_mb is so clearing of TIF_POLLING_NRFLAG
+			 * is ordered w.r.t. need_resched() test.
+			 */
+			smp_mb();
+			local_irq_disable();
+
+			/* Don't trace irqs off for idle */
+			stop_critical_timings();
+
+			/* check again after disabling irqs */
+			if (!need_resched() && !cpu_should_die())
+				cpuidle_idle_call();
+
+			start_critical_timings();
+
+			local_irq_enable();
+			set_thread_flag(TIF_POLLING_NRFLAG);
+
 		}
 
 		HMT_medium();
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [v5 RFC PATCH 6/7]: POWER: add a default_idle idle loop for POWER.
  2009-09-22  5:33 [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Arun R Bharadwaj
                   ` (4 preceding siblings ...)
  2009-09-22  5:39 ` [v5 RFC PATCH 5/7]: POWER/pSeries: remove dedicate/shared idle loops, which will be moved to arch/powerpc/platforms/pseries/processor_idle.c Arun R Bharadwaj
@ 2009-09-22  5:40 ` Arun R Bharadwaj
  2009-09-22  5:41 ` [v5 RFC PATCH 7/7]: POWER/pSeries: implement pSeries processor idle module Arun R Bharadwaj
  2009-09-22  7:25 ` [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Peter Zijlstra
  7 siblings, 0 replies; 10+ messages in thread
From: Arun R Bharadwaj @ 2009-09-22  5:40 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
	Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Gautham R Shenoy, Arun Bharadwaj
  Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22 11:03:14]:
In arch/powerpc/kernel/idle.c create a default_idle() routine by moving
the failover condition of the cpu_idle() idle loop. This is needed by
cpuidle infrastructure to call default_idle when other idle routines
are not yet registered. Functionality remains the same, but the code is
slightly moved around.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
 arch/powerpc/Kconfig              |    3 +++
 arch/powerpc/include/asm/system.h |    1 +
 arch/powerpc/kernel/idle.c        |    6 ++++++
 3 files changed, 10 insertions(+)
Index: linux.trees.git/arch/powerpc/Kconfig
===================================================================
--- linux.trees.git.orig/arch/powerpc/Kconfig
+++ linux.trees.git/arch/powerpc/Kconfig
@@ -88,6 +88,9 @@ config ARCH_HAS_ILOG2_U64
 	bool
 	default y if 64BIT
 
+config ARCH_HAS_DEFAULT_IDLE
+	def_bool y
+
 config GENERIC_HWEIGHT
 	bool
 	default y
Index: linux.trees.git/arch/powerpc/include/asm/system.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/include/asm/system.h
+++ linux.trees.git/arch/powerpc/include/asm/system.h
@@ -218,6 +218,7 @@ extern unsigned long klimit;
 extern void *alloc_maybe_bootmem(size_t size, gfp_t mask);
 extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
 
+extern void default_idle(void);
 extern int powersave_nap;	/* set if nap mode can be used in idle loop */
 
 /*
Index: linux.trees.git/arch/powerpc/kernel/idle.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/idle.c
+++ linux.trees.git/arch/powerpc/kernel/idle.c
@@ -94,6 +94,12 @@ void cpu_idle(void)
 	}
 }
 
+void default_idle(void)
+{
+	HMT_low();
+	HMT_very_low();
+}
+
 int powersave_nap;
 
 #ifdef CONFIG_SYSCTL
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [v5 RFC PATCH 7/7]: POWER/pSeries: implement pSeries processor idle module.
  2009-09-22  5:33 [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Arun R Bharadwaj
                   ` (5 preceding siblings ...)
  2009-09-22  5:40 ` [v5 RFC PATCH 6/7]: POWER: add a default_idle idle loop for POWER Arun R Bharadwaj
@ 2009-09-22  5:41 ` Arun R Bharadwaj
  2009-09-22  7:25 ` [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Peter Zijlstra
  7 siblings, 0 replies; 10+ messages in thread
From: Arun R Bharadwaj @ 2009-09-22  5:41 UTC (permalink / raw)
  To: Joel Schopp, Benjamin Herrenschmidt, Paul Mackerras,
	Peter Zijlstra, Ingo Molnar, Vaidyanathan Srinivasan,
	Dipankar Sarma, Balbir Singh, Gautham R Shenoy, Arun Bharadwaj
  Cc: linuxppc-dev, linux-kernel
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-09-22 11:03:14]:
This patch creates arch/powerpc/platforms/pseries/processor_idle.c,
which implements the cpuidle infrastructure for pseries.
It implements a pseries_cpuidle_loop() which would be the main idle loop
called from cpu_idle(). It makes decision of entering either cede1 or cede2
for dedicated lpar and shared_cede for shared lpar processor based on the
decision taken by the cpuidle governor.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/Makefile         |    1 
 arch/powerpc/platforms/pseries/processor_idle.c |  191 ++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/pseries.h        |    9 +
 arch/powerpc/platforms/pseries/setup.c          |    8 -
 4 files changed, 207 insertions(+), 2 deletions(-)
Index: linux.trees.git/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/Makefile
+++ linux.trees.git/arch/powerpc/platforms/pseries/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_HCALL_STATS)	+= hvCall_inst
 obj-$(CONFIG_PHYP_DUMP)	+= phyp_dump.o
 obj-$(CONFIG_CMM)		+= cmm.o
 obj-$(CONFIG_DTL)		+= dtl.o
+obj-$(CONFIG_PSERIES_PROCESSOR_IDLE)	+= processor_idle.o
Index: linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/pseries.h
+++ linux.trees.git/arch/powerpc/platforms/pseries/pseries.h
@@ -10,6 +10,8 @@
 #ifndef _PSERIES_PSERIES_H
 #define _PSERIES_PSERIES_H
 
+#include <linux/cpuidle.h>
+
 extern void __init fw_feature_init(const char *hypertas, unsigned long len);
 
 struct pt_regs;
@@ -40,4 +42,11 @@ extern unsigned long rtas_poweron_auto;
 
 extern void find_udbg_vterm(void);
 
+DECLARE_PER_CPU(unsigned long, smt_snooze_delay);
+
+#ifdef CONFIG_PSERIES_PROCESSOR_IDLE
+int pseries_processor_idle_init(void);
+extern struct cpuidle_driver pseries_idle_driver;
+#endif
+
 #endif /* _PSERIES_PSERIES_H */
Index: linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
===================================================================
--- /dev/null
+++ linux.trees.git/arch/powerpc/platforms/pseries/processor_idle.c
@@ -0,0 +1,191 @@
+/*
+ *  processor_idle - idle state cpuidle driver.
+ *  Adapted from drivers/acpi/processor_idle.c
+ *
+ *  Arun R Bharadwaj <arun@linux.vnet.ibm.com>
+ *
+ *  Copyright (C) 2009 IBM Corporation.
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or (at
+ *  your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, write to the Free Software Foundation, Inc.,
+ *  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/cpuidle.h>
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <asm/system.h>
+#include <asm/machdep.h>
+#include <asm/firmware.h>
+
+#include "plpar_wrappers.h"
+#include "pseries.h"
+
+MODULE_AUTHOR("Arun R Bharadwaj");
+MODULE_DESCRIPTION("pSeries Idle State Driver");
+MODULE_LICENSE("GPL");
+
+struct cpuidle_driver pseries_idle_driver = {
+	.name =		"pseries_idle",
+	.owner =	THIS_MODULE,
+};
+
+DEFINE_PER_CPU(struct cpuidle_device, pseries_dev);
+
+#define IDLE_STATE_COUNT	2
+
+static int pseries_idle_init(struct cpuidle_device *dev)
+{
+	return cpuidle_register_device(dev);
+}
+
+static void shared_cede(void)
+{
+	get_lppaca()->idle = 1;
+	cede_processor();
+	get_lppaca()->idle = 0;
+}
+
+static void cede1(void)
+{
+	local_irq_enable();
+	set_thread_flag(TIF_POLLING_NRFLAG);
+	while (!need_resched()) {
+		ppc64_runlatch_off();
+		HMT_low();
+		HMT_very_low();
+	}
+	HMT_medium();
+	clear_thread_flag(TIF_POLLING_NRFLAG);
+	smp_mb();
+	local_irq_disable();
+}
+
+static void cede2(void)
+{
+	ppc64_runlatch_off();
+	HMT_medium();
+	cede_processor();
+}
+
+static int pseries_cpuidle_loop(struct cpuidle_device *dev,
+				struct cpuidle_state *st)
+{
+	ktime_t t1, t2;
+	s64 diff;
+	int ret;
+	unsigned long in_purr, out_purr;
+
+	get_lppaca()->idle = 1;
+	get_lppaca()->donate_dedicated_cpu = 1;
+	in_purr = mfspr(SPRN_PURR);
+
+	t1 = ktime_get();
+
+	if (strcmp(st->desc, "shared_cede") == 0)
+		shared_cede();
+	else if (strcmp(st->desc, "cede1") == 0)
+		cede1();
+	else
+		cede2();
+
+	t2 = ktime_get();
+	diff = ktime_to_us(ktime_sub(t2, t1));
+	if (diff > INT_MAX)
+		diff = INT_MAX;
+
+	ret = (int) diff;
+
+	out_purr = mfspr(SPRN_PURR);
+	get_lppaca()->wait_state_cycles += out_purr - in_purr;
+	get_lppaca()->donate_dedicated_cpu = 0;
+	get_lppaca()->idle = 0;
+
+	return ret;
+}
+
+static int pseries_setup_cpuidle(struct cpuidle_device *dev, int cpu)
+{
+	int i;
+	struct cpuidle_state *state;
+
+	dev->cpu = cpu;
+
+	if (get_lppaca()->shared_proc) {
+		state = &dev->states[0];
+		snprintf(state->name, CPUIDLE_NAME_LEN, "IDLE");
+		state->enter = pseries_cpuidle_loop;
+		strncpy(state->desc, "shared_cede", CPUIDLE_DESC_LEN);
+		state->exit_latency = 0;
+		state->target_residency = 0;
+		return 0;
+	}
+
+	for (i = 0; i < IDLE_STATE_COUNT; i++) {
+		state = &dev->states[i];
+
+		snprintf(state->name, CPUIDLE_NAME_LEN, "IDLE%d", i);
+		state->enter = pseries_cpuidle_loop;
+
+		switch (i) {
+		case 0:
+			strncpy(state->desc, "cede1", CPUIDLE_DESC_LEN);
+			state->exit_latency = 0;
+			state->target_residency = 0;
+			break;
+
+		case 1:
+			strncpy(state->desc, "cede2", CPUIDLE_DESC_LEN);
+			state->exit_latency = 1;
+			state->target_residency =
+					__get_cpu_var(smt_snooze_delay);
+			break;
+		}
+	}
+	dev->state_count = IDLE_STATE_COUNT;
+
+	return 0;
+}
+
+int __init pseries_processor_idle_init(void)
+{
+	int cpu;
+	int result = cpuidle_register_driver(&pseries_idle_driver);
+
+	if (result < 0)
+		return result;
+
+	printk(KERN_DEBUG "pSeries idle driver registered\n");
+
+	if (!firmware_has_feature(FW_FEATURE_SPLPAR)) {
+		printk(KERN_DEBUG "Using default idle\n");
+		return 0;
+	}
+
+	for_each_online_cpu(cpu) {
+		pseries_setup_cpuidle(&per_cpu(pseries_dev, cpu), cpu);
+		pseries_idle_init(&per_cpu(pseries_dev, cpu));
+	}
+
+	printk(KERN_DEBUG "Using cpuidle idle loop\n");
+
+	return 0;
+}
Index: linux.trees.git/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/platforms/pseries/setup.c
+++ linux.trees.git/arch/powerpc/platforms/pseries/setup.c
@@ -297,9 +297,13 @@ static void __init pSeries_setup_arch(vo
 
 	pSeries_nvram_init();
 
-	/* Choose an idle loop */
-	if (firmware_has_feature(FW_FEATURE_SPLPAR))
+	/* Register an idle loop with cpuidle */
+	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
 		vpa_init(boot_cpuid);
+#ifdef CONFIG_PSERIES_PROCESSOR_IDLE
+		pseries_processor_idle_init();
+#endif
+	}
 
 	if (firmware_has_feature(FW_FEATURE_LPAR))
 		ppc_md.enable_pmcs = pseries_lpar_enable_pmcs;
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.
  2009-09-22  5:33 [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Arun R Bharadwaj
                   ` (6 preceding siblings ...)
  2009-09-22  5:41 ` [v5 RFC PATCH 7/7]: POWER/pSeries: implement pSeries processor idle module Arun R Bharadwaj
@ 2009-09-22  7:25 ` Peter Zijlstra
  2009-09-22  8:56   ` Arun R Bharadwaj
  7 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2009-09-22  7:25 UTC (permalink / raw)
  To: arun
  Cc: Gautham R Shenoy, linux-kernel, Paul Mackerras, Ingo Molnar,
	linuxppc-dev
Much better :-)
But I'm puzzled by all the per-cpu-ish-ness of the stuff. Why would you
need to register things on a per-cpu basis?
Also:
+       list_for_each(pos, &per_cpu(cpuidle_devices_list, cpu)) {
+               temp_dev = container_of(pos, struct cpuidle_device,
+                                       percpu_list[cpu]);
+               if (dev == temp_dev) {
+                       list_del(&temp_dev->percpu_list[cpu]);
+                       cpuidle_remove_state_sysfs(temp_dev);
+               }
+       }
Looks buggy, either you want to break out of the loop on dev ==
temp_dev, or you want to use list_for_each_safe().
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.
  2009-09-22  7:25 ` [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Peter Zijlstra
@ 2009-09-22  8:56   ` Arun R Bharadwaj
  0 siblings, 0 replies; 10+ messages in thread
From: Arun R Bharadwaj @ 2009-09-22  8:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gautham R Shenoy, linux-kernel, Paul Mackerras, Arun Bharadwaj,
	Ingo Molnar, linuxppc-dev
* Peter Zijlstra <a.p.zijlstra@chello.nl> [2009-09-22 09:25:59]:
> 
> 
> Much better :-)
> 
> 
> But I'm puzzled by all the per-cpu-ish-ness of the stuff. Why would you
> need to register things on a per-cpu basis?
> 
> Also:
> 
> 
> +       list_for_each(pos, &per_cpu(cpuidle_devices_list, cpu)) {
> +               temp_dev = container_of(pos, struct cpuidle_device,
> +                                       percpu_list[cpu]);
> +               if (dev == temp_dev) {
> +                       list_del(&temp_dev->percpu_list[cpu]);
> +                       cpuidle_remove_state_sysfs(temp_dev);
> +               }
> +       }
> 
> Looks buggy, either you want to break out of the loop on dev ==
> temp_dev, or you want to use list_for_each_safe().
> 
> 
> 
Hi Peter,
There were a couple of buggy issues, which i have cleaned up for the
next iteration.
        * As you pointed out above, the loop is buggy.
        * Also, the percpu_list[NR_CPUS] which i am defining inside
          struct cpuidle_device is wrong. It does not need to be an
          array.
Thanks for the quick turnaround
arun
^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2009-09-22  8:56 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-22  5:33 [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Arun R Bharadwaj
2009-09-22  5:35 ` [v5 RFC PATCH 1/7]: cpuidle: cleanup drivers/cpuidle/cpuidle.c Arun R Bharadwaj
2009-09-22  5:36 ` [v5 RFC PATCH 2/7]: cpuidle: implement a list based approach to register a set of idle routines Arun R Bharadwaj
2009-09-22  5:37 ` [v5 RFC PATCH 3/7]: x86: refactor x86 idle power management code and remove all instances of pm_idle Arun R Bharadwaj
2009-09-22  5:38 ` [v5 RFC PATCH 4/7]: POWER: enable cpuidle for POWER Arun R Bharadwaj
2009-09-22  5:39 ` [v5 RFC PATCH 5/7]: POWER/pSeries: remove dedicate/shared idle loops, which will be moved to arch/powerpc/platforms/pseries/processor_idle.c Arun R Bharadwaj
2009-09-22  5:40 ` [v5 RFC PATCH 6/7]: POWER: add a default_idle idle loop for POWER Arun R Bharadwaj
2009-09-22  5:41 ` [v5 RFC PATCH 7/7]: POWER/pSeries: implement pSeries processor idle module Arun R Bharadwaj
2009-09-22  7:25 ` [v5 RFC PATCH 0/7]: cpuidle/x86/POWER (REDESIGN): Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER Peter Zijlstra
2009-09-22  8:56   ` Arun R Bharadwaj
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).