public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed
* PowerOp Design and working patch
@ 2006-07-28 22:31 david singleton
  2006-07-28 23:38 ` Greg KH
  0 siblings, 1 reply; 45+ messages in thread
From: david singleton @ 2006-07-28 22:31 UTC (permalink / raw)
  To: linux-pm; +Cc: David Singleton

[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]

Here is a patch that implements a version of the PowerOp concept.

The goal of PowerOp power management is to provide a framework that 
unifies
and simplifies the various power management infrastructures in Linux.  
The
three infrastructures Power Op is concerned with are:

         1) basic suspend/resume power management (CONFIG_PM)

         2) basic processor frequency management (CONFIG_CPUFREQ)

         3) SourceForge's Dynamic Power Management (CONFIG_DPM)

All three power management infrastructures are concerned with 
controlling
power states of the system, and interestingly enough they all perform 
the
same basic operational steps to control changes in power state.

PowerOp uses the existing power management sysfs infrastructure and 
extends it
to perform cpufreq and dynamic power management operations. The 
traditional
suspend to memory or disk (or swap) infrastructure has the correct 
operational
structure that supports all types of power state change.

The CPUFREQ table based frequency control makes controlling cpu 
frequency
simple and straight forward.  The user doesn't get to set the cpu to
any speed, but only to supported speeds that have been provided by
the hardware vendor and validated.

Dynamic Power Management eventually treats all types of power states as 
operating points,
wether it's a suspend operating point, a particular frequency, or a 
specific
voltage.

By combining the best of all of these power management infrastructures
PowerOp uses the operational structure of tradition CONFIG_PM power
management and converts all power states, frequency, voltage, idle or
suspend to the CPUFREQ concept of only supported and validated operating
points.

PowerOp allows setting any operating point supported by the system by 
setting
the name of the operating point into /sys/power/state.

I apologize for being so late to the table, but I find a working patch 
easier to discuss
and I needed to see if I could unify and simplify all three power 
management infrastructures.

David



[-- Attachment #2: powerop.patch --]
[-- Type: application/octet-stream, Size: 24610 bytes --]

 Documentation/power/powerop.txt                   |  168 +++++++++++++++++++
 Makefile                                          |    2 
 arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c |  108 ++++++++++++
 drivers/cpufreq/cpufreq.c                         |   34 +++
 include/linux/cpufreq.h                           |    2 
 include/linux/pm.h                                |   26 ++-
 kernel/power/main.c                               |  189 +++++++++++++++++-----
 kernel/power/power.h                              |    2 
 8 files changed, 486 insertions(+), 45 deletions(-)

Index: linux-2.6.17/kernel/power/main.c
===================================================================
--- linux-2.6.17.orig/kernel/power/main.c
+++ linux-2.6.17/kernel/power/main.c
@@ -49,7 +49,7 @@ void pm_set_ops(struct pm_ops * ops)
  *	the platform can enter the requested state.
  */
 
-static int suspend_prepare(suspend_state_t state)
+static int suspend_prepare(struct power_op * state)
 {
 	int error = 0;
 	unsigned int free_pages;
@@ -82,7 +82,7 @@ static int suspend_prepare(suspend_state
 	}
 
 	if (pm_ops->prepare) {
-		if ((error = pm_ops->prepare(state)))
+		if ((error = pm_ops->prepare(state->type)))
 			goto Thaw;
 	}
 
@@ -94,7 +94,7 @@ static int suspend_prepare(suspend_state
 	return 0;
  Finish:
 	if (pm_ops->finish)
-		pm_ops->finish(state);
+		pm_ops->finish(state->type);
  Thaw:
 	thaw_processes();
  Enable_cpu:
@@ -104,7 +104,7 @@ static int suspend_prepare(suspend_state
 }
 
 
-int suspend_enter(suspend_state_t state)
+int suspend_enter(struct power_op * state)
 {
 	int error = 0;
 	unsigned long flags;
@@ -115,7 +115,7 @@ int suspend_enter(suspend_state_t state)
 		printk(KERN_ERR "Some devices failed to power down\n");
 		goto Done;
 	}
-	error = pm_ops->enter(state);
+	error = pm_ops->enter(state->type);
 	device_power_up();
  Done:
 	local_irq_restore(flags);
@@ -131,36 +131,94 @@ int suspend_enter(suspend_state_t state)
  *	console that we've allocated. This is not called for suspend-to-disk.
  */
 
-static void suspend_finish(suspend_state_t state)
+static void suspend_finish(struct power_op * state)
 {
 	device_resume();
 	resume_console();
 	thaw_processes();
 	enable_nonboot_cpus();
 	if (pm_ops && pm_ops->finish)
-		pm_ops->finish(state);
+		pm_ops->finish(state->type);
 	pm_restore_console();
 }
 
 
+struct power_op *current_state;
+struct power_op pm_states = {
+	.name = "default",
+	.type = PM_SUSPEND_ON,
+};
 
-
-static const char * const pm_states[PM_SUSPEND_MAX] = {
-	[PM_SUSPEND_STANDBY]	= "standby",
-	[PM_SUSPEND_MEM]	= "mem",
+static struct power_op standby = {
+	.name = "standby",
+	.description = "Power-On Suspend ACPI State: S1",
+	.type = PM_SUSPEND_STANDBY,
+};
+static struct power_op mem = {
+	.name = "mem   ",
+	.description = "Suspend-to-RAM ACPI State: S3",
+	.type = PM_SUSPEND_MEM,
+};
 #ifdef CONFIG_SOFTWARE_SUSPEND
-	[PM_SUSPEND_DISK]	= "disk",
-#endif
+static struct power_op disk = {
+	.name = "disk  ",
+	.description = "Suspend-to-disk ACPI State: S4",
+	.type = PM_SUSPEND_DISK,
 };
+#endif
 
-static inline int valid_state(suspend_state_t state)
+/*
+ *
+ */
+static int pm_change_state(struct power_op *state)
+{
+	int error = -EINVAL;
+	int len = strlen(state->name);
+	struct power_op *this, *next;
+	struct list_head *head = &pm_states.list;
+
+	/*
+	 * list_find new operating point.
+	 * compare to current operating point.
+	 * if different change to new operating point.
+	 */
+	list_for_each_entry_safe(this, next, head, list) {
+		if (strncmp(state->name, this->name, len) == 0) {
+			if ((strcmp(current_state->name, this->name)) == 0) {
+				return 0;
+			}
+
+			if (this->prepare_transition(current_state, this)) {
+				break;
+			}
+
+			if (this->transition(current_state, this)) {
+				break;
+			}
+
+			/*
+			 * now lets wait for the transition latency
+			 */
+			udelay(this->latency);
+
+			error = this->finish_transition(current_state, this);
+
+			if (error == 0)
+				current_state = this;
+			break;
+		}
+	}
+	return error;
+}
+
+static inline int valid_state(struct power_op * state)
 {
 	/* Suspend-to-disk does not really need low-level support.
 	 * It can work with reboot if needed. */
-	if (state == PM_SUSPEND_DISK)
+	if (state->type == PM_SUSPEND_DISK)
 		return 1;
 
-	if (pm_ops && pm_ops->valid && !pm_ops->valid(state))
+	if (pm_ops && pm_ops->valid && !pm_ops->valid(state->type))
 		return 0;
 	return 1;
 }
@@ -168,7 +226,7 @@ static inline int valid_state(suspend_st
 
 /**
  *	enter_state - Do common work of entering low-power state.
- *	@state:		pm_state structure for state we're entering.
+ *	@state:		power_op structure for state we're entering.
  *
  *	Make sure we're the only ones trying to enter a sleep state. Fail
  *	if someone has beat us to it, since we don't want anything weird to
@@ -177,7 +235,7 @@ static inline int valid_state(suspend_st
  *	we've woken up).
  */
 
-static int enter_state(suspend_state_t state)
+static int enter_state(struct power_op *state)
 {
 	int error;
 
@@ -186,16 +244,21 @@ static int enter_state(suspend_state_t s
 	if (down_trylock(&pm_sem))
 		return -EBUSY;
 
-	if (state == PM_SUSPEND_DISK) {
+	if (state->type == PM_SUSPEND_DISK) {
 		error = pm_suspend_disk();
 		goto Unlock;
 	}
 
-	pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]);
+	if (state->type == PM_FREQ_CHANGE) {
+		error = pm_change_state(state);
+		goto Unlock;
+	}
+
+	pr_debug("PM: Preparing system for %s sleep\n", state->name);
 	if ((error = suspend_prepare(state)))
 		goto Unlock;
 
-	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
+	pr_debug("PM: Entering %s sleep\n", state->name);
 	error = suspend_enter(state);
 
 	pr_debug("PM: Finishing wakeup.\n");
@@ -211,7 +274,15 @@ static int enter_state(suspend_state_t s
  */
 int software_suspend(void)
 {
-	return enter_state(PM_SUSPEND_DISK);
+	struct power_op *this, *next;
+	struct list_head *head = &pm_states.list;
+	int error = 0;
+
+	list_for_each_entry_safe(this, next, head, list) {
+		if (this->type == PM_SUSPEND_DISK)
+			error= enter_state(this);
+	}
+	return error;
 }
 
 
@@ -223,16 +294,48 @@ int software_suspend(void)
  *	structure, and enter (above).
  */
 
-int pm_suspend(suspend_state_t state)
+int pm_suspend(struct power_op * state)
 {
-	if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX)
+	if (state->type > PM_SUSPEND_ON && state->type <= PM_SUSPEND_MAX)
 		return enter_state(state);
 	return -EINVAL;
 }
 
+decl_subsys(power,NULL,NULL);
 
+/**
+ *	supported_states - control system power state.
+ *
+ *	show() returns what states are supported, which are no longer
+ * 	hard-coded to just 'standby' (Power-On Suspend), 'mem' (Suspend-to-RAM),
+ *	and *'disk' (Suspend-to-Disk), but show all the power states.
+ *
+ *	store() unwritable
+ */
 
-decl_subsys(power,NULL,NULL);
+static ssize_t supported_states_show(struct subsystem * subsys, char * buf)
+{
+	struct power_op *this, *next;
+	struct list_head *head = &pm_states.list;
+	const char *header =  "< Name >    <Frequency>  <Voltage>  <Transition Latency>  < Description >\n";
+	char * s = buf;
+
+	s += sprintf(s, "%s", header);
+	list_for_each_entry_safe(this, next, head, list) {
+		s += sprintf(s,"%s %dKHz %dmV %dus %s\n", this->name,
+		   this->frequency, this->voltage, this->latency,
+		   this->description);
+	}
+
+	return (s - buf);
+}
+
+static ssize_t supported_states_store(struct subsystem * subsys, const char *buf, size_t n)
+{
+	return -EINVAL;
+}
+
+power_attr(supported_states);
 
 
 /**
@@ -248,36 +351,28 @@ decl_subsys(power,NULL,NULL);
 
 static ssize_t state_show(struct subsystem * subsys, char * buf)
 {
-	int i;
 	char * s = buf;
 
-	for (i = 0; i < PM_SUSPEND_MAX; i++) {
-		if (pm_states[i] && valid_state(i))
-			s += sprintf(s,"%s ", pm_states[i]);
-	}
-	s += sprintf(s,"\n");
+	s += sprintf(s,"%s\n", current_state->name);
 	return (s - buf);
 }
 
 static ssize_t state_store(struct subsystem * subsys, const char * buf, size_t n)
 {
-	suspend_state_t state = PM_SUSPEND_STANDBY;
-	const char * const *s;
+	struct power_op *this, *next;
+	struct list_head *head = &pm_states.list;
 	char *p;
-	int error;
+	int error = -EINVAL;
 	int len;
 
 	p = memchr(buf, '\n', n);
 	len = p ? p - buf : n;
-
-	for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
-		if (*s && !strncmp(buf, *s, len))
+	list_for_each_entry_safe(this, next, head, list) {
+		if (!strncmp(buf, this->name, len)) {
+			error = enter_state(this);
 			break;
+		}
 	}
-	if (state < PM_SUSPEND_MAX && *s)
-		error = enter_state(state);
-	else
-		error = -EINVAL;
 	return error ? error : n;
 }
 
@@ -285,6 +380,7 @@ power_attr(state);
 
 static struct attribute * g[] = {
 	&state_attr.attr,
+	&supported_states_attr.attr,
 	NULL,
 };
 
@@ -295,9 +391,20 @@ static struct attribute_group attr_group
 
 static int __init pm_init(void)
 {
+
 	int error = subsystem_register(&power_subsys);
 	if (!error)
 		error = sysfs_create_group(&power_subsys.kset.kobj,&attr_group);
+
+	INIT_LIST_HEAD(&pm_states.list);
+
+#ifdef CONFIG_SOFTWARE_SUSPEND
+	list_add(&disk.list, &pm_states.list);
+#endif
+	list_add(&mem.list, &pm_states.list);
+	list_add(&standby.list, &pm_states.list);
+	current_state = &pm_states;
+
 	return error;
 }
 
Index: linux-2.6.17/include/linux/pm.h
===================================================================
--- linux-2.6.17.orig/include/linux/pm.h
+++ linux-2.6.17/include/linux/pm.h
@@ -108,7 +108,29 @@ typedef int __bitwise suspend_state_t;
 #define PM_SUSPEND_STANDBY	((__force suspend_state_t) 1)
 #define PM_SUSPEND_MEM		((__force suspend_state_t) 3)
 #define PM_SUSPEND_DISK		((__force suspend_state_t) 4)
-#define PM_SUSPEND_MAX		((__force suspend_state_t) 5)
+#define PM_FREQ_CHANGE		((__force suspend_state_t) 5)
+#define PM_VOLT_CHANGE		((__force suspend_state_t) 6)
+#define PM_SUSPEND_MAX		((__force suspend_state_t) 7)
+
+#define PM_NAME_SIZE            16
+#define PM_DESCRIPTION_SIZE     48
+
+struct power_op {
+	struct list_head list;
+	suspend_state_t type;
+	char name[PM_NAME_SIZE];
+	char description[PM_DESCRIPTION_SIZE];
+	unsigned int frequency;		/* in KHz */
+	unsigned int voltage;		/* mV */
+	unsigned int latency;		/* transition latency in us */
+	int     (*prepare_transition)(struct power_op *cur, struct power_op *new);
+	int     (*transition)(struct power_op *cur, struct power_op *new);
+	int     (*finish_transition)(struct power_op *cur, struct power_op *new);
+
+	void *md_data;			/* arch dependent data (dpm_opt) */
+};
+extern struct power_op pm_states;
+extern struct power_op *current_state;
 
 typedef int __bitwise suspend_disk_method_t;
 
@@ -128,7 +150,7 @@ struct pm_ops {
 
 extern void pm_set_ops(struct pm_ops *);
 extern struct pm_ops *pm_ops;
-extern int pm_suspend(suspend_state_t state);
+extern int pm_suspend(struct power_op *state);
 
 
 /*
Index: linux-2.6.17/kernel/power/power.h
===================================================================
--- linux-2.6.17.orig/kernel/power/power.h
+++ linux-2.6.17/kernel/power/power.h
@@ -113,4 +113,4 @@ extern int swsusp_resume(void);
 extern int swsusp_read(void);
 extern int swsusp_write(void);
 extern void swsusp_close(void);
-extern int suspend_enter(suspend_state_t state);
+extern int suspend_enter(struct power_op * state);
Index: linux-2.6.17/Makefile
===================================================================
--- linux-2.6.17.orig/Makefile
+++ linux-2.6.17/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 18
-EXTRAVERSION = -rc1
+EXTRAVERSION = -rc1-PowerOp
 NAME=Crazed Snow-Weasel
 
 # *DOCUMENTATION*
Index: linux-2.6.17/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
===================================================================
--- linux-2.6.17.orig/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
+++ linux-2.6.17/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
@@ -20,6 +20,7 @@
 #include <linux/sched.h>	/* current */
 #include <linux/delay.h>
 #include <linux/compiler.h>
+#include <linux/pm.h>
 
 #ifdef CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI
 #include <linux/acpi.h>
@@ -160,6 +161,80 @@ static struct cpufreq_frequency_table ba
 	OP(1400, 1484),
 	{ .frequency = CPUFREQ_TABLE_END }
 };
+#ifdef CONFIG_PM
+/*
+ * this is the formula for OP.  We need the perfctl
+ * msr value to change to a new frequency.
+ * We'll save it in the machine dependent variable md_data.
+ */
+static void set_perfctl_msr(struct power_op *pm)
+{
+	unsigned int msr = pm->frequency/100;
+	unsigned int v = pm->voltage - 700;
+
+	msr = (unsigned int)(msr << 8);
+	msr |= (unsigned int)(v / 16);
+	pm->md_data = (void *)msr;
+}
+
+static int centrino_transition(struct power_op *cur, struct power_op *new);
+
+static struct power_op mhz600 = {
+        .name = "600MHz",
+        .description = "Low Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 600,
+        .voltage = 956,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+static struct power_op mhz800 = {
+        .name = "800MHz",
+        .description = "Lower Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 800,
+        .voltage = 1180,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+static struct power_op ghz1 = {
+        .name = "1GHz",
+        .description = "Med Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 1000,
+        .voltage = 1308,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+static struct power_op ghz12 = {
+        .name = "1.2GHz",
+        .description = "High Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 1200,
+        .voltage = 1436,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+static struct power_op ghz14 = {
+        .name = "1.4GHz",
+        .description = "Highest Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 1400,
+        .voltage = 1484,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+#endif
 
 /* Intel Pentium M processor 1.50GHz (Banias) */
 static struct cpufreq_frequency_table banias_1500[] =
@@ -266,6 +341,19 @@ static int centrino_cpu_init_table(struc
 	dprintk("found \"%s\": max frequency: %dkHz\n",
 	       model->model_name, model->max_freq);
 
+#ifdef CONFIG_PM
+	list_add_tail(&mhz600.list, &pm_states.list);
+	set_perfctl_msr(&mhz600);
+	list_add_tail(&mhz800.list, &pm_states.list);
+	set_perfctl_msr(&mhz800);
+	list_add_tail(&ghz1.list, &pm_states.list);
+	set_perfctl_msr(&ghz1);
+	list_add_tail(&ghz12.list, &pm_states.list);
+	set_perfctl_msr(&ghz12);
+	list_add_tail(&ghz14.list, &pm_states.list);
+	set_perfctl_msr(&ghz14);
+	current_state = &ghz14;
+#endif
 	return 0;
 }
 
@@ -620,6 +708,26 @@ static int centrino_verify (struct cpufr
 	return cpufreq_frequency_table_verify(policy, centrino_model[policy->cpu]->op_points);
 }
 
+static int centrino_transition(struct power_op *cur, struct power_op *new)
+{
+	unsigned int msr, oldmsr = 0, h = 0;
+
+	if (cur == new)
+		return 0;
+
+	msr = (unsigned int)new->md_data;
+	rdmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+
+	/* all but 16 LSB are reserved, treat them with care */
+	oldmsr &= ~0xffff;
+	msr &= 0xffff;
+	oldmsr |= msr;
+
+	wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+
+	return 0;
+}
+
 /**
  * centrino_setpolicy - set a new CPUFreq policy
  * @policy: new policy
Index: linux-2.6.17/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.17.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6.17/drivers/cpufreq/cpufreq.c
@@ -226,6 +226,33 @@ static void adjust_jiffies(unsigned long
 static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci) { return; }
 #endif
 
+int cpufreq_prepare_transition(struct power_op *cur, struct power_op *new)
+{
+	struct cpufreq_freqs freqs;
+
+	freqs.old = cur->frequency;
+	freqs.new = new->frequency;
+	freqs.cpu = 0;
+	freqs.flags = cpufreq_driver->flags;
+	blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+			CPUFREQ_PRECHANGE, &freqs);
+	adjust_jiffies(CPUFREQ_PRECHANGE, &freqs);
+	return 0;
+}
+
+int cpufreq_finish_transition(struct power_op *cur, struct power_op *new)
+{
+	struct cpufreq_freqs freqs;
+
+	freqs.old = cur->frequency;
+	freqs.new = new->frequency;
+	freqs.cpu = 0;
+	freqs.flags = cpufreq_driver->flags;
+	adjust_jiffies(CPUFREQ_POSTCHANGE, &freqs);
+	blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+			CPUFREQ_POSTCHANGE, &freqs);
+	return 0;
+}
 
 /**
  * cpufreq_notify_transition - call notifier chain and adjust_jiffies
@@ -884,6 +911,12 @@ static void cpufreq_out_of_sync(unsigned
 }
 
 
+#ifdef CONFIG_PM
+unsigned int cpufreq_quick_get(unsigned int cpu)
+{
+	return (current_state->frequency * 1000);
+}
+#else
 /**
  * cpufreq_quick_get - get the CPU frequency (in kHz) frpm policy->cur
  * @cpu: CPU number
@@ -905,6 +938,7 @@ unsigned int cpufreq_quick_get(unsigned 
 
 	return (ret);
 }
+#endif
 EXPORT_SYMBOL(cpufreq_quick_get);
 
 
Index: linux-2.6.17/include/linux/cpufreq.h
===================================================================
--- linux-2.6.17.orig/include/linux/cpufreq.h
+++ linux-2.6.17/include/linux/cpufreq.h
@@ -271,6 +271,8 @@ static inline unsigned int cpufreq_quick
 	return 0;
 }
 #endif
+int cpufreq_prepare_transition(struct power_op *cur, struct power_op *new);
+int cpufreq_finish_transition(struct power_op *cur, struct power_op *new);
 
 
 /*********************************************************************
Index: linux-2.6.17/Documentation/power/powerop.txt
===================================================================
--- /dev/null
+++ linux-2.6.17/Documentation/power/powerop.txt
@@ -0,0 +1,168 @@
+
+The PowerOp Power Management infrastructure.
+
+David Singleton <dsingleton@mvista.com>
+
+25 July 2006
+
+Copyright (c) 2006 MontaVista Software Inc.
+
+0. Introduction
+
+The goal of PowerOp power management is to provide a framework that unifies
+and simplifies the various power management infrastructures in Linux.  The
+three infrastructures Power Op is concerned with are:
+
+	1) basic suspend/resume power management (CONFIG_PM)
+
+	2) basic processor frequency management (CONFIG_CPUFREQ)
+
+	3) SourceForge's Dynamic Power Management (CONFIG_DPM)
+
+All three power management infrastructures are concerned with controlling
+power states of the system, and interestingly enough they all perform the
+same basic operational steps to control changes in power state.
+
+PowerOp uses the existing power management sysfs infrastructure and extends it
+to perform cpufreq and dynamic power management operations. The traditional
+suspend to memory or disk (or swap) infrastructure has the correct operational
+structure that supports all types of power state change.
+
+The CPUFREQ table based frequency control makes controlling cpu frequency
+simple and straight forward.  The user doesn't get to set the cpu to
+any speed, but only to supported speeds that have been provided by
+the hardware vendor and validated.
+
+Dynamic Power Management treats all types of power states as operating points,
+wether it's a suspend operating point, a particular frequency, or a specific
+voltage.
+
+By combining the best of all of these power management infrastructures
+PowerOp uses the operational structure of tradition CONFIG_PM power
+management and converts all power states, frequency, voltage, idle or
+suspend to the CPUFREQ concept of only supported and validated operating
+points.
+
+PowerOp then becomes a simplified power management infrastructure in that
+only operating points that are supported and validated are available
+to the user.  Control of all operating points are done by the operating
+point name.  The user cannot supply invalid, or malicious,
+parameters that would hang or crash the system.
+
+1) PowerOp interface.
+
+To simplify power management all operations take place through two sysfs
+files, /sys/power/state and /sys/power/supported_states.  The 'state' file
+shows the current operating point of the system.  The readonly
+'supported_states' file shows the operating points the system supports.
+
+Supported operating points are displayed in tuple format of:
+
+<name, frequency, voltage, transition latency, description>
+
+The supported_states file contains rows of tuples with each
+tuple describing a supported operating point of the system.
+The supported_states file looks like a merge between the old
+/sys/power/state file and a cpufreq table.
+
+The system can transition to any of the supported states by simply
+storing the operating point name in the /sys/power/state file.
+
+To allow user space notification of events, like low battery, lid of
+the notebook being closed, etc.  PowerOp notifies the user through
+the hotplug interface.
+
+
+2) PowerOP Operating Points.
+
+An operating point is represented by the power_op struct which contains:
+
+struct power_op {
+        struct list_head list;
+        suspend_state_t type;
+        char name[PM_NAME_SIZE];
+        char description[PM_DESCRIPTION_SIZE];
+        unsigned int frequency;         /* in KHz */
+        unsigned int voltage;           /* mV */
+        unsigned int latency;           /* transition latency in us */
+        int     (*prepare_transition)(struct power_op *cur, struct power_op *new);
+        int     (*transition)(struct power_op *cur, struct power_op *new);
+        int     (*finish_transition)(struct power_op *cur, struct power_op *new);
+
+        void *md_data;                  /* arch dependent data  */
+};
+
+Each operating point has its own functions for preparing to transition,
+transitioning and finishing transition.  Cpu frequency operating points
+will probably share their op vectors, idle and suspend operating points my have
+different op vectors.
+
+
+3) Traditional Operation of Power Management Code.
+
+All three power management infrastructures have the same operational model.
+All three follow the PM model of preparing to suspend,  suspending,
+and finish the state change.  It was easiest to follow the model
+enforced by the traditional power management and use the three step process of:
+
+ 	1) get ready to change state
+	2) change state
+	3) finish changes
+
+Cpufreq infrastructure makes three calls to change the frequency of the
+processor:
+
+	1) cpufreq_notify_transition(&freq, CPUFREQ_PRECHANGE);
+
+	2) acpi_processor_set_performance (data, j, next_state);
+
+	3) cpufreq_notify_transition(&freq, CPUFREQ_POSTCHANGE);
+
+DPM uses these three calls to change frequency and/or voltage:
+
+	1) dpm_driver_scale(SCALE_PRECHANGE, new);
+
+	2) clk_set_rate(prcm_set, new->md_opt.prcm_clock);
+
+	3) dpm_driver_scale(SCALE_POStCHANGE, new);
+
+PM uses these three calls to suspend:
+
+	1) suspend_prepare(state);
+
+	2) suspend_enter(state->type);
+
+	3) suspend_finish(state);
+
+
+4) PowerOP Operation.
+
+PowerOP uses the following three calls to transition to a new operating
+point.
+
+	prepare_to_transition(cur_state, new_state);
+
+	transition(cur_state, new_state);
+
+	finish_transistion(cur_state, new_state);
+
+The parameters are pointers to operating point structures, struct power_op.
+
+Power OP is a simplified version of all three of these infrastructures in
+that it only deals with operating points, and more specifically with
+supported operating points.  Power Op presents a set of supported operating
+points to the user.  This is similar to the cpufreq table concept in that
+only supported and validated frequencies are avaliable.
+
+The definition of the operating point is done in a manner similar to cpufreqs
+in that the supported operating frequency, voltage and transition latency,
+are predefined (by the hardware vendor) and validated.
+
+The user maninuplates the operting points of the system by the
+name of the operating points.  This simplifies both the code and the
+control of the system's operating points in the PowerOp daemon.
+
+All supported operating points are defined at compile time and
+the user sets the system to different operating points by
+the operating point name.
+

[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-28 22:31 PowerOp Design and working patch david singleton
@ 2006-07-28 23:38 ` Greg KH
  2006-07-29  0:26   ` david singleton
  2006-07-29  0:38   ` david singleton
  0 siblings, 2 replies; 45+ messages in thread
From: Greg KH @ 2006-07-28 23:38 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

On Fri, Jul 28, 2006 at 03:31:41PM -0700, david singleton wrote:
> Here is a patch that implements a version of the PowerOp concept.

Any chance of breaking this up into logical patches that do one thing at
a time so it can be reviewed better?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-28 23:38 ` Greg KH
@ 2006-07-29  0:26   ` david singleton
  2006-07-29  0:38   ` david singleton
  1 sibling, 0 replies; 45+ messages in thread
From: david singleton @ 2006-07-29  0:26 UTC (permalink / raw)
  To: Greg KH; +Cc: David Singleton, linux-pm


On Jul 28, 2006, at 4:38 PM, Greg KH wrote:

> On Fri, Jul 28, 2006 at 03:31:41PM -0700, david singleton wrote:
>> Here is a patch that implements a version of the PowerOp concept.
>
> Any chance of breaking this up into logical patches that do one thing 
> at
> a time so it can be reviewed better?

Sure.  It was so small I just put them all together:

linux-2.6.17> quilt files
kernel/power/main.c
include/linux/pm.h
kernel/power/power.h
Makefile
arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
drivers/cpufreq/cpufreq.c
include/linux/cpufreq.h
Documentation/power/powerop.txt


I'll make a core patch and an x86-centrino patch.

David
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-28 23:38 ` Greg KH
  2006-07-29  0:26   ` david singleton
@ 2006-07-29  0:38   ` david singleton
  2006-07-29  0:45     ` Greg KH
  2006-08-07 22:06     ` Pavel Machek
  1 sibling, 2 replies; 45+ messages in thread
From: david singleton @ 2006-07-29  0:38 UTC (permalink / raw)
  To: Greg KH; +Cc: David Singleton, linux-pm

[-- Attachment #1: Type: text/plain, Size: 419 bytes --]


On Jul 28, 2006, at 4:38 PM, Greg KH wrote:

> On Fri, Jul 28, 2006 at 03:31:41PM -0700, david singleton wrote:
>> Here is a patch that implements a version of the PowerOp concept.
>
> Any chance of breaking this up into logical patches that do one thing 
> at
> a time so it can be reviewed better?
>
> thanks,
>
> greg k-h
>
Here's powerop-core.patch,  powerop-cpufreq.patch and 
powerop-x86-centrino.patch.

David


[-- Attachment #2: powerop-core.patch --]
[-- Type: application/octet-stream, Size: 17616 bytes --]

 Documentation/power/powerop.txt |  168 +++++++++++++++++++++++++++++++++++
 include/linux/pm.h              |   26 +++++
 kernel/power/main.c             |  189 +++++++++++++++++++++++++++++++---------
 kernel/power/power.h            |    2 
 4 files changed, 341 insertions(+), 44 deletions(-)

Index: linux-2.6.17/kernel/power/main.c
===================================================================
--- linux-2.6.17.orig/kernel/power/main.c
+++ linux-2.6.17/kernel/power/main.c
@@ -49,7 +49,7 @@ void pm_set_ops(struct pm_ops * ops)
  *	the platform can enter the requested state.
  */
 
-static int suspend_prepare(suspend_state_t state)
+static int suspend_prepare(struct power_op * state)
 {
 	int error = 0;
 	unsigned int free_pages;
@@ -82,7 +82,7 @@ static int suspend_prepare(suspend_state
 	}
 
 	if (pm_ops->prepare) {
-		if ((error = pm_ops->prepare(state)))
+		if ((error = pm_ops->prepare(state->type)))
 			goto Thaw;
 	}
 
@@ -94,7 +94,7 @@ static int suspend_prepare(suspend_state
 	return 0;
  Finish:
 	if (pm_ops->finish)
-		pm_ops->finish(state);
+		pm_ops->finish(state->type);
  Thaw:
 	thaw_processes();
  Enable_cpu:
@@ -104,7 +104,7 @@ static int suspend_prepare(suspend_state
 }
 
 
-int suspend_enter(suspend_state_t state)
+int suspend_enter(struct power_op * state)
 {
 	int error = 0;
 	unsigned long flags;
@@ -115,7 +115,7 @@ int suspend_enter(suspend_state_t state)
 		printk(KERN_ERR "Some devices failed to power down\n");
 		goto Done;
 	}
-	error = pm_ops->enter(state);
+	error = pm_ops->enter(state->type);
 	device_power_up();
  Done:
 	local_irq_restore(flags);
@@ -131,36 +131,94 @@ int suspend_enter(suspend_state_t state)
  *	console that we've allocated. This is not called for suspend-to-disk.
  */
 
-static void suspend_finish(suspend_state_t state)
+static void suspend_finish(struct power_op * state)
 {
 	device_resume();
 	resume_console();
 	thaw_processes();
 	enable_nonboot_cpus();
 	if (pm_ops && pm_ops->finish)
-		pm_ops->finish(state);
+		pm_ops->finish(state->type);
 	pm_restore_console();
 }
 
 
+struct power_op *current_state;
+struct power_op pm_states = {
+	.name = "default",
+	.type = PM_SUSPEND_ON,
+};
 
-
-static const char * const pm_states[PM_SUSPEND_MAX] = {
-	[PM_SUSPEND_STANDBY]	= "standby",
-	[PM_SUSPEND_MEM]	= "mem",
+static struct power_op standby = {
+	.name = "standby",
+	.description = "Power-On Suspend ACPI State: S1",
+	.type = PM_SUSPEND_STANDBY,
+};
+static struct power_op mem = {
+	.name = "mem   ",
+	.description = "Suspend-to-RAM ACPI State: S3",
+	.type = PM_SUSPEND_MEM,
+};
 #ifdef CONFIG_SOFTWARE_SUSPEND
-	[PM_SUSPEND_DISK]	= "disk",
-#endif
+static struct power_op disk = {
+	.name = "disk  ",
+	.description = "Suspend-to-disk ACPI State: S4",
+	.type = PM_SUSPEND_DISK,
 };
+#endif
 
-static inline int valid_state(suspend_state_t state)
+/*
+ *
+ */
+static int pm_change_state(struct power_op *state)
+{
+	int error = -EINVAL;
+	int len = strlen(state->name);
+	struct power_op *this, *next;
+	struct list_head *head = &pm_states.list;
+
+	/*
+	 * list_find new operating point.
+	 * compare to current operating point.
+	 * if different change to new operating point.
+	 */
+	list_for_each_entry_safe(this, next, head, list) {
+		if (strncmp(state->name, this->name, len) == 0) {
+			if ((strcmp(current_state->name, this->name)) == 0) {
+				return 0;
+			}
+
+			if (this->prepare_transition(current_state, this)) {
+				break;
+			}
+
+			if (this->transition(current_state, this)) {
+				break;
+			}
+
+			/*
+			 * now lets wait for the transition latency
+			 */
+			udelay(this->latency);
+
+			error = this->finish_transition(current_state, this);
+
+			if (error == 0)
+				current_state = this;
+			break;
+		}
+	}
+	return error;
+}
+
+static inline int valid_state(struct power_op * state)
 {
 	/* Suspend-to-disk does not really need low-level support.
 	 * It can work with reboot if needed. */
-	if (state == PM_SUSPEND_DISK)
+	if (state->type == PM_SUSPEND_DISK)
 		return 1;
 
-	if (pm_ops && pm_ops->valid && !pm_ops->valid(state))
+	if (pm_ops && pm_ops->valid && !pm_ops->valid(state->type))
 		return 0;
 	return 1;
 }
@@ -168,7 +226,7 @@ static inline int valid_state(suspend_st
 
 /**
  *	enter_state - Do common work of entering low-power state.
- *	@state:		pm_state structure for state we're entering.
+ *	@state:		power_op structure for state we're entering.
  *
  *	Make sure we're the only ones trying to enter a sleep state. Fail
  *	if someone has beat us to it, since we don't want anything weird to
@@ -177,7 +235,7 @@ static inline int valid_state(suspend_st
  *	we've woken up).
  */
 
-static int enter_state(suspend_state_t state)
+static int enter_state(struct power_op *state)
 {
 	int error;
 
@@ -186,16 +244,21 @@ static int enter_state(suspend_state_t s
 	if (down_trylock(&pm_sem))
 		return -EBUSY;
 
-	if (state == PM_SUSPEND_DISK) {
+	if (state->type == PM_SUSPEND_DISK) {
 		error = pm_suspend_disk();
 		goto Unlock;
 	}
 
-	pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]);
+	if (state->type == PM_FREQ_CHANGE) {
+		error = pm_change_state(state);
+		goto Unlock;
+	}
+
+	pr_debug("PM: Preparing system for %s sleep\n", state->name);
 	if ((error = suspend_prepare(state)))
 		goto Unlock;
 
-	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
+	pr_debug("PM: Entering %s sleep\n", state->name);
 	error = suspend_enter(state);
 
 	pr_debug("PM: Finishing wakeup.\n");
@@ -211,7 +274,15 @@ static int enter_state(suspend_state_t s
  */
 int software_suspend(void)
 {
-	return enter_state(PM_SUSPEND_DISK);
+	struct power_op *this, *next;
+	struct list_head *head = &pm_states.list;
+	int error = 0;
+
+	list_for_each_entry_safe(this, next, head, list) {
+		if (this->type == PM_SUSPEND_DISK)
+			error= enter_state(this);
+	}
+	return error;
 }
 
 
@@ -223,16 +294,48 @@ int software_suspend(void)
  *	structure, and enter (above).
  */
 
-int pm_suspend(suspend_state_t state)
+int pm_suspend(struct power_op * state)
 {
-	if (state > PM_SUSPEND_ON && state <= PM_SUSPEND_MAX)
+	if (state->type > PM_SUSPEND_ON && state->type <= PM_SUSPEND_MAX)
 		return enter_state(state);
 	return -EINVAL;
 }
 
+decl_subsys(power,NULL,NULL);
 
+/**
+ *	supported_states - control system power state.
+ *
+ *	show() returns what states are supported, which are no longer
+ * 	hard-coded to just 'standby' (Power-On Suspend), 'mem' (Suspend-to-RAM),
+ *	and *'disk' (Suspend-to-Disk), but show all the power states.
+ *
+ *	store() unwritable
+ */
 
-decl_subsys(power,NULL,NULL);
+static ssize_t supported_states_show(struct subsystem * subsys, char * buf)
+{
+	struct power_op *this, *next;
+	struct list_head *head = &pm_states.list;
+	const char *header =  "< Name >    <Frequency>  <Voltage>  <Transition Latency>  < Description >\n";
+	char * s = buf;
+
+	s += sprintf(s, "%s", header);
+	list_for_each_entry_safe(this, next, head, list) {
+		s += sprintf(s,"%s %dKHz %dmV %dus %s\n", this->name,
+		   this->frequency, this->voltage, this->latency,
+		   this->description);
+	}
+
+	return (s - buf);
+}
+
+static ssize_t supported_states_store(struct subsystem * subsys, const char *buf, size_t n)
+{
+	return -EINVAL;
+}
+
+power_attr(supported_states);
 
 
 /**
@@ -248,36 +351,28 @@ decl_subsys(power,NULL,NULL);
 
 static ssize_t state_show(struct subsystem * subsys, char * buf)
 {
-	int i;
 	char * s = buf;
 
-	for (i = 0; i < PM_SUSPEND_MAX; i++) {
-		if (pm_states[i] && valid_state(i))
-			s += sprintf(s,"%s ", pm_states[i]);
-	}
-	s += sprintf(s,"\n");
+	s += sprintf(s,"%s\n", current_state->name);
 	return (s - buf);
 }
 
 static ssize_t state_store(struct subsystem * subsys, const char * buf, size_t n)
 {
-	suspend_state_t state = PM_SUSPEND_STANDBY;
-	const char * const *s;
+	struct power_op *this, *next;
+	struct list_head *head = &pm_states.list;
 	char *p;
-	int error;
+	int error = -EINVAL;
 	int len;
 
 	p = memchr(buf, '\n', n);
 	len = p ? p - buf : n;
-
-	for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
-		if (*s && !strncmp(buf, *s, len))
+	list_for_each_entry_safe(this, next, head, list) {
+		if (!strncmp(buf, this->name, len)) {
+			error = enter_state(this);
 			break;
+		}
 	}
-	if (state < PM_SUSPEND_MAX && *s)
-		error = enter_state(state);
-	else
-		error = -EINVAL;
 	return error ? error : n;
 }
 
@@ -285,6 +380,7 @@ power_attr(state);
 
 static struct attribute * g[] = {
 	&state_attr.attr,
+	&supported_states_attr.attr,
 	NULL,
 };
 
@@ -295,9 +391,20 @@ static struct attribute_group attr_group
 
 static int __init pm_init(void)
 {
+
 	int error = subsystem_register(&power_subsys);
 	if (!error)
 		error = sysfs_create_group(&power_subsys.kset.kobj,&attr_group);
+
+	INIT_LIST_HEAD(&pm_states.list);
+
+#ifdef CONFIG_SOFTWARE_SUSPEND
+	list_add(&disk.list, &pm_states.list);
+#endif
+	list_add(&mem.list, &pm_states.list);
+	list_add(&standby.list, &pm_states.list);
+	current_state = &pm_states;
+
 	return error;
 }
 
Index: linux-2.6.17/include/linux/pm.h
===================================================================
--- linux-2.6.17.orig/include/linux/pm.h
+++ linux-2.6.17/include/linux/pm.h
@@ -108,7 +108,29 @@ typedef int __bitwise suspend_state_t;
 #define PM_SUSPEND_STANDBY	((__force suspend_state_t) 1)
 #define PM_SUSPEND_MEM		((__force suspend_state_t) 3)
 #define PM_SUSPEND_DISK		((__force suspend_state_t) 4)
-#define PM_SUSPEND_MAX		((__force suspend_state_t) 5)
+#define PM_FREQ_CHANGE		((__force suspend_state_t) 5)
+#define PM_VOLT_CHANGE		((__force suspend_state_t) 6)
+#define PM_SUSPEND_MAX		((__force suspend_state_t) 7)
+
+#define PM_NAME_SIZE            16
+#define PM_DESCRIPTION_SIZE     48
+
+struct power_op {
+	struct list_head list;
+	suspend_state_t type;
+	char name[PM_NAME_SIZE];
+	char description[PM_DESCRIPTION_SIZE];
+	unsigned int frequency;		/* in KHz */
+	unsigned int voltage;		/* mV */
+	unsigned int latency;		/* transition latency in us */
+	int     (*prepare_transition)(struct power_op *cur, struct power_op *new);
+	int     (*transition)(struct power_op *cur, struct power_op *new);
+	int     (*finish_transition)(struct power_op *cur, struct power_op *new);
+
+	void *md_data;			/* arch dependent data (dpm_opt) */
+};
+extern struct power_op pm_states;
+extern struct power_op *current_state;
 
 typedef int __bitwise suspend_disk_method_t;
 
@@ -128,7 +150,7 @@ struct pm_ops {
 
 extern void pm_set_ops(struct pm_ops *);
 extern struct pm_ops *pm_ops;
-extern int pm_suspend(suspend_state_t state);
+extern int pm_suspend(struct power_op *state);
 
 
 /*
Index: linux-2.6.17/kernel/power/power.h
===================================================================
--- linux-2.6.17.orig/kernel/power/power.h
+++ linux-2.6.17/kernel/power/power.h
@@ -113,4 +113,4 @@ extern int swsusp_resume(void);
 extern int swsusp_read(void);
 extern int swsusp_write(void);
 extern void swsusp_close(void);
-extern int suspend_enter(suspend_state_t state);
+extern int suspend_enter(struct power_op * state);
Index: linux-2.6.17/Documentation/power/powerop.txt
===================================================================
--- /dev/null
+++ linux-2.6.17/Documentation/power/powerop.txt
@@ -0,0 +1,168 @@
+
+The PowerOp Power Management infrastructure.
+
+David Singleton <dsingleton@mvista.com>
+
+25 July 2006
+
+Copyright (c) 2006 MontaVista Software Inc.
+
+0. Introduction
+
+The goal of PowerOp power management is to provide a framework that unifies
+and simplifies the various power management infrastructures in Linux.  The
+three infrastructures Power Op is concerned with are:
+
+	1) basic suspend/resume power management (CONFIG_PM)
+
+	2) basic processor frequency management (CONFIG_CPUFREQ)
+
+	3) SourceForge's Dynamic Power Management (CONFIG_DPM)
+
+All three power management infrastructures are concerned with controlling
+power states of the system, and interestingly enough they all perform the
+same basic operational steps to control changes in power state.
+
+PowerOp uses the existing power management sysfs infrastructure and extends it
+to perform cpufreq and dynamic power management operations. The traditional
+suspend to memory or disk (or swap) infrastructure has the correct operational
+structure that supports all types of power state change.
+
+The CPUFREQ table based frequency control makes controlling cpu frequency
+simple and straight forward.  The user doesn't get to set the cpu to
+any speed, but only to supported speeds that have been provided by
+the hardware vendor and validated.
+
+Dynamic Power Management treats all types of power states as operating points,
+wether it's a suspend operating point, a particular frequency, or a specific
+voltage.
+
+By combining the best of all of these power management infrastructures
+PowerOp uses the operational structure of tradition CONFIG_PM power
+management and converts all power states, frequency, voltage, idle or
+suspend to the CPUFREQ concept of only supported and validated operating
+points.
+
+PowerOp then becomes a simplified power management infrastructure in that
+only operating points that are supported and validated are available
+to the user.  Control of all operating points are done by the operating
+point name.  The user cannot supply invalid, or malicious,
+parameters that would hang or crash the system.
+
+1) PowerOp interface.
+
+To simplify power management all operations take place through two sysfs
+files, /sys/power/state and /sys/power/supported_states.  The 'state' file
+shows the current operating point of the system.  The readonly
+'supported_states' file shows the operating points the system supports.
+
+Supported operating points are displayed in tuple format of:
+
+<name, frequency, voltage, transition latency, description>
+
+The supported_states file contains rows of tuples with each
+tuple describing a supported operating point of the system.
+The supported_states file looks like a merge between the old
+/sys/power/state file and a cpufreq table.
+
+The system can transition to any of the supported states by simply
+storing the operating point name in the /sys/power/state file.
+
+To allow user space notification of events, like low battery, lid of
+the notebook being closed, etc.  PowerOp notifies the user through
+the hotplug interface.
+
+
+2) PowerOP Operating Points.
+
+An operating point is represented by the power_op struct which contains:
+
+struct power_op {
+        struct list_head list;
+        suspend_state_t type;
+        char name[PM_NAME_SIZE];
+        char description[PM_DESCRIPTION_SIZE];
+        unsigned int frequency;         /* in KHz */
+        unsigned int voltage;           /* mV */
+        unsigned int latency;           /* transition latency in us */
+        int     (*prepare_transition)(struct power_op *cur, struct power_op *new);
+        int     (*transition)(struct power_op *cur, struct power_op *new);
+        int     (*finish_transition)(struct power_op *cur, struct power_op *new);
+
+        void *md_data;                  /* arch dependent data  */
+};
+
+Each operating point has its own functions for preparing to transition,
+transitioning and finishing transition.  Cpu frequency operating points
+will probably share their op vectors, idle and suspend operating points my have
+different op vectors.
+
+
+3) Traditional Operation of Power Management Code.
+
+All three power management infrastructures have the same operational model.
+All three follow the PM model of preparing to suspend,  suspending,
+and finish the state change.  It was easiest to follow the model
+enforced by the traditional power management and use the three step process of:
+
+ 	1) get ready to change state
+	2) change state
+	3) finish changes
+
+Cpufreq infrastructure makes three calls to change the frequency of the
+processor:
+
+	1) cpufreq_notify_transition(&freq, CPUFREQ_PRECHANGE);
+
+	2) acpi_processor_set_performance (data, j, next_state);
+
+	3) cpufreq_notify_transition(&freq, CPUFREQ_POSTCHANGE);
+
+DPM uses these three calls to change frequency and/or voltage:
+
+	1) dpm_driver_scale(SCALE_PRECHANGE, new);
+
+	2) clk_set_rate(prcm_set, new->md_opt.prcm_clock);
+
+	3) dpm_driver_scale(SCALE_POStCHANGE, new);
+
+PM uses these three calls to suspend:
+
+	1) suspend_prepare(state);
+
+	2) suspend_enter(state->type);
+
+	3) suspend_finish(state);
+
+
+4) PowerOP Operation.
+
+PowerOP uses the following three calls to transition to a new operating
+point.
+
+	prepare_to_transition(cur_state, new_state);
+
+	transition(cur_state, new_state);
+
+	finish_transistion(cur_state, new_state);
+
+The parameters are pointers to operating point structures, struct power_op.
+
+Power OP is a simplified version of all three of these infrastructures in
+that it only deals with operating points, and more specifically with
+supported operating points.  Power Op presents a set of supported operating
+points to the user.  This is similar to the cpufreq table concept in that
+only supported and validated frequencies are avaliable.
+
+The definition of the operating point is done in a manner similar to cpufreqs
+in that the supported operating frequency, voltage and transition latency,
+are predefined (by the hardware vendor) and validated.
+
+The user maninuplates the operting points of the system by the
+name of the operating points.  This simplifies both the code and the
+control of the system's operating points in the PowerOp daemon.
+
+All supported operating points are defined at compile time and
+the user sets the system to different operating points by
+the operating point name.
+

[-- Attachment #3: Type: text/plain, Size: 2 bytes --]




[-- Attachment #4: powerop-cpufreq.patch --]
[-- Type: application/octet-stream, Size: 2295 bytes --]

 drivers/cpufreq/cpufreq.c |   34 ++++++++++++++++++++++++++++++++++
 include/linux/cpufreq.h   |    2 ++
 2 files changed, 36 insertions(+)

Index: linux-2.6.17/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.17.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6.17/drivers/cpufreq/cpufreq.c
@@ -226,6 +226,33 @@ static void adjust_jiffies(unsigned long
 static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci) { return; }
 #endif
 
+int cpufreq_prepare_transition(struct power_op *cur, struct power_op *new)
+{
+	struct cpufreq_freqs freqs;
+
+	freqs.old = cur->frequency;
+	freqs.new = new->frequency;
+	freqs.cpu = 0;
+	freqs.flags = cpufreq_driver->flags;
+	blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+			CPUFREQ_PRECHANGE, &freqs);
+	adjust_jiffies(CPUFREQ_PRECHANGE, &freqs);
+	return 0;
+}
+
+int cpufreq_finish_transition(struct power_op *cur, struct power_op *new)
+{
+	struct cpufreq_freqs freqs;
+
+	freqs.old = cur->frequency;
+	freqs.new = new->frequency;
+	freqs.cpu = 0;
+	freqs.flags = cpufreq_driver->flags;
+	adjust_jiffies(CPUFREQ_POSTCHANGE, &freqs);
+	blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+			CPUFREQ_POSTCHANGE, &freqs);
+	return 0;
+}
 
 /**
  * cpufreq_notify_transition - call notifier chain and adjust_jiffies
@@ -884,6 +911,12 @@ static void cpufreq_out_of_sync(unsigned
 }
 
 
+#ifdef CONFIG_PM
+unsigned int cpufreq_quick_get(unsigned int cpu)
+{
+	return (current_state->frequency * 1000);
+}
+#else
 /**
  * cpufreq_quick_get - get the CPU frequency (in kHz) frpm policy->cur
  * @cpu: CPU number
@@ -905,6 +938,7 @@ unsigned int cpufreq_quick_get(unsigned 
 
 	return (ret);
 }
+#endif
 EXPORT_SYMBOL(cpufreq_quick_get);
 
 
Index: linux-2.6.17/include/linux/cpufreq.h
===================================================================
--- linux-2.6.17.orig/include/linux/cpufreq.h
+++ linux-2.6.17/include/linux/cpufreq.h
@@ -271,6 +271,8 @@ static inline unsigned int cpufreq_quick
 	return 0;
 }
 #endif
+int cpufreq_prepare_transition(struct power_op *cur, struct power_op *new);
+int cpufreq_finish_transition(struct power_op *cur, struct power_op *new);
 
 
 /*********************************************************************

[-- Attachment #5: Type: text/plain, Size: 2 bytes --]




[-- Attachment #6: powerop-x86-centrino.patch --]
[-- Type: application/octet-stream, Size: 4363 bytes --]


 arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c |  108 ++++++++++++++++++++++
 1 files changed, 108 insertions(+)

Index: linux-2.6.17/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
===================================================================
--- linux-2.6.17.orig/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
+++ linux-2.6.17/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
@@ -20,6 +20,7 @@
 #include <linux/sched.h>	/* current */
 #include <linux/delay.h>
 #include <linux/compiler.h>
+#include <linux/pm.h>
 
 #ifdef CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI
 #include <linux/acpi.h>
@@ -160,6 +161,80 @@ static struct cpufreq_frequency_table ba
 	OP(1400, 1484),
 	{ .frequency = CPUFREQ_TABLE_END }
 };
+#ifdef CONFIG_PM
+/*
+ * this is the formula for OP.  We need the perfctl
+ * msr value to change to a new frequency.
+ * We'll save it in the machine dependent variable md_data.
+ */
+static void set_perfctl_msr(struct power_op *pm)
+{
+	unsigned int msr = pm->frequency/100;
+	unsigned int v = pm->voltage - 700;
+
+	msr = (unsigned int)(msr << 8);
+	msr |= (unsigned int)(v / 16);
+	pm->md_data = (void *)msr;
+}
+
+static int centrino_transition(struct power_op *cur, struct power_op *new);
+
+static struct power_op mhz600 = {
+        .name = "600MHz",
+        .description = "Low Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 600,
+        .voltage = 956,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+static struct power_op mhz800 = {
+        .name = "800MHz",
+        .description = "Lower Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 800,
+        .voltage = 1180,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+static struct power_op ghz1 = {
+        .name = "1GHz",
+        .description = "Med Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 1000,
+        .voltage = 1308,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+static struct power_op ghz12 = {
+        .name = "1.2GHz",
+        .description = "High Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 1200,
+        .voltage = 1436,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+static struct power_op ghz14 = {
+        .name = "1.4GHz",
+        .description = "Highest Frequency state",
+        .type = PM_FREQ_CHANGE,
+        .frequency = 1400,
+        .voltage = 1484,
+	.latency = 100,
+	.prepare_transition  = cpufreq_prepare_transition,
+	.transition = centrino_transition,
+	.finish_transition = cpufreq_finish_transition,
+};
+#endif
 
 /* Intel Pentium M processor 1.50GHz (Banias) */
 static struct cpufreq_frequency_table banias_1500[] =
@@ -266,6 +341,19 @@ static int centrino_cpu_init_table(struc
 	dprintk("found \"%s\": max frequency: %dkHz\n",
 	       model->model_name, model->max_freq);
 
+#ifdef CONFIG_PM
+	list_add_tail(&mhz600.list, &pm_states.list);
+	set_perfctl_msr(&mhz600);
+	list_add_tail(&mhz800.list, &pm_states.list);
+	set_perfctl_msr(&mhz800);
+	list_add_tail(&ghz1.list, &pm_states.list);
+	set_perfctl_msr(&ghz1);
+	list_add_tail(&ghz12.list, &pm_states.list);
+	set_perfctl_msr(&ghz12);
+	list_add_tail(&ghz14.list, &pm_states.list);
+	set_perfctl_msr(&ghz14);
+	current_state = &ghz14;
+#endif
 	return 0;
 }
 
@@ -620,6 +708,26 @@ static int centrino_verify (struct cpufr
 	return cpufreq_frequency_table_verify(policy, centrino_model[policy->cpu]->op_points);
 }
 
+static int centrino_transition(struct power_op *cur, struct power_op *new)
+{
+	unsigned int msr, oldmsr = 0, h = 0;
+
+	if (cur == new)
+		return 0;
+
+	msr = (unsigned int)new->md_data;
+	rdmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+
+	/* all but 16 LSB are reserved, treat them with care */
+	oldmsr &= ~0xffff;
+	msr &= 0xffff;
+	oldmsr |= msr;
+
+	wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+
+	return 0;
+}
+
 /**
  * centrino_setpolicy - set a new CPUFreq policy
  * @policy: new policy

[-- Attachment #7: Type: text/plain, Size: 1 bytes --]



[-- Attachment #8: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-29  0:38   ` david singleton
@ 2006-07-29  0:45     ` Greg KH
  2006-07-29  5:12       ` david singleton
  2006-08-07 22:06     ` Pavel Machek
  1 sibling, 1 reply; 45+ messages in thread
From: Greg KH @ 2006-07-29  0:45 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

On Fri, Jul 28, 2006 at 05:38:11PM -0700, david singleton wrote:
> 
> On Jul 28, 2006, at 4:38 PM, Greg KH wrote:
> 
> >On Fri, Jul 28, 2006 at 03:31:41PM -0700, david singleton wrote:
> >>Here is a patch that implements a version of the PowerOp concept.
> >
> >Any chance of breaking this up into logical patches that do one thing 
> >at
> >a time so it can be reviewed better?
> >
> >thanks,
> >
> >greg k-h
> >
> Here's powerop-core.patch,  powerop-cpufreq.patch and 
> powerop-x86-centrino.patch.

Um, no, that's not how kernel patches are submitted.  How about one per
email, with a description of what they do, inline so we can quote them
in a message (and actually read them in the original message...)

See patches posted here by others as examples of what is expected, and
see Documentation/SubmittingPatches for more details.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-29  0:45     ` Greg KH
@ 2006-07-29  5:12       ` david singleton
  2006-07-29 19:07         ` Eugeny S. Mints
  2006-07-29 22:09         ` Greg KH
  0 siblings, 2 replies; 45+ messages in thread
From: david singleton @ 2006-07-29  5:12 UTC (permalink / raw)
  To: Greg KH; +Cc: David Singleton, linux-pm


Greg,
	perhaps I need to back up a bit.  I wasn't submitting these patches
for inclusion into Linux.  I was presenting them to the people 
discussing
how power management might evolve in Linux.

	This patch is just a toy prototype to use as a strawman to discuss
how power management infrastructures in Linux might evolve to be more:
	
	a) unified

and
	b) simplified for both kernel and user space.

	The Documentation/powerop.txt included in the powerop-core.patch
tries to describe what the patch is attempting to do and how it works.

David


On Jul 28, 2006, at 5:45 PM, Greg KH wrote:

> On Fri, Jul 28, 2006 at 05:38:11PM -0700, david singleton wrote:
>>
>> On Jul 28, 2006, at 4:38 PM, Greg KH wrote:
>>
>>> On Fri, Jul 28, 2006 at 03:31:41PM -0700, david singleton wrote:
>>>> Here is a patch that implements a version of the PowerOp concept.
>>>
>>> Any chance of breaking this up into logical patches that do one thing
>>> at
>>> a time so it can be reviewed better?
>>>
>>> thanks,
>>>
>>> greg k-h
>>>
>> Here's powerop-core.patch,  powerop-cpufreq.patch and
>> powerop-x86-centrino.patch.
>
> Um, no, that's not how kernel patches are submitted.  How about one per
> email, with a description of what they do, inline so we can quote them
> in a message (and actually read them in the original message...)
>
> See patches posted here by others as examples of what is expected, and
> see Documentation/SubmittingPatches for more details.
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-29  5:12       ` david singleton
@ 2006-07-29 19:07         ` Eugeny S. Mints
  2006-07-30  4:43           ` david singleton
  2006-07-29 22:09         ` Greg KH
  1 sibling, 1 reply; 45+ messages in thread
From: Eugeny S. Mints @ 2006-07-29 19:07 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

david singleton wrote:
> Greg,
> 	perhaps I need to back up a bit.  I wasn't submitting these patches
> for inclusion into Linux.  I was presenting them to the people 
> discussing
> how power management might evolve in Linux.
>
> 	This patch is just a toy prototype to use as a strawman to discuss
> how power management infrastructures in Linux might evolve to be more:
> 	
> 	a) unified
>   
 From high level POV I can read this patch set as an approach to
design a glue for suspend/resume management and frequency
changes management in the system but  I can hardly get from the
code and supplied documentation how the patch set addresses the
following  issues in question towards Linux power management
unification.

The ideal goal of ongoing efforts is to design a unified power
management framework which allows to build power management
for systems of different types (desktops, embedded, etc) on top
of the framework, customizing power management of a target
system with help of plugins implementation on framework layers.

I'd like to refer to the ongoing discussion on this list based on the
patches I sent out a week ago and ask for comments on how
your patch set addresses:

1) embedded system needs in question including but not
   limited to:
   - runtime operating points creation from userspace
   - standardization of API to control clock and voltage domains
   - integration with dynamic clock(and voltage) management
      (clock/voltage framework)

2) interface (kernel as well as userspace(sysfs)) for the rest of power
    parameters except cpu voltage and frequency

3) per platform nature of an operating point rather than per
    a pm control layer (cpufreq for ex.):
    - you have cpu freq and voltage defined in common code
       while it's still possible that on a certain platform one would
       not be interested in control of these parameters
   - it's cpufreq driver which allocates memory for operating
     points in your patches. I should not duplicating the code if I'm
     implementing another pm control layer instance (policy manager)
     which is actually a plugin for pm framework and can share operating
     points
   - most probably all operating points [at least] of the same type (
     sleeping or frequency) would have the same transition
     call back and since this it seems transition callback might be
     platform specific thing
   - assuming several instances on pm control layer (several
      policy managers) what would be the code which is
      responsible for accessing hardware to handle a certain
      policy manager decision? With the current approach you
     would need to duplicate code of pre/transition/finish
     routines per a pm control layer instance  

4) you introduced second sysfs interface for cpufreq and
    have not removed original one. Do you expect cpufreq
    sysfs interface to be changed from original one?

Thanks,
Eugeny
> and
> 	b) simplified for both kernel and user space.
>
> 	The Documentation/powerop.txt included in the powerop-core.patch
> tries to describe what the patch is attempting to do and how it works.
>
> David
>
>
> On Jul 28, 2006, at 5:45 PM, Greg KH wrote:
>
>   
>> On Fri, Jul 28, 2006 at 05:38:11PM -0700, david singleton wrote:
>>     
>>> On Jul 28, 2006, at 4:38 PM, Greg KH wrote:
>>>
>>>       
>>>> On Fri, Jul 28, 2006 at 03:31:41PM -0700, david singleton wrote:
>>>>         
>>>>> Here is a patch that implements a version of the PowerOp concept.
>>>>>           
>>>> Any chance of breaking this up into logical patches that do one thing
>>>> at
>>>> a time so it can be reviewed better?
>>>>
>>>> thanks,
>>>>
>>>> greg k-h
>>>>
>>>>         
>>> Here's powerop-core.patch,  powerop-cpufreq.patch and
>>> powerop-x86-centrino.patch.
>>>       
>> Um, no, that's not how kernel patches are submitted.  How about one per
>> email, with a description of what they do, inline so we can quote them
>> in a message (and actually read them in the original message...)
>>
>> See patches posted here by others as examples of what is expected, and
>> see Documentation/SubmittingPatches for more details.
>>
>> thanks,
>>
>> greg k-h
>>     
>
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>
>   

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-29  5:12       ` david singleton
  2006-07-29 19:07         ` Eugeny S. Mints
@ 2006-07-29 22:09         ` Greg KH
  2006-08-01  0:36           ` david singleton
  2006-08-01  1:27           ` david singleton
  1 sibling, 2 replies; 45+ messages in thread
From: Greg KH @ 2006-07-29 22:09 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

On Fri, Jul 28, 2006 at 10:12:26PM -0700, david singleton wrote:
> 
> Greg,
> 	perhaps I need to back up a bit.  I wasn't submitting these patches
> for inclusion into Linux.  I was presenting them to the people 
> discussing
> how power management might evolve in Linux.

That's fine, but you should at least post them so we can read and
comment on them from within our email clients then :)

And also by breaking up the patches into small logical changes, it helps
us to understand what they are doing and would allow us to evaluate them
much better.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-29 19:07         ` Eugeny S. Mints
@ 2006-07-30  4:43           ` david singleton
  2006-07-30 11:02             ` Vitaly Wool
  2006-08-06 22:05             ` Pavel Machek
  0 siblings, 2 replies; 45+ messages in thread
From: david singleton @ 2006-07-30  4:43 UTC (permalink / raw)
  To: Eugeny S. Mints; +Cc: David Singleton, linux-pm


On Jul 29, 2006, at 12:07 PM, Eugeny S. Mints wrote:

> david singleton wrote:
>> Greg,
>> 	perhaps I need to back up a bit.  I wasn't submitting these patches
>> for inclusion into Linux.  I was presenting them to the people 
>> discussing
>> how power management might evolve in Linux.
>>
>> 	This patch is just a toy prototype to use as a strawman to discuss
>> how power management infrastructures in Linux might evolve to be more:
>> 	
>> 	a) unified
>>
> From high level POV I can read this patch set as an approach to
> design a glue for suspend/resume management and frequency
> changes management in the system but  I can hardly get from the
> code and supplied documentation how the patch set addresses the
> following  issues in question towards Linux power management
> unification.
>
> The ideal goal of ongoing efforts is to design a unified power
> management framework which allows to build power management
> for systems of different types (desktops, embedded, etc) on top
> of the framework, customizing power management of a target
> system with help of plugins implementation on framework layers.

That's the point of powerop.  The system designer implements
the operating points that best take advantage of the hardware on the
platform.

Powerop just does it in a simple and straight forward manner.  All
the supported states a system can be in are operating points.  Once
the different states can be expressed as operating points they
can be transitioned from one state to another simply
through the name of the operating point.

>
> I'd like to refer to the ongoing discussion on this list based on the
> patches I sent out a week ago and ask for comments on how
> your patch set addresses:
>
> 1) embedded system needs in question including but not
>   limited to:
>   - runtime operating points creation from userspace
>   - standardization of API to control clock and voltage domains
>   - integration with dynamic clock(and voltage) management
>      (clock/voltage framework)


That's one of the simple parts of the concept.  There aren't any 
runtime operating
point creation.  It's one of the things I like best about cpufreq,  the 
frequency
and voltages are taken from the hardware vendor data sheet and 
validated.

The user just gets to use the operating points supported by the system, 
not
choose the frequency or voltage to transition to.

By just presenting the supported operating points to the user it 
removes the
need for new APIs.  The user just reads the supported operating points
and decides the best use of the supported operating points.


>
> 2) interface (kernel as well as userspace(sysfs)) for the rest of power
>    parameters except cpu voltage and frequency


The /sys/power/supported_states file shows the supported operating 
points
and their parameters.

The platform specific information is hidden through the md_data pointer,
which in the case of embedded systems with complex clocking schemes,
contains the clock divisor and multiplier information that the system 
needs
to perform frequency and voltage scaling and clock manipulation.

The machine dependent portion of a centrino operating point
is only the perfctl msr bits for each frequency/voltage.  For
a system with 5 power domains and various clocks the
machine dependent portion contains the whole array
of information for the different power domains and their clocks.

>
> 3) per platform nature of an operating point rather than per
>    a pm control layer (cpufreq for ex.):
>    - you have cpu freq and voltage defined in common code
>       while it's still possible that on a certain platform one would
>       not be interested in control of these parameters

Correct, but on all of the hardware with which I'm familiar cpu 
frequency
and voltage are common components to power management.

The point the example is trying to make is that the different power
management infrastructures can be unified and simplified.

>   - it's cpufreq driver which allocates memory for operating
>     points in your patches. I should not duplicating the code if I'm
>     implementing another pm control layer instance (policy manager)
>     which is actually a plugin for pm framework and can share operating
>     points

The cpufreq driver doesn't allocate any memory for any powerop
operating points.   I leave the cpufreq driver alone.  I make powerop
operating points that mirror the cpufreq table and link them
into the other supported operating points of the system.

I've purposely left the cpufreq code unchanged.


>   - most probably all operating points [at least] of the same type (
>     sleeping or frequency) would have the same transition
>     call back and since this it seems transition callback might be
>     platform specific thing

The prepare_transition, transition, and finish transistion are all
platform dependent.  That's why they are function pointers
in the powerop struct.

Each platform would define its own functions and different types
of operating points might have different functions.  The powerop struct
has the correct prepare_trransition, transition, finish_transition 
routines for the
specific platform and operating point.

>   - assuming several instances on pm control layer (several
>      policy managers) what would be the code which is
>      responsible for accessing hardware to handle a certain
>      policy manager decision?

I'm sorrry,  I don't understand that question.

> With the current approach you
>     would need to duplicate code of pre/transition/finish
>     routines per a pm control layer instance

If I understand correctly, you wouldn't.   For the centrino example I 
used
the different frequency operating points all share the same prepare, 
transition, and
finish routines.

> 4) you introduced second sysfs interface for cpufreq and
>    have not removed original one. Do you expect cpufreq
>    sysfs interface to be changed from original one?

No.  I've left the cpufreq code and interfaces alone for my example.
The resultant example can perform the same cpu frequency scaling 
operations through
either interface.

David
>
> Thanks,
> Eugeny
>> and
>> 	b) simplified for both kernel and user space.
>>
>> 	The Documentation/powerop.txt included in the powerop-core.patch
>> tries to describe what the patch is attempting to do and how it works.
>>
>> David
>>
>>
>> On Jul 28, 2006, at 5:45 PM, Greg KH wrote:
>>
>>
>>> On Fri, Jul 28, 2006 at 05:38:11PM -0700, david singleton wrote:
>>>
>>>> On Jul 28, 2006, at 4:38 PM, Greg KH wrote:
>>>>
>>>>
>>>>> On Fri, Jul 28, 2006 at 03:31:41PM -0700, david singleton wrote:
>>>>>
>>>>>> Here is a patch that implements a version of the PowerOp concept.
>>>>>>
>>>>> Any chance of breaking this up into logical patches that do one 
>>>>> thing
>>>>> at
>>>>> a time so it can be reviewed better?
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>>
>>>>>
>>>> Here's powerop-core.patch,  powerop-cpufreq.patch and
>>>> powerop-x86-centrino.patch.
>>>>
>>> Um, no, that's not how kernel patches are submitted.  How about one 
>>> per
>>> email, with a description of what they do, inline so we can quote 
>>> them
>>> in a message (and actually read them in the original message...)
>>>
>>> See patches posted here by others as examples of what is expected, 
>>> and
>>> see Documentation/SubmittingPatches for more details.
>>>
>>> thanks,
>>>
>>> greg k-h
>>>
>>
>> _______________________________________________
>> linux-pm mailing list
>> linux-pm@lists.osdl.org
>> https://lists.osdl.org/mailman/listinfo/linux-pm
>>
>>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-30  4:43           ` david singleton
@ 2006-07-30 11:02             ` Vitaly Wool
  2006-08-01  0:59               ` david singleton
  2006-08-01 18:02               ` Tim Bird
  2006-08-06 22:05             ` Pavel Machek
  1 sibling, 2 replies; 45+ messages in thread
From: Vitaly Wool @ 2006-07-30 11:02 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

David,

On 7/30/06, david singleton <dsingleton@mvista.com> wrote:

> That's one of the simple parts of the concept.  There aren't any
> runtime operating
> point creation.  It's one of the things I like best about cpufreq,  the
> frequency
> and voltages are taken from the hardware vendor data sheet and
> validated.
>
> The user just gets to use the operating points supported by the system,
> not
> choose the frequency or voltage to transition to.
>
> By just presenting the supported operating points to the user it
> removes the
> need for new APIs.  The user just reads the supported operating points
> and decides the best use of the supported operating points.

I see this approach as fundamentally wrong at least because it will
produce very long and hard to manage lists of operating points.
Suppose you have 20 hardware vendor approved core CPU frequency
values, 3 possible voltage values and 10 approved DSP CPU frequency
values (which are derived from the other PLL). Not too impossible is
that almost all combinations are available which makes is almost 600
operating points. I find it absolutely unreal that anyone enters all
that stuff without mistakes; managing those lists/searching thru them
will take significant time which will slow down the state transitions;
and, finally, it's gonna increase the kernel footprint  quite a bit.

It looks to me that the concept that the kernel can implement
rules/restrictions for operating points but shouldn't define them with
possible exception for the most essential ones far better suits both
embedded and non-embedded use cases.

> > 2) interface (kernel as well as userspace(sysfs)) for the rest of power
> >    parameters except cpu voltage and frequency
>
>
> The /sys/power/supported_states file shows the supported operating
> points
> and their parameters.
>
> The platform specific information is hidden through the md_data pointer,
> which in the case of embedded systems with complex clocking schemes,
> contains the clock divisor and multiplier information that the system
> needs
> to perform frequency and voltage scaling and clock manipulation.
>
> The machine dependent portion of a centrino operating point
> is only the perfctl msr bits for each frequency/voltage.  For
> a system with 5 power domains and various clocks the
> machine dependent portion contains the whole array
> of information for the different power domains and their clocks.

Basically I don't see too much sense in your definition of
PM_FREQ_CHANGE and PM_VOLT_CHANGE. The latter one just isn't used
anywhere although the voltage differs between the operating points for
your centrino example. And it's quite a common thing when frequency
and voltage are changed within the same transition; so those either
should be bitfields or something like PM_STATE_CHANGE.

> >
> > 3) per platform nature of an operating point rather than per
> >    a pm control layer (cpufreq for ex.):
> >    - you have cpu freq and voltage defined in common code
> >       while it's still possible that on a certain platform one would
> >       not be interested in control of these parameters
>
> Correct, but on all of the hardware with which I'm familiar cpu
> frequency
> and voltage are common components to power management.

I do agree, but there might be different voltages and different CPU
frequencies within the same SoC, so it will mean that you separate,
say, two CPU frequencies between common code and SoC-specific code.
Maybe it's still the way to go, but it makes things quite complicated
to understand from scratch.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-29 22:09         ` Greg KH
@ 2006-08-01  0:36           ` david singleton
  2006-08-01  1:27           ` david singleton
  1 sibling, 0 replies; 45+ messages in thread
From: david singleton @ 2006-08-01  0:36 UTC (permalink / raw)
  To: Greg KH; +Cc: David Singleton, linux-pm


On Jul 29, 2006, at 3:09 PM, Greg KH wrote:

> On Fri, Jul 28, 2006 at 10:12:26PM -0700, david singleton wrote:
>>
>> Greg,
>> 	perhaps I need to back up a bit.  I wasn't submitting these patches
>> for inclusion into Linux.  I was presenting them to the people
>> discussing
>> how power management might evolve in Linux.
>
> That's fine, but you should at least post them so we can read and
> comment on them from within our email clients then :)
>
> And also by breaking up the patches into small logical changes, it 
> helps
> us to understand what they are doing and would allow us to evaluate 
> them
> much better.
>
> thanks,
>
> greg k-h
>

        Greg,
                 these emails describe a concept called power op that
         shows how the various power management infrastructures in
         Linux could be unified and simplified.  This email describes
         the concept of what power op is trying to do and how it works.

                 The following emails break a prototype patch into
         three logical parts.  The first patch is the powerop-core.patch
         that adds support for an operating point in the standard linux
         power management infrastructure (CONFIG_PM) and adds a new
         function to perform transitioning to operating points other
         than suspend to memory or disk.

                 The second patch adds the cpufreq portion of the patch
         that makes cpufreq table a set of operating points.

                 The third patch adds some operating points for
         the centrino-speedstep (the notebook I'm doing the prototype on)
         and a centrino specific transition function.

                 The goal of these patches is not to show how the
         concept of supported operating points should be implemented,
         rather its intended to show that they can be unified and
         greatly simplified.

                 All the power management infrastructures operate on
         what can be conceptualized as an operating point.  Once they
         get down to the point where they start controlling power, 
frequency
         or voltage, they are transitioning from one operating point
         to another.  They all also follow the same steps to transition
         from one operating point to another.

                 Powerop uses the traditional power management interface
         of /sys/power/state but changes it a bit. When it's read it 
shows
         the current operating point of the system.  Powerop uses a 
second
         sysfs file to present the list of supported operating points, 
their
         name, frequency, voltage and description,  to the user.

                 The user sets the operating point of the system by 
writing
         the name of the operating point into the /sys/power/state file.

                 The goal of powerop is to both unify and simplify the
         different power management infrastructures.  It does this
         by treating all supported states the system can be in as
         different operating points that can be simply transitioned
         to via the name of the operating point.

                 It simplifies power management by following the CPUFREQ
         concept of a vendor specified and validated table of supported
         operating points.  In CPUFREQ the user is not allowed to
         set the frequency or voltage of an operating point that might
         hang the system, either inadverdantly or maliciously.

                 It also simplifies the code  by putting all the 
decisions
         about when to transition to a new operating point in the
         hands of the user.  With this simple example patch the user
         can manually switch between all the supported states of the
         centrino speedstep.


                 The powerop-core.patch has more details in the
         Documentation/power/power.txt file.


David

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-30 11:02             ` Vitaly Wool
@ 2006-08-01  0:59               ` david singleton
  2006-08-01 10:09                 ` Matthew Locke
  2006-08-01 12:23                 ` Vitaly Wool
  2006-08-01 18:02               ` Tim Bird
  1 sibling, 2 replies; 45+ messages in thread
From: david singleton @ 2006-08-01  0:59 UTC (permalink / raw)
  To: Vitaly Wool; +Cc: David Singleton, linux-pm


On Jul 30, 2006, at 4:02 AM, Vitaly Wool wrote:

> David,
>
> On 7/30/06, david singleton <dsingleton@mvista.com> wrote:
>
>> That's one of the simple parts of the concept.  There aren't any
>> runtime operating
>> point creation.  It's one of the things I like best about cpufreq,  
>> the
>> frequency
>> and voltages are taken from the hardware vendor data sheet and
>> validated.
>>
>> The user just gets to use the operating points supported by the 
>> system,
>> not
>> choose the frequency or voltage to transition to.
>>
>> By just presenting the supported operating points to the user it
>> removes the
>> need for new APIs.  The user just reads the supported operating points
>> and decides the best use of the supported operating points.
>
> I see this approach as fundamentally wrong at least because it will
> produce very long and hard to manage lists of operating points.
> Suppose you have 20 hardware vendor approved core CPU frequency
> values, 3 possible voltage values and 10 approved DSP CPU frequency
> values (which are derived from the other PLL). Not too impossible is
> that almost all combinations are available which makes is almost 600
> operating points. I find it absolutely unreal that anyone enters all
> that stuff without mistakes; managing those lists/searching thru them
> will take significant time which will slow down the state transitions;
> and, finally, it's gonna increase the kernel footprint  quite a bit.

Actually in practice there aren't that many supported operating
points, even on the hardware you and I are familiar with.  I've yet
to construct a case where there are more than 16 to 20
operating points.

And the Linux device model allows the system to be set at
a particular operating point and then suspending the LCD
or unused USB if so desired.  So the combination flexibility
is still available.

If there were 600 supported operating points that would be a
very good reason to use PowerOp.   I'm not sure I'd want
the user passing all the frequencies, voltages, clock
divisor and clock multiplier for all those operating points.

List manipulation takes place at compile time and list traversal
is simple.  If a powerop were to become a kobject management
and traversal would still be simple.

The foot print actually shrinks if you take into account all the
class, policy and governor code that wouldn't be needed if
all supported states were simple operating points.

>
> It looks to me that the concept that the kernel can implement
> rules/restrictions for operating points but shouldn't define them with
> possible exception for the most essential ones far better suits both
> embedded and non-embedded use cases.

CPUFREQ shows that it can, and I believe should, define the operating
points the system supports.  CPUFREQ does NOT let the user pass
frequency or voltage values into the kernel.  It shows the hardware
vendor certified and validated frequencies and voltages.

I really like that concept.  It simplifies things greatly.

>
>> > 2) interface (kernel as well as userspace(sysfs)) for the rest of 
>> power
>> >    parameters except cpu voltage and frequency
>>
>>
>> The /sys/power/supported_states file shows the supported operating
>> points
>> and their parameters.
>>
>> The platform specific information is hidden through the md_data 
>> pointer,
>> which in the case of embedded systems with complex clocking schemes,
>> contains the clock divisor and multiplier information that the system
>> needs
>> to perform frequency and voltage scaling and clock manipulation.
>>
>> The machine dependent portion of a centrino operating point
>> is only the perfctl msr bits for each frequency/voltage.  For
>> a system with 5 power domains and various clocks the
>> machine dependent portion contains the whole array
>> of information for the different power domains and their clocks.
>
> Basically I don't see too much sense in your definition of
> PM_FREQ_CHANGE and PM_VOLT_CHANGE. The latter one just isn't used
> anywhere although the voltage differs between the operating points for
> your centrino example. And it's quite a common thing when frequency
> and voltage are changed within the same transition; so those either
> should be bitfields or something like PM_STATE_CHANGE.


The example patch isn't provided to show how it should be implemented.

I've added a separate PowerOp state of PM_VOLT_CHANGE for
hardware that may be changing states by changing a voltage rather
than having the voltage changed as a side effect of changing the
frequency explicitly.

>
>> >
>> > 3) per platform nature of an operating point rather than per
>> >    a pm control layer (cpufreq for ex.):
>> >    - you have cpu freq and voltage defined in common code
>> >       while it's still possible that on a certain platform one would
>> >       not be interested in control of these parameters
>>
>> Correct, but on all of the hardware with which I'm familiar cpu
>> frequency
>> and voltage are common components to power management.
>
> I do agree, but there might be different voltages and different CPU
> frequencies within the same SoC, so it will mean that you separate,
> say, two CPU frequencies between common code and SoC-specific code.
> Maybe it's still the way to go, but it makes things quite complicated
> to understand from scratch.
>

After digging through all the PM,  CPUFREQ and Dynamic Power Management
code it became apparent that when they get down to touching hardware
they are just dealing with an operating point.  And they all change from
one opeating point to another in the same manner.

Once you view all the states a system can be in as an operating point, 
wether
its a suspend or frequency change,  things get much simpler.

And

David

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-29 22:09         ` Greg KH
  2006-08-01  0:36           ` david singleton
@ 2006-08-01  1:27           ` david singleton
  1 sibling, 0 replies; 45+ messages in thread
From: david singleton @ 2006-08-01  1:27 UTC (permalink / raw)
  To: Greg KH; +Cc: David Singleton, linux-pm


On Jul 29, 2006, at 3:09 PM, Greg KH wrote:

> On Fri, Jul 28, 2006 at 10:12:26PM -0700, david singleton wrote:
>>
>> Greg,
>> 	perhaps I need to back up a bit.  I wasn't submitting these patches
>> for inclusion into Linux.  I was presenting them to the people
>> discussing
>> how power management might evolve in Linux.
>
> That's fine, but you should at least post them so we can read and
> comment on them from within our email clients then :)
>
> And also by breaking up the patches into small logical changes, it 
> helps
> us to understand what they are doing and would allow us to evaluate 
> them
> much better.
>
> thanks,
>
> greg k-h
>

        Greg,
                 the three patches I just sent,  meant to send this 
first,
         describe a concept called power op that
         shows how the various power management infrastructures in
         Linux could be unified and simplified.  This email describes
         the concept of what power op is trying to do and how it works.

                 The following emails break a prototype patch into
         three logical parts.  The first patch is the powerop-core.patch
         that adds support for an operating point in the standard linux
         power management infrastructure (CONFIG_PM) and adds a new
         function to perform transitioning to operating points other
         than suspend to memory or disk.

                 The second patch adds the cpufreq portion of the patch
         that makes cpufreq table a set of operating points.

                 The third patch adds some operating points for
         the centrino-speedstep (the notebook I'm doing the prototype on)
         and a centrino specific transition function.

                 The goal of these patches is not to show how the
         concept of supported operating points should be implemented,
         rather its intended to show that they can be unified and
         greatly simplified.

                 All the power management infrastructures operate on
         what can be conceptualized as an operating point.  Once they
         get down to the point where they start controlling power, 
frequency
         or voltage, they are transitioning from one operating point
         to another.  They all also follow the same steps to transition
         from one operating point to another.

                 Powerop uses the traditional power management interface
         of /sys/power/state but changes it a bit. When it's read it 
shows
         the current operating point of the system.  Powerop uses a 
second
         sysfs file to present the list of supported operating points, 
their
         name, frequency, voltage and description,  to the user.

                 The user sets the operating point of the system by 
writing
         the name of the operating point into the /sys/power/state file.

                 The goal of powerop is to both unify and simplify the
         different power management infrastructures.  It does this
         by treating all supported states the system can be in as
         different operating points that can be simply transitioned
         to via the name of the operating point.

                 It simplifies power management by following the CPUFREQ
         concept of a vendor specified and validated table of supported
         operating points.  In CPUFREQ the user is not allowed to
         set the frequency or voltage of an operating point that might
         hang the system, either inadverdantly or maliciously.

                 It also simplifies the code  by putting all the 
decisions
         about when to transition to a new operating point in the
         hands of the user.  With this simple example patch the user
         can manually switch between all the supported states of the
         centrino speedstep.


                 The powerop-core.patch has more details in the
         Documentation/power/power.txt file.


David

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01  0:59               ` david singleton
@ 2006-08-01 10:09                 ` Matthew Locke
  2006-08-01 10:22                   ` Matthew Locke
  2006-08-01 18:31                   ` david singleton
  2006-08-01 12:23                 ` Vitaly Wool
  1 sibling, 2 replies; 45+ messages in thread
From: Matthew Locke @ 2006-08-01 10:09 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm


On Jul 31, 2006, at 5:59 PM, david singleton wrote:

>
> On Jul 30, 2006, at 4:02 AM, Vitaly Wool wrote:
>
>> David,
>>
>> On 7/30/06, david singleton <dsingleton@mvista.com> wrote:
>>
>>> That's one of the simple parts of the concept.  There aren't any
>>> runtime operating
>>> point creation.  It's one of the things I like best about cpufreq,
>>> the
>>> frequency
>>> and voltages are taken from the hardware vendor data sheet and
>>> validated.
>>>
>>> The user just gets to use the operating points supported by the
>>> system,
>>> not
>>> choose the frequency or voltage to transition to.
>>>
>>> By just presenting the supported operating points to the user it
>>> removes the
>>> need for new APIs.  The user just reads the supported operating 
>>> points
>>> and decides the best use of the supported operating points.
>>
>> I see this approach as fundamentally wrong at least because it will
>> produce very long and hard to manage lists of operating points.
>> Suppose you have 20 hardware vendor approved core CPU frequency
>> values, 3 possible voltage values and 10 approved DSP CPU frequency
>> values (which are derived from the other PLL). Not too impossible is
>> that almost all combinations are available which makes is almost 600
>> operating points. I find it absolutely unreal that anyone enters all
>> that stuff without mistakes; managing those lists/searching thru them
>> will take significant time which will slow down the state transitions;
>> and, finally, it's gonna increase the kernel footprint  quite a bit.
>
> Actually in practice there aren't that many supported operating
> points, even on the hardware you and I are familiar with.  I've yet
> to construct a case where there are more than 16 to 20
> operating points.

Its not the number of operating points driving the need for run time 
creation.  Please read the thread that took place early last week on 
this topic.  Start from my post here: 
http://lists.osdl.org/pipermail/linux-pm/2006-July/003065.html and read 
backwards.

Its really the embedded device development and silicon vendor model 
driving it.  Run time creation is required and enabling run time 
creation doesn't prevent some architectures/board ports from hard 
coding their points.

>
> And the Linux device model allows the system to be set at
> a particular operating point and then suspending the LCD
> or unused USB if so desired.  So the combination flexibility
> is still available.
>
> If there were 600 supported operating points that would be a
> very good reason to use PowerOp.   I'm not sure I'd want
> the user passing all the frequencies, voltages, clock
> divisor and clock multiplier for all those operating points.

Well, no one is suggesting a user define and install that info.  
Operating point creation will be done by someone who understands the 
system (system designer) regardless of the method used to get the 
operating points in the kernel.

>
> List manipulation takes place at compile time and list traversal
> is simple.  If a powerop were to become a kobject management
> and traversal would still be simple.
>
> The foot print actually shrinks if you take into account all the
> class, policy and governor code that wouldn't be needed if
> all supported states were simple operating points.
>
>>
>> It looks to me that the concept that the kernel can implement
>> rules/restrictions for operating points but shouldn't define them with
>> possible exception for the most essential ones far better suits both
>> embedded and non-embedded use cases.
>
> CPUFREQ shows that it can, and I believe should, define the operating
> points the system supports.  CPUFREQ does NOT let the user pass
> frequency or voltage values into the kernel.  It shows the hardware
> vendor certified and validated frequencies and voltages.
>
> I really like that concept.  It simplifies things greatly.
>
>>
>>>> 2) interface (kernel as well as userspace(sysfs)) for the rest of
>>> power
>>>>    parameters except cpu voltage and frequency
>>>
>>>
>>> The /sys/power/supported_states file shows the supported operating
>>> points
>>> and their parameters.
>>>
>>> The platform specific information is hidden through the md_data
>>> pointer,
>>> which in the case of embedded systems with complex clocking schemes,
>>> contains the clock divisor and multiplier information that the system
>>> needs
>>> to perform frequency and voltage scaling and clock manipulation.
>>>
>>> The machine dependent portion of a centrino operating point
>>> is only the perfctl msr bits for each frequency/voltage.  For
>>> a system with 5 power domains and various clocks the
>>> machine dependent portion contains the whole array
>>> of information for the different power domains and their clocks.
>>
>> Basically I don't see too much sense in your definition of
>> PM_FREQ_CHANGE and PM_VOLT_CHANGE. The latter one just isn't used
>> anywhere although the voltage differs between the operating points for
>> your centrino example. And it's quite a common thing when frequency
>> and voltage are changed within the same transition; so those either
>> should be bitfields or something like PM_STATE_CHANGE.
>
>
> The example patch isn't provided to show how it should be implemented.
>
> I've added a separate PowerOp state of PM_VOLT_CHANGE for
> hardware that may be changing states by changing a voltage rather
> than having the voltage changed as a side effect of changing the
> frequency explicitly.
>
>>
>>>>
>>>> 3) per platform nature of an operating point rather than per
>>>>    a pm control layer (cpufreq for ex.):
>>>>    - you have cpu freq and voltage defined in common code
>>>>       while it's still possible that on a certain platform one would
>>>>       not be interested in control of these parameters
>>>
>>> Correct, but on all of the hardware with which I'm familiar cpu
>>> frequency
>>> and voltage are common components to power management.
>>
>> I do agree, but there might be different voltages and different CPU
>> frequencies within the same SoC, so it will mean that you separate,
>> say, two CPU frequencies between common code and SoC-specific code.
>> Maybe it's still the way to go, but it makes things quite complicated
>> to understand from scratch.
>>
>
> After digging through all the PM,  CPUFREQ and Dynamic Power Management
> code it became apparent that when they get down to touching hardware
> they are just dealing with an operating point.  And they all change 
> from
> one opeating point to another in the same manner.
>
> Once you view all the states a system can be in as an operating point,
> wether
> its a suspend or frequency change,  things get much simpler.


> And
>
> David
>
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
@ 2006-08-01 10:09 Matthew Locke
  0 siblings, 0 replies; 45+ messages in thread
From: Matthew Locke @ 2006-08-01 10:09 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm


On Jul 31, 2006, at 5:59 PM, david singleton wrote:

>
> On Jul 30, 2006, at 4:02 AM, Vitaly Wool wrote:
>
>> David,
>>
>> On 7/30/06, david singleton <dsingleton@mvista.com> wrote:
>>
>>> That's one of the simple parts of the concept.  There aren't any
>>> runtime operating
>>> point creation.  It's one of the things I like best about cpufreq,
>>> the
>>> frequency
>>> and voltages are taken from the hardware vendor data sheet and
>>> validated.
>>>
>>> The user just gets to use the operating points supported by the
>>> system,
>>> not
>>> choose the frequency or voltage to transition to.
>>>
>>> By just presenting the supported operating points to the user it
>>> removes the
>>> need for new APIs.  The user just reads the supported operating 
>>> points
>>> and decides the best use of the supported operating points.
>>
>> I see this approach as fundamentally wrong at least because it will
>> produce very long and hard to manage lists of operating points.
>> Suppose you have 20 hardware vendor approved core CPU frequency
>> values, 3 possible voltage values and 10 approved DSP CPU frequency
>> values (which are derived from the other PLL). Not too impossible is
>> that almost all combinations are available which makes is almost 600
>> operating points. I find it absolutely unreal that anyone enters all
>> that stuff without mistakes; managing those lists/searching thru them
>> will take significant time which will slow down the state transitions;
>> and, finally, it's gonna increase the kernel footprint  quite a bit.
>
> Actually in practice there aren't that many supported operating
> points, even on the hardware you and I are familiar with.  I've yet
> to construct a case where there are more than 16 to 20
> operating points.

Its not the number of operating points driving the need for run time 
creation.  Please read the thread that took place early last week on 
this topic.  Start from my post here: 
http://lists.osdl.org/pipermail/linux-pm/2006-July/003065.html and read 
backwards.

Its really the embedded device development and silicon vendor model 
driving it.  Run time creation is required and enabling run time 
creation doesn't prevent some architectures/board ports from hard 
coding their points.

>
> And the Linux device model allows the system to be set at
> a particular operating point and then suspending the LCD
> or unused USB if so desired.  So the combination flexibility
> is still available.
>
> If there were 600 supported operating points that would be a
> very good reason to use PowerOp.   I'm not sure I'd want
> the user passing all the frequencies, voltages, clock
> divisor and clock multiplier for all those operating points.

Well, no one is suggesting a user define and install that info.  
Operating point creation will be done by someone who understands the 
system (system designer) regardless of the method used to get the 
operating points in the kernel.

>
> List manipulation takes place at compile time and list traversal
> is simple.  If a powerop were to become a kobject management
> and traversal would still be simple.
>
> The foot print actually shrinks if you take into account all the
> class, policy and governor code that wouldn't be needed if
> all supported states were simple operating points.
>
>>
>> It looks to me that the concept that the kernel can implement
>> rules/restrictions for operating points but shouldn't define them with
>> possible exception for the most essential ones far better suits both
>> embedded and non-embedded use cases.
>
> CPUFREQ shows that it can, and I believe should, define the operating
> points the system supports.  CPUFREQ does NOT let the user pass
> frequency or voltage values into the kernel.  It shows the hardware
> vendor certified and validated frequencies and voltages.
>
> I really like that concept.  It simplifies things greatly.
>
>>
>>>> 2) interface (kernel as well as userspace(sysfs)) for the rest of
>>> power
>>>>    parameters except cpu voltage and frequency
>>>
>>>
>>> The /sys/power/supported_states file shows the supported operating
>>> points
>>> and their parameters.
>>>
>>> The platform specific information is hidden through the md_data
>>> pointer,
>>> which in the case of embedded systems with complex clocking schemes,
>>> contains the clock divisor and multiplier information that the system
>>> needs
>>> to perform frequency and voltage scaling and clock manipulation.
>>>
>>> The machine dependent portion of a centrino operating point
>>> is only the perfctl msr bits for each frequency/voltage.  For
>>> a system with 5 power domains and various clocks the
>>> machine dependent portion contains the whole array
>>> of information for the different power domains and their clocks.
>>
>> Basically I don't see too much sense in your definition of
>> PM_FREQ_CHANGE and PM_VOLT_CHANGE. The latter one just isn't used
>> anywhere although the voltage differs between the operating points for
>> your centrino example. And it's quite a common thing when frequency
>> and voltage are changed within the same transition; so those either
>> should be bitfields or something like PM_STATE_CHANGE.
>
>
> The example patch isn't provided to show how it should be implemented.
>
> I've added a separate PowerOp state of PM_VOLT_CHANGE for
> hardware that may be changing states by changing a voltage rather
> than having the voltage changed as a side effect of changing the
> frequency explicitly.
>
>>
>>>>
>>>> 3) per platform nature of an operating point rather than per
>>>>    a pm control layer (cpufreq for ex.):
>>>>    - you have cpu freq and voltage defined in common code
>>>>       while it's still possible that on a certain platform one would
>>>>       not be interested in control of these parameters
>>>
>>> Correct, but on all of the hardware with which I'm familiar cpu
>>> frequency
>>> and voltage are common components to power management.
>>
>> I do agree, but there might be different voltages and different CPU
>> frequencies within the same SoC, so it will mean that you separate,
>> say, two CPU frequencies between common code and SoC-specific code.
>> Maybe it's still the way to go, but it makes things quite complicated
>> to understand from scratch.
>>
>
> After digging through all the PM,  CPUFREQ and Dynamic Power Management
> code it became apparent that when they get down to touching hardware
> they are just dealing with an operating point.  And they all change 
> from
> one opeating point to another in the same manner.
>
> Once you view all the states a system can be in as an operating point,
> wether
> its a suspend or frequency change,  things get much simpler.


> And
>
> David
>
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01 10:09                 ` Matthew Locke
@ 2006-08-01 10:22                   ` Matthew Locke
  2006-08-01 18:31                   ` david singleton
  1 sibling, 0 replies; 45+ messages in thread
From: Matthew Locke @ 2006-08-01 10:22 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm


On Aug 1, 2006, at 3:09 AM, Matthew Locke wrote:

>
> On Jul 31, 2006, at 5:59 PM, david singleton wrote:
>
>>
>> On Jul 30, 2006, at 4:02 AM, Vitaly Wool wrote:
>>
>>> David,
>>>
>>> On 7/30/06, david singleton <dsingleton@mvista.com> wrote:
>>>
>>>> That's one of the simple parts of the concept.  There aren't any
>>>> runtime operating
>>>> point creation.  It's one of the things I like best about cpufreq,
>>>> the
>>>> frequency
>>>> and voltages are taken from the hardware vendor data sheet and
>>>> validated.
>>>>
>>>> The user just gets to use the operating points supported by the
>>>> system,
>>>> not
>>>> choose the frequency or voltage to transition to.
>>>>
>>>> By just presenting the supported operating points to the user it
>>>> removes the
>>>> need for new APIs.  The user just reads the supported operating
>>>> points
>>>> and decides the best use of the supported operating points.
>>>
>>> I see this approach as fundamentally wrong at least because it will
>>> produce very long and hard to manage lists of operating points.
>>> Suppose you have 20 hardware vendor approved core CPU frequency
>>> values, 3 possible voltage values and 10 approved DSP CPU frequency
>>> values (which are derived from the other PLL). Not too impossible is
>>> that almost all combinations are available which makes is almost 600
>>> operating points. I find it absolutely unreal that anyone enters all
>>> that stuff without mistakes; managing those lists/searching thru them
>>> will take significant time which will slow down the state 
>>> transitions;
>>> and, finally, it's gonna increase the kernel footprint  quite a bit.
>>
>> Actually in practice there aren't that many supported operating
>> points, even on the hardware you and I are familiar with.  I've yet
>> to construct a case where there are more than 16 to 20
>> operating points.
>
> Its not the number of operating points driving the need for run time
> creation.  Please read the thread that took place early last week on
> this topic.  Start from my post here:
> http://lists.osdl.org/pipermail/linux-pm/2006-July/003065.html and read
> backwards.
>
> Its really the embedded device development and silicon vendor model
> driving it.  Run time creation is required and enabling run time
> creation doesn't prevent some architectures/board ports from hard
> coding their points.
>
>>
>> And the Linux device model allows the system to be set at
>> a particular operating point and then suspending the LCD
>> or unused USB if so desired.  So the combination flexibility
>> is still available.
>>
>> If there were 600 supported operating points that would be a
>> very good reason to use PowerOp.   I'm not sure I'd want
>> the user passing all the frequencies, voltages, clock
>> divisor and clock multiplier for all those operating points.
>
> Well, no one is suggesting a user define and install that info.
> Operating point creation will be done by someone who understands the
> system (system designer) regardless of the method used to get the
> operating points in the kernel.
>
>>
>> List manipulation takes place at compile time and list traversal
>> is simple.  If a powerop were to become a kobject management
>> and traversal would still be simple.
>>
>> The foot print actually shrinks if you take into account all the
>> class, policy and governor code that wouldn't be needed if
>> all supported states were simple operating points.
>>
>>>
>>> It looks to me that the concept that the kernel can implement
>>> rules/restrictions for operating points but shouldn't define them 
>>> with
>>> possible exception for the most essential ones far better suits both
>>> embedded and non-embedded use cases.
>>
>> CPUFREQ shows that it can, and I believe should, define the operating
>> points the system supports.  CPUFREQ does NOT let the user pass
>> frequency or voltage values into the kernel.  It shows the hardware
>> vendor certified and validated frequencies and voltages.
>>
>> I really like that concept.  It simplifies things greatly.
>>
>>>
>>>>> 2) interface (kernel as well as userspace(sysfs)) for the rest of
>>>> power
>>>>>    parameters except cpu voltage and frequency
>>>>
>>>>
>>>> The /sys/power/supported_states file shows the supported operating
>>>> points
>>>> and their parameters.
>>>>
>>>> The platform specific information is hidden through the md_data
>>>> pointer,
>>>> which in the case of embedded systems with complex clocking schemes,
>>>> contains the clock divisor and multiplier information that the 
>>>> system
>>>> needs
>>>> to perform frequency and voltage scaling and clock manipulation.
>>>>
>>>> The machine dependent portion of a centrino operating point
>>>> is only the perfctl msr bits for each frequency/voltage.  For
>>>> a system with 5 power domains and various clocks the
>>>> machine dependent portion contains the whole array
>>>> of information for the different power domains and their clocks.
>>>
>>> Basically I don't see too much sense in your definition of
>>> PM_FREQ_CHANGE and PM_VOLT_CHANGE. The latter one just isn't used
>>> anywhere although the voltage differs between the operating points 
>>> for
>>> your centrino example. And it's quite a common thing when frequency
>>> and voltage are changed within the same transition; so those either
>>> should be bitfields or something like PM_STATE_CHANGE.
>>
>>
>> The example patch isn't provided to show how it should be implemented.
>>
>> I've added a separate PowerOp state of PM_VOLT_CHANGE for
>> hardware that may be changing states by changing a voltage rather
>> than having the voltage changed as a side effect of changing the
>> frequency explicitly.
>>
>>>
>>>>>
>>>>> 3) per platform nature of an operating point rather than per
>>>>>    a pm control layer (cpufreq for ex.):
>>>>>    - you have cpu freq and voltage defined in common code
>>>>>       while it's still possible that on a certain platform one 
>>>>> would
>>>>>       not be interested in control of these parameters
>>>>
>>>> Correct, but on all of the hardware with which I'm familiar cpu
>>>> frequency
>>>> and voltage are common components to power management.
>>>
>>> I do agree, but there might be different voltages and different CPU
>>> frequencies within the same SoC, so it will mean that you separate,
>>> say, two CPU frequencies between common code and SoC-specific code.
>>> Maybe it's still the way to go, but it makes things quite complicated
>>> to understand from scratch.
>>>
>>
>> After digging through all the PM,  CPUFREQ and Dynamic Power 
>> Management
>> code it became apparent that when they get down to touching hardware
>> they are just dealing with an operating point.  And they all change
>> from
>> one opeating point to another in the same manner.
>>
>> Once you view all the states a system can be in as an operating point,
>> wether
>> its a suspend or frequency change,  things get much simpler.
>
>
>> And
>>
>> David
>>
>> _______________________________________________
>> linux-pm mailing list
>> linux-pm@lists.osdl.org
>> https://lists.osdl.org/mailman/listinfo/linux-pm
>>
>
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01  0:59               ` david singleton
  2006-08-01 10:09                 ` Matthew Locke
@ 2006-08-01 12:23                 ` Vitaly Wool
  2006-08-01 18:25                   ` Tim Bird
  1 sibling, 1 reply; 45+ messages in thread
From: Vitaly Wool @ 2006-08-01 12:23 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

On 8/1/06, david singleton <dsingleton@mvista.com> wrote:
> Actually in practice there aren't that many supported operating
> points, even on the hardware you and I are familiar with.  I've yet
> to construct a case where there are more than 16 to 20
> operating points.

If there's no way for runtime OP creation, you'll have to specify ALL
the _valid_ OPs otherwise you'll be artificially narrowing the
applicability down. I hope that's clear.
And the number of theoretically possible (ie valid) OPs might well be
more than 100 - at least on some PXA boards, on PNX4008, on some
high-end ARMv6 ones etc.

> And the Linux device model allows the system to be set at
> a particular operating point and then suspending the LCD
> or unused USB if so desired.  So the combination flexibility
> is still available.

I wasn't talking about that; I've constructed a (weird yet possible)
example that concerned only frequency/voltage changing, not turning
on/off the devices.

> If there were 600 supported operating points that would be a
> very good reason to use PowerOp.   I'm not sure I'd want
> the user passing all the frequencies, voltages, clock
> divisor and clock multiplier for all those operating points.

Within the given use case for a particular board, given the
possibility of runtime OP creation, there shouldn't be any need to
specify ALL the OPs.

> > It looks to me that the concept that the kernel can implement
> > rules/restrictions for operating points but shouldn't define them with
> > possible exception for the most essential ones far better suits both
> > embedded and non-embedded use cases.
> CPUFREQ shows that it can, and I believe should, define the operating
> points the system supports.  CPUFREQ does NOT let the user pass
> frequency or voltage values into the kernel.  It shows the hardware
> vendor certified and validated frequencies and voltages.

Again, IMO you should specify what OPs are valid to be able to
validate runtime-created OPs, but you shouldn't limit OP creation
kernel-wise.

Vitaly

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-30 11:02             ` Vitaly Wool
  2006-08-01  0:59               ` david singleton
@ 2006-08-01 18:02               ` Tim Bird
  1 sibling, 0 replies; 45+ messages in thread
From: Tim Bird @ 2006-08-01 18:02 UTC (permalink / raw)
  To: Vitaly Wool; +Cc: David Singleton, linux-pm, david singleton

Vitaly Wool wrote:
> On 7/30/06, david singleton <dsingleton@mvista.com> wrote:
   > By just presenting the supported operating points to the user it
>  > removes the
>  > need for new APIs.  The user just reads the supported operating points
>  > and decides the best use of the supported operating points.
> 
> I see this approach as fundamentally wrong at least because it will
> produce very long and hard to manage lists of operating points.
> Suppose you have 20 hardware vendor approved core CPU frequency
> values, 3 possible voltage values and 10 approved DSP CPU frequency
> values (which are derived from the other PLL). Not too impossible is
> that almost all combinations are available which makes is almost 600
> operating points. I find it absolutely unreal that anyone enters all
> that stuff without mistakes; managing those lists/searching thru them
> will take significant time which will slow down the state transitions;
> and, finally, it's gonna increase the kernel footprint  quite a bit.
> 
> It looks to me that the concept that the kernel can implement
> rules/restrictions for operating points but shouldn't define them with
> possible exception for the most essential ones far better suits both
> embedded and non-embedded use cases.

I don't think anyone expects that the list of operating points
for a machine would be comprehensive (ie every possible combination
of parameters that are known to work).  This hearkens back to the
bad old days of X11 configuration, where users were expected to
set their own dot clocks and mode lines.  While you can still do
this, giving yourself a resolution of, say, 1139 by 582, almost
everyone now just selects from a pre-defined set
of X resolutions.  I think the same is expected to be true of
power operating points.  There would be a set number of
operating points, which would derive from the vendor data
sheets, that would be smallish and not necessarily all-inclusive.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01 12:23                 ` Vitaly Wool
@ 2006-08-01 18:25                   ` Tim Bird
  0 siblings, 0 replies; 45+ messages in thread
From: Tim Bird @ 2006-08-01 18:25 UTC (permalink / raw)
  To: Vitaly Wool; +Cc: David Singleton, linux-pm, david singleton

Vitaly Wool wrote:
> Within the given use case for a particular board, given the
> possibility of runtime OP creation, there shouldn't be any need to
> specify ALL the OPs.
...
> Again, IMO you should specify what OPs are valid to be able to
> validate runtime-created OPs, but you shouldn't limit OP creation
> kernel-wise.

OK, I agree completely. Please ignore my last message.
I didn't understand your previous comment in the
context of runtime-created OPs.

Sorry,
  -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01 10:09                 ` Matthew Locke
  2006-08-01 10:22                   ` Matthew Locke
@ 2006-08-01 18:31                   ` david singleton
  2006-08-01 18:52                     ` Tim Bird
  1 sibling, 1 reply; 45+ messages in thread
From: david singleton @ 2006-08-01 18:31 UTC (permalink / raw)
  To: Matthew Locke; +Cc: David Singleton, linux-pm


On Aug 1, 2006, at 3:09 AM, Matthew Locke wrote:

>
> On Jul 31, 2006, at 5:59 PM, david singleton wrote:
>
>>
>> On Jul 30, 2006, at 4:02 AM, Vitaly Wool wrote:
>>
>>> David,
>>>
>>> On 7/30/06, david singleton <dsingleton@mvista.com> wrote:
>>>
>>>> That's one of the simple parts of the concept.  There aren't any
>>>> runtime operating
>>>> point creation.  It's one of the things I like best about cpufreq,
>>>> the
>>>> frequency
>>>> and voltages are taken from the hardware vendor data sheet and
>>>> validated.
>>>>
>>>> The user just gets to use the operating points supported by the
>>>> system,
>>>> not
>>>> choose the frequency or voltage to transition to.
>>>>
>>>> By just presenting the supported operating points to the user it
>>>> removes the
>>>> need for new APIs.  The user just reads the supported operating 
>>>> points
>>>> and decides the best use of the supported operating points.
>>>
>>> I see this approach as fundamentally wrong at least because it will
>>> produce very long and hard to manage lists of operating points.
>>> Suppose you have 20 hardware vendor approved core CPU frequency
>>> values, 3 possible voltage values and 10 approved DSP CPU frequency
>>> values (which are derived from the other PLL). Not too impossible is
>>> that almost all combinations are available which makes is almost 600
>>> operating points. I find it absolutely unreal that anyone enters all
>>> that stuff without mistakes; managing those lists/searching thru them
>>> will take significant time which will slow down the state 
>>> transitions;
>>> and, finally, it's gonna increase the kernel footprint  quite a bit.
>>
>> Actually in practice there aren't that many supported operating
>> points, even on the hardware you and I are familiar with.  I've yet
>> to construct a case where there are more than 16 to 20
>> operating points.
>
> Its not the number of operating points driving the need for run time 
> creation.  Please read the thread that took place early last week on 
> this topic.  Start from my post here: 
> http://lists.osdl.org/pipermail/linux-pm/2006-July/003065.html and 
> read backwards.
>
> Its really the embedded device development and silicon vendor model 
> driving it.  Run time creation is required and enabling run time 
> creation doesn't prevent some architectures/board ports from hard 
> coding their points.
>
>>
>> And the Linux device model allows the system to be set at
>> a particular operating point and then suspending the LCD
>> or unused USB if so desired.  So the combination flexibility
>> is still available.
>>
>> If there were 600 supported operating points that would be a
>> very good reason to use PowerOp.   I'm not sure I'd want
>> the user passing all the frequencies, voltages, clock
>> divisor and clock multiplier for all those operating points.
>
> Well, no one is suggesting a user define and install that info.  
> Operating point creation will be done by someone who understands the 
> system (system designer) regardless of the method used to get the 
> operating points in the kernel.
>


It sounds to me like they don't want to have to change kernel code and 
recompile the kernel
to get a new operating point.

It sounds like they are talking about a dynamic operating point as a 
loadable
module, which would fit perfectly with the PowerOp scheme, since it's 
the
system designer who would be creating the new  dynamic operating point,
not the user.

The point of PowerOp is that the system designer creates (and validates)
the operating points that the hardware vendor supports, not the user.

A system designer creating a new operating point as a loadable
module would satisfy this requirement, and the user would not
be able to put the system into an undefined state, either by accident
or maliciously.

That's the point of having the user just use the supported operating
points by setting the operating point name into the /sys/power/state 
file.

The next step I'm going to take is to start writing a PowerOp daemon
which can read and parse the supported operating points.  If I could
figure out some nifty, unified name space for operating points
then the PowerOp daemon could run on most systems right
out of the box.

It could have some simple rules about when to transition to different 
states,
e.g. if the battery were 50% or less, or the system has been 50% idle
for X seconds, etc.

David


>>
>> List manipulation takes place at compile time and list traversal
>> is simple.  If a powerop were to become a kobject management
>> and traversal would still be simple.
>>
>> The foot print actually shrinks if you take into account all the
>> class, policy and governor code that wouldn't be needed if
>> all supported states were simple operating points.
>>
>>>
>>> It looks to me that the concept that the kernel can implement
>>> rules/restrictions for operating points but shouldn't define them 
>>> with
>>> possible exception for the most essential ones far better suits both
>>> embedded and non-embedded use cases.
>>
>> CPUFREQ shows that it can, and I believe should, define the operating
>> points the system supports.  CPUFREQ does NOT let the user pass
>> frequency or voltage values into the kernel.  It shows the hardware
>> vendor certified and validated frequencies and voltages.
>>
>> I really like that concept.  It simplifies things greatly.
>>
>>>
>>>>> 2) interface (kernel as well as userspace(sysfs)) for the rest of
>>>> power
>>>>>    parameters except cpu voltage and frequency
>>>>
>>>>
>>>> The /sys/power/supported_states file shows the supported operating
>>>> points
>>>> and their parameters.
>>>>
>>>> The platform specific information is hidden through the md_data
>>>> pointer,
>>>> which in the case of embedded systems with complex clocking schemes,
>>>> contains the clock divisor and multiplier information that the 
>>>> system
>>>> needs
>>>> to perform frequency and voltage scaling and clock manipulation.
>>>>
>>>> The machine dependent portion of a centrino operating point
>>>> is only the perfctl msr bits for each frequency/voltage.  For
>>>> a system with 5 power domains and various clocks the
>>>> machine dependent portion contains the whole array
>>>> of information for the different power domains and their clocks.
>>>
>>> Basically I don't see too much sense in your definition of
>>> PM_FREQ_CHANGE and PM_VOLT_CHANGE. The latter one just isn't used
>>> anywhere although the voltage differs between the operating points 
>>> for
>>> your centrino example. And it's quite a common thing when frequency
>>> and voltage are changed within the same transition; so those either
>>> should be bitfields or something like PM_STATE_CHANGE.
>>
>>
>> The example patch isn't provided to show how it should be implemented.
>>
>> I've added a separate PowerOp state of PM_VOLT_CHANGE for
>> hardware that may be changing states by changing a voltage rather
>> than having the voltage changed as a side effect of changing the
>> frequency explicitly.
>>
>>>
>>>>>
>>>>> 3) per platform nature of an operating point rather than per
>>>>>    a pm control layer (cpufreq for ex.):
>>>>>    - you have cpu freq and voltage defined in common code
>>>>>       while it's still possible that on a certain platform one 
>>>>> would
>>>>>       not be interested in control of these parameters
>>>>
>>>> Correct, but on all of the hardware with which I'm familiar cpu
>>>> frequency
>>>> and voltage are common components to power management.
>>>
>>> I do agree, but there might be different voltages and different CPU
>>> frequencies within the same SoC, so it will mean that you separate,
>>> say, two CPU frequencies between common code and SoC-specific code.
>>> Maybe it's still the way to go, but it makes things quite complicated
>>> to understand from scratch.
>>>
>>
>> After digging through all the PM,  CPUFREQ and Dynamic Power 
>> Management
>> code it became apparent that when they get down to touching hardware
>> they are just dealing with an operating point.  And they all change 
>> from
>> one opeating point to another in the same manner.
>>
>> Once you view all the states a system can be in as an operating point,
>> wether
>> its a suspend or frequency change,  things get much simpler.
>
>
>> And
>>
>> David
>>
>> _______________________________________________
>> linux-pm mailing list
>> linux-pm@lists.osdl.org
>> https://lists.osdl.org/mailman/listinfo/linux-pm
>>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01 18:31                   ` david singleton
@ 2006-08-01 18:52                     ` Tim Bird
  2006-08-01 18:59                       ` david singleton
                                         ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Tim Bird @ 2006-08-01 18:52 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

david singleton wrote:
> On Aug 1, 2006, at 3:09 AM, Matthew Locke wrote:
>>Well, no one is suggesting a user define and install that info.  
>>Operating point creation will be done by someone who understands the 
>>system (system designer) regardless of the method used to get the 
>>operating points in the kernel.
> 
> It sounds to me like they don't want to have to change kernel code and 
> recompile the kernel
> to get a new operating point.
> 
> It sounds like they are talking about a dynamic operating point as a 
> loadable
> module, which would fit perfectly with the PowerOp scheme, since it's 
> the
> system designer who would be creating the new  dynamic operating point,
> not the user.

Often, in the embedded world, the person defining the operating
states will not be a kernel developer, and may not be comfortable
with, or capable of, creating a kernel module.  (There are
significant sections of the embedded space where modules are
not used at all, and no module support is compiled into the
kernel.)  In these cases, requiring loadable module support
for runtime OPs would be a problem.

> 
> The point of PowerOp is that the system designer creates (and validates)
> the operating points that the hardware vendor supports, not the user.
> 
> A system designer creating a new operating point as a loadable
> module would satisfy this requirement, and the user would not
> be able to put the system into an undefined state, either by accident
> or maliciously.
>

OK, I think I understand better your objection to user-space
created operating points.   In embedded projects, it is often
assumed that no one but the system designer has access to
arbitrary user space programs.  Hence, it sometimes doesn't
register that an end user could or would utilize a particular
interface, just because it existed.

Would not the normal Unix permissions system prevent the
"bad state" problem, in the non-embedded case?

  -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01 18:52                     ` Tim Bird
@ 2006-08-01 18:59                       ` david singleton
  2006-08-01 19:17                         ` Vitaly Wool
  2006-08-01 19:28                       ` Vitaly Wool
  2006-08-06 22:11                       ` Pavel Machek
  2 siblings, 1 reply; 45+ messages in thread
From: david singleton @ 2006-08-01 18:59 UTC (permalink / raw)
  To: Tim Bird; +Cc: David Singleton, linux-pm


On Aug 1, 2006, at 11:52 AM, Tim Bird wrote:

> david singleton wrote:
>> On Aug 1, 2006, at 3:09 AM, Matthew Locke wrote:
>>> Well, no one is suggesting a user define and install that info.  
>>> Operating point creation will be done by someone who understands the 
>>> system (system designer) regardless of the method used to get the 
>>> operating points in the kernel.
>> It sounds to me like they don't want to have to change kernel code 
>> and recompile the kernel
>> to get a new operating point.
>> It sounds like they are talking about a dynamic operating point as a 
>> loadable
>> module, which would fit perfectly with the PowerOp scheme, since it's 
>> the
>> system designer who would be creating the new  dynamic operating 
>> point,
>> not the user.
>
> Often, in the embedded world, the person defining the operating
> states will not be a kernel developer, and may not be comfortable
> with, or capable of, creating a kernel module.  (There are
> significant sections of the embedded space where modules are
> not used at all, and no module support is compiled into the
> kernel.)  In these cases, requiring loadable module support
> for runtime OPs would be a problem.

As long as they can express them as supported operating points
the way that CPUFREQ does those values can easily
be translated into an operating point.  A CPUFREQ table
is a good example of this (see the powerop-x86-centrino.patch).

We could provide a template powerop module where they'd
just have to fill in the frequency, voltage information and they'd
have a working module.


>
>> The point of PowerOp is that the system designer creates (and 
>> validates)
>> the operating points that the hardware vendor supports, not the user.
>> A system designer creating a new operating point as a loadable
>> module would satisfy this requirement, and the user would not
>> be able to put the system into an undefined state, either by accident
>> or maliciously.
>>
>
> OK, I think I understand better your objection to user-space
> created operating points.   In embedded projects, it is often
> assumed that no one but the system designer has access to
> arbitrary user space programs.  Hence, it sometimes doesn't
> register that an end user could or would utilize a particular
> interface, just because it existed.
>
> Would not the normal Unix permissions system prevent the
> "bad state" problem, in the non-embedded case?

Not necessarily, the problem occurs when the user (root even)
passes in a frequency or voltage that isn't quite supported.  The value
passes the ranges checks but when the clocks are set to the passed
value the voltage is just below the edge where things still work. The
system gets into an undefined state and usually hangs.  This is the
most common problem we have today with the Dynamic Power
Management solution.

David

>
>  -- Tim
>
> =============================
> Tim Bird
> Architecture Group Chair, CE Linux Forum
> Senior Staff Engineer, Sony Electronics
> =============================
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01 18:59                       ` david singleton
@ 2006-08-01 19:17                         ` Vitaly Wool
  0 siblings, 0 replies; 45+ messages in thread
From: Vitaly Wool @ 2006-08-01 19:17 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

> > Would not the normal Unix permissions system prevent the
> > "bad state" problem, in the non-embedded case?
>
> Not necessarily, the problem occurs when the user (root even)
> passes in a frequency or voltage that isn't quite supported.  The value
> passes the ranges checks but when the clocks are set to the passed
> value the voltage is just below the edge where things still work. The
> system gets into an undefined state and usually hangs.  This is the
> most common problem we have today with the Dynamic Power
> Management solution.

Oops.
That's the problem of *BAD* restrictions, and not of the approach.

Vitaly

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01 18:52                     ` Tim Bird
  2006-08-01 18:59                       ` david singleton
@ 2006-08-01 19:28                       ` Vitaly Wool
  2006-08-06 22:11                       ` Pavel Machek
  2 siblings, 0 replies; 45+ messages in thread
From: Vitaly Wool @ 2006-08-01 19:28 UTC (permalink / raw)
  To: Tim Bird; +Cc: David Singleton, linux-pm, david singleton

On 8/1/06, Tim Bird <tim.bird@am.sony.com> wrote:
> david singleton wrote:
> > On Aug 1, 2006, at 3:09 AM, Matthew Locke wrote:
> >>Well, no one is suggesting a user define and install that info.
> >>Operating point creation will be done by someone who understands the
> >>system (system designer) regardless of the method used to get the
> >>operating points in the kernel.
> >
> > It sounds to me like they don't want to have to change kernel code and
> > recompile the kernel
> > to get a new operating point.
> >
> > It sounds like they are talking about a dynamic operating point as a
> > loadable
> > module, which would fit perfectly with the PowerOp scheme, since it's
> > the
> > system designer who would be creating the new  dynamic operating point,
> > not the user.
>
> Often, in the embedded world, the person defining the operating
> states will not be a kernel developer, and may not be comfortable
> with, or capable of, creating a kernel module.  (There are
> significant sections of the embedded space where modules are
> not used at all, and no module support is compiled into the
> kernel.)  In these cases, requiring loadable module support
> for runtime OPs would be a problem.

In my understanding, runtime OPs should be registered via configfs.
Modules with definitions are worse yet acceptable.

> > The point of PowerOp is that the system designer creates (and validates)
> > the operating points that the hardware vendor supports, not the user.
> >
> > A system designer creating a new operating point as a loadable
> > module would satisfy this requirement, and the user would not
> > be able to put the system into an undefined state, either by accident
> > or maliciously.
> >
>
> OK, I think I understand better your objection to user-space
> created operating points.   In embedded projects, it is often
> assumed that no one but the system designer has access to
> arbitrary user space programs.  Hence, it sometimes doesn't
> register that an end user could or would utilize a particular
> interface, just because it existed.

Well, why Unix allows root to do 'rm -rf'? It can put the system in an
undefined state... :P

Speaking seriously, if restrictions are comprehensive, then only
correct OPs will be created  run-time or whenever. And being able to
create OPs runtime is a very important requirement for PowerOP, for
many reasons.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-30  4:43           ` david singleton
  2006-07-30 11:02             ` Vitaly Wool
@ 2006-08-06 22:05             ` Pavel Machek
  2006-08-07  3:52               ` david singleton
  1 sibling, 1 reply; 45+ messages in thread
From: Pavel Machek @ 2006-08-06 22:05 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

Hi!

> > 2) interface (kernel as well as userspace(sysfs)) for the rest of power
> >    parameters except cpu voltage and frequency
> 
> 
> The /sys/power/supported_states file shows the supported operating 
> points
> and their parameters.

We have one-value-per-file rule for sysfs.
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-01 18:52                     ` Tim Bird
  2006-08-01 18:59                       ` david singleton
  2006-08-01 19:28                       ` Vitaly Wool
@ 2006-08-06 22:11                       ` Pavel Machek
  2006-08-07 10:34                         ` Igor Stoppa
  2006-08-07 19:45                         ` Tim Bird
  2 siblings, 2 replies; 45+ messages in thread
From: Pavel Machek @ 2006-08-06 22:11 UTC (permalink / raw)
  To: Tim Bird; +Cc: David Singleton, linux-pm, david singleton

Hi!

> >>Well, no one is suggesting a user define and install that info.  
> >>Operating point creation will be done by someone who understands the 
> >>system (system designer) regardless of the method used to get the 
> >>operating points in the kernel.
> > 
> > It sounds to me like they don't want to have to change kernel code and 
> > recompile the kernel
> > to get a new operating point.
> > 
> > It sounds like they are talking about a dynamic operating point as a 
> > loadable
> > module, which would fit perfectly with the PowerOp scheme, since it's 
> > the
> > system designer who would be creating the new  dynamic operating point,
> > not the user.
> 
> Often, in the embedded world, the person defining the operating
> states will not be a kernel developer, and may not be comfortable
> with, or capable of, creating a kernel module.  (There are

That's okay, we can create HOWTO or something. 'System designed is too
stupid to hack kernel' should not be an argument.

Maybe operating points need to be changed at runtime -- I do not think
so -- but *please* get the basic stuff in/merged ASAP and then add
features one-by-one.
						Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-06 22:05             ` Pavel Machek
@ 2006-08-07  3:52               ` david singleton
  2006-08-07  4:17                 ` Greg KH
  0 siblings, 1 reply; 45+ messages in thread
From: david singleton @ 2006-08-07  3:52 UTC (permalink / raw)
  To: Pavel Machek; +Cc: David Singleton, linux-pm


On Aug 6, 2006, at 3:05 PM, Pavel Machek wrote:

> Hi!
>
>>> 2) interface (kernel as well as userspace(sysfs)) for the rest of 
>>> power
>>>    parameters except cpu voltage and frequency
>>
>>
>> The /sys/power/supported_states file shows the supported operating
>> points
>> and their parameters.
>
> We have one-value-per-file rule for sysfs.

Yes, the supported_states file would be better off in /proc.  I'll
move it.  I just put it in /sys because it was so easy.


> -- 
> Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-07  3:52               ` david singleton
@ 2006-08-07  4:17                 ` Greg KH
  2006-08-07  4:32                   ` Vitaly Wool
  0 siblings, 1 reply; 45+ messages in thread
From: Greg KH @ 2006-08-07  4:17 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm, Pavel Machek

On Sun, Aug 06, 2006 at 08:52:15PM -0700, david singleton wrote:
> 
> On Aug 6, 2006, at 3:05 PM, Pavel Machek wrote:
> 
> > Hi!
> >
> >>> 2) interface (kernel as well as userspace(sysfs)) for the rest of 
> >>> power
> >>>    parameters except cpu voltage and frequency
> >>
> >>
> >> The /sys/power/supported_states file shows the supported operating
> >> points
> >> and their parameters.
> >
> > We have one-value-per-file rule for sysfs.
> 
> Yes, the supported_states file would be better off in /proc.  I'll
> move it.

No, don't put non-process stuff in /proc, that's not where it belongs.

> I just put it in /sys because it was so easy.

Then use the proper interface with sysfs please.

Just listing the supported states is acceptable, there is prior-usages
like this.  Just not the paramaters, this doesn't need to be
self-documented...

Or use configfs, that might be good for something like this.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-07  4:17                 ` Greg KH
@ 2006-08-07  4:32                   ` Vitaly Wool
  0 siblings, 0 replies; 45+ messages in thread
From: Vitaly Wool @ 2006-08-07  4:32 UTC (permalink / raw)
  To: Greg KH; +Cc: David Singleton, linux-pm, david singleton, Pavel Machek

> Or use configfs, that might be good for something like this.

It's even better for runtime OPs creation ;)

Vitaly

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-06 22:11                       ` Pavel Machek
@ 2006-08-07 10:34                         ` Igor Stoppa
  2006-08-07 19:45                         ` Tim Bird
  1 sibling, 0 replies; 45+ messages in thread
From: Igor Stoppa @ 2006-08-07 10:34 UTC (permalink / raw)
  To: ext Pavel Machek; +Cc: David Singleton, linux-pm, david singleton

Hi,
On Mon, 2006-08-07 at 01:11 +0300, ext Pavel Machek wrote:
<snip>
> > Often, in the embedded world, the person defining the operating
> > states will not be a kernel developer, and may not be comfortable
> > with, or capable of, creating a kernel module.  (There are
> 
> That's okay, we can create HOWTO or something. 'System designed is too
> stupid to hack kernel' should not be an argument.

In a production environment the competence and experience is usually
distributed across several teams and it's not realistic to just label
"stupid" somebody who has a deep understanding of the HW platform but
not so good understanding of kernel internals.
> 
Once there is a clean separation between mechanism and policies, there
is no need to have a kernel developer take care of issues that are
purely HW-related.



-- 
Cheers,
           Igor

Igor Stoppa (Nokia M - OSSO / Tampere)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
@ 2006-08-07 14:12 Scott E. Preece
  2006-08-07 16:58 ` Greg KH
  0 siblings, 1 reply; 45+ messages in thread
From: Scott E. Preece @ 2006-08-07 14:12 UTC (permalink / raw)
  To: greg; +Cc: daviado, linux-pm, dsingleton, pavel


While I think it makes sense to treat a list as a "single value" in this
kind of case, a reasonable alternative would be to make
/sys/power/supported_states a directory, with separate files for each
supported OP.

scott

| From: Greg KH<greg@kroah.com>
| 
| On Sun, Aug 06, 2006 at 08:52:15PM -0700, david singleton wrote:
| > 
| > On Aug 6, 2006, at 3:05 PM, Pavel Machek wrote:
| > 
| > > Hi!
| > >
| > >>> 2) interface (kernel as well as userspace(sysfs)) for the rest of 
| > >>> power
| > >>>    parameters except cpu voltage and frequency
| > >>
| > >>
| > >> The /sys/power/supported_states file shows the supported operating
| > >> points
| > >> and their parameters.
| > >
| > > We have one-value-per-file rule for sysfs.
| > 
| > Yes, the supported_states file would be better off in /proc.  I'll
| > move it.
| 
| No, don't put non-process stuff in /proc, that's not where it belongs.
| 
| > I just put it in /sys because it was so easy.
| 
| Then use the proper interface with sysfs please.
| 
| Just listing the supported states is acceptable, there is prior-usages
| like this.  Just not the paramaters, this doesn't need to be
| self-documented...
| 
| Or use configfs, that might be good for something like this.
| 
| thanks,
| 
| greg k-h
| _______________________________________________
| linux-pm mailing list
| linux-pm@lists.osdl.org
| https://lists.osdl.org/mailman/listinfo/linux-pm
| 

-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-07 14:12 Scott E. Preece
@ 2006-08-07 16:58 ` Greg KH
  0 siblings, 0 replies; 45+ messages in thread
From: Greg KH @ 2006-08-07 16:58 UTC (permalink / raw)
  To: Scott E. Preece; +Cc: daviado, linux-pm, dsingleton, pavel


A: No.
Q: Should I include quotations after my reply?

On Mon, Aug 07, 2006 at 09:12:14AM -0500, Scott E. Preece wrote:
> 
> While I think it makes sense to treat a list as a "single value" in this
> kind of case

A simple list, like what we have today in /sys/power/state, yes.  But
with paramater information?  no.

>, a reasonable alternative would be to make /sys/power/supported_states
>a directory, with separate files for each supported OP.

Sure, if you want to do that, fine.  But then you are looking a lot like
configfs :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-06 22:11                       ` Pavel Machek
  2006-08-07 10:34                         ` Igor Stoppa
@ 2006-08-07 19:45                         ` Tim Bird
  2006-08-08 10:07                           ` Pavel Machek
  1 sibling, 1 reply; 45+ messages in thread
From: Tim Bird @ 2006-08-07 19:45 UTC (permalink / raw)
  To: Pavel Machek; +Cc: David Singleton, linux-pm, david singleton

Pavel Machek wrote:
>> Often, in the embedded world, the person defining the operating
>> states will not be a kernel developer, and may not be comfortable
>> with, or capable of, creating a kernel module.  (There are
> 
> That's okay, we can create HOWTO or something. 'System designed is too
> stupid to hack kernel' should not be an argument.

It's not a matter of stupidity.  The example I gave of not being
able to create a kernel module was the situation where loadable
module support is not available.  A HOWTO wouldn't alter this.

> Maybe operating points need to be changed at runtime -- I do not think
> so -- but *please* get the basic stuff in/merged ASAP and then add
> features one-by-one.

I agree. :-)
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-07-29  0:38   ` david singleton
  2006-07-29  0:45     ` Greg KH
@ 2006-08-07 22:06     ` Pavel Machek
  1 sibling, 0 replies; 45+ messages in thread
From: Pavel Machek @ 2006-08-07 22:06 UTC (permalink / raw)
  To: david singleton; +Cc: David Singleton, linux-pm

Hi!

> >singleton wrote:
> >>Here is a patch that implements a version of the 
> >>PowerOp concept.
> >
> >Any chance of breaking this up into logical patches 
> >that do one thing at
> >a time so it can be reviewed better?

> Here's powerop-core.patch,  powerop-cpufreq.patch and 
> powerop-x86-centrino.patch.

You failed to inline your patches, so I can't comment properly.
Anyway, having talkative description strings 'suspend-to-disk ACPI' is
ugly for kernel, and you got them wrong.

I'm not sure if I read your patch right, but having suspend-to-disk
latency measured in *micro*seconds seems 'interesting' to me. And what
is the frequency of machine suspended to disk?

Yes, making it easier to add states to /sys/power/state would be
nice...

						Pavel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-07 19:45                         ` Tim Bird
@ 2006-08-08 10:07                           ` Pavel Machek
  2006-08-08 11:12                             ` Igor Stoppa
  0 siblings, 1 reply; 45+ messages in thread
From: Pavel Machek @ 2006-08-08 10:07 UTC (permalink / raw)
  To: Tim Bird; +Cc: David Singleton, linux-pm, david singleton

Hi!

> >> Often, in the embedded world, the person defining the operating
> >> states will not be a kernel developer, and may not be comfortable
> >> with, or capable of, creating a kernel module.  (There are
> > 
> > That's okay, we can create HOWTO or something. 'System designed is too
> > stupid to hack kernel' should not be an argument.
> 
> It's not a matter of stupidity.  The example I gave of not being
> able to create a kernel module was the situation where loadable
> module support is not available.  A HOWTO wouldn't alter this.

So they need to recompile the kernel. Not a big deal when you are
developing new machine.

> > Maybe operating points need to be changed at runtime -- I do not think
> > so -- but *please* get the basic stuff in/merged ASAP and then add
> > features one-by-one.
> 
> I agree. :-)

Yes, I believe we are actually in agreement. Maybe features you want
will indeed be useful, but we want to get basics right, first.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-08 10:07                           ` Pavel Machek
@ 2006-08-08 11:12                             ` Igor Stoppa
  2006-08-08 11:33                               ` Pavel Machek
  0 siblings, 1 reply; 45+ messages in thread
From: Igor Stoppa @ 2006-08-08 11:12 UTC (permalink / raw)
  To: ext Pavel Machek; +Cc: David Singleton, linux-pm, david singleton

On Tue, 2006-08-08 at 13:07 +0300, ext Pavel Machek wrote:
> Hi!
> 
> > >> Often, in the embedded world, the person defining the operating
> > >> states will not be a kernel developer, and may not be comfortable
> > >> with, or capable of, creating a kernel module.  (There are
> > >
> > > That's okay, we can create HOWTO or something. 'System designed is
> too
> > > stupid to hack kernel' should not be an argument.
> >
> > It's not a matter of stupidity.  The example I gave of not being
> > able to create a kernel module was the situation where loadable
> > module support is not available.  A HOWTO wouldn't alter this.
> 
> So they need to recompile the kernel. Not a big deal when you are
> developing new machine.
> 
1 time yes, several maybe, thousands no.

think about it as finding minimums in a n-dimensional discrete function:
you really don't want to recompile a kernel each time

-- 
Cheers,
           Igor

Igor Stoppa (Nokia M - OSSO / Tampere)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-08 11:12                             ` Igor Stoppa
@ 2006-08-08 11:33                               ` Pavel Machek
  2006-08-08 16:43                                 ` Tim Bird
  0 siblings, 1 reply; 45+ messages in thread
From: Pavel Machek @ 2006-08-08 11:33 UTC (permalink / raw)
  To: Igor Stoppa; +Cc: David Singleton, linux-pm, david singleton

Hi!

> > > >> Often, in the embedded world, the person defining the operating
> > > >> states will not be a kernel developer, and may not be comfortable
> > > >> with, or capable of, creating a kernel module.  (There are
> > > >
> > > > That's okay, we can create HOWTO or something. 'System designed is
> > too
> > > > stupid to hack kernel' should not be an argument.
> > >
> > > It's not a matter of stupidity.  The example I gave of not being
> > > able to create a kernel module was the situation where loadable
> > > module support is not available.  A HOWTO wouldn't alter this.
> > 
> > So they need to recompile the kernel. Not a big deal when you are
> > developing new machine.
> > 
> 1 time yes, several maybe, thousands no.
> 
> think about it as finding minimums in a n-dimensional discrete function:
> you really don't want to recompile a kernel each time

Okay... for that kind of development you probably want custom kernel
with sysfs interface. And if that sysfs interface turns out to be nice
enough, maybe we even want to merge it.

But for now please submit patch with only basic functionality, and
keep sysfs interface outside. Getting interfaces right is hard and you
do not want to block core functionality because of that.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
@ 2006-08-08 13:44 Scott E. Preece
  2006-08-08 13:52 ` Pavel Machek
  0 siblings, 1 reply; 45+ messages in thread
From: Scott E. Preece @ 2006-08-08 13:44 UTC (permalink / raw)
  To: pavel; +Cc: daviado, linux-pm, dsingleton


If you're a device manufacturer, and you build, say 50 devices that all
use the same hardware but, because they are optimized for different
functional use cases, have different preferred operating points and
DVFS policies, it's *highly* desirable to not have to maintain 50
separate builds for those devices.  Putting configuration information
in places that can be changed independently of compilation is very
important to us.

scott


| From: Pavel Machek<pavel@ucw.cz>
| 
| Hi!
| 
| > >> Often, in the embedded world, the person defining the operating
| > >> states will not be a kernel developer, and may not be comfortable
| > >> with, or capable of, creating a kernel module.  (There are
| > > 
| > > That's okay, we can create HOWTO or something. 'System designed is too
| > > stupid to hack kernel' should not be an argument.
| > 
| > It's not a matter of stupidity.  The example I gave of not being
| > able to create a kernel module was the situation where loadable
| > module support is not available.  A HOWTO wouldn't alter this.
| 
| So they need to recompile the kernel. Not a big deal when you are
| developing new machine.
| 
| > > Maybe operating points need to be changed at runtime -- I do not think
| > > so -- but *please* get the basic stuff in/merged ASAP and then add
| > > features one-by-one.
| > 
| > I agree. :-)
| 
| Yes, I believe we are actually in agreement. Maybe features you want
| will indeed be useful, but we want to get basics right, first.
| 								Pavel
| -- 
| (english) http://www.livejournal.com/~pavelmachek
| (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
| _______________________________________________
| linux-pm mailing list
| linux-pm@lists.osdl.org
| https://lists.osdl.org/mailman/listinfo/linux-pm
| 

-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-08 13:44 Scott E. Preece
@ 2006-08-08 13:52 ` Pavel Machek
  2006-08-08 15:53   ` Matthew Locke
  2006-08-08 18:10   ` Igor Stoppa
  0 siblings, 2 replies; 45+ messages in thread
From: Pavel Machek @ 2006-08-08 13:52 UTC (permalink / raw)
  To: Scott E. Preece; +Cc: daviado, linux-pm, dsingleton

Hi!

A: No.
Q: Should I top-post?

> If you're a device manufacturer, and you build, say 50 devices that all
> use the same hardware but, because they are optimized for different
> functional use cases, have different preferred operating points and
> DVFS policies, it's *highly* desirable to not have to maintain 50
> separate builds for those devices.  Putting configuration information
> in places that can be changed independently of compilation is very
> important to us.

You'll still need to maintain 50 different userlands, so I do not
think that issue is _so_ important.

Anyway, lets get in core first, than talk about userspace interface
for chanig operating points, ok?
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
@ 2006-08-08 13:54 Scott E. Preece
  2006-08-08 16:49 ` Tim Bird
  0 siblings, 1 reply; 45+ messages in thread
From: Scott E. Preece @ 2006-08-08 13:54 UTC (permalink / raw)
  To: preece; +Cc: daviado, linux-pm, pavel, dsingleton


Note that I concur with Pavel and Tim that it would be better
to get the functionality working first, and worry about this after
we're sure the basic mechanisms work. However, it would make sense for
people reviewing the code to have in their minds the idea that it needs
to be dynamic in the future.

scott


| From preece@urbana.css.mot.com Tue Aug  8 08:44:35 2006
| Date: Tue, 8 Aug 2006 08:44:34 -0500 (CDT)
| From: "Scott E. Preece" <preece@motorola.com>
| CC: tim.bird@am.sony.com, daviado@gmail.com, linux-pm@lists.osdl.org,
| 
| 
| If you're a device manufacturer, and you build, say 50 devices that all
| use the same hardware but, because they are optimized for different
| functional use cases, have different preferred operating points and
| DVFS policies, it's *highly* desirable to not have to maintain 50
| separate builds for those devices.  Putting configuration information
| in places that can be changed independently of compilation is very
| important to us.
| 
| scott
| 
| 
| | From: Pavel Machek<pavel@ucw.cz>
| | 
| | Hi!
| | 
| | > >> Often, in the embedded world, the person defining the operating
| | > >> states will not be a kernel developer, and may not be comfortable
| | > >> with, or capable of, creating a kernel module.  (There are
| | > > 
| | > > That's okay, we can create HOWTO or something. 'System designed is too
| | > > stupid to hack kernel' should not be an argument.
| | > 
| | > It's not a matter of stupidity.  The example I gave of not being
| | > able to create a kernel module was the situation where loadable
| | > module support is not available.  A HOWTO wouldn't alter this.
| | 
| | So they need to recompile the kernel. Not a big deal when you are
| | developing new machine.
| | 
| | > > Maybe operating points need to be changed at runtime -- I do not think
| | > > so -- but *please* get the basic stuff in/merged ASAP and then add
| | > > features one-by-one.
| | > 
| | > I agree. :-)
| | 
| | Yes, I believe we are actually in agreement. Maybe features you want
| | will indeed be useful, but we want to get basics right, first.
| | 								Pavel
| | -- 
| | (english) http://www.livejournal.com/~pavelmachek
| | (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
| | _______________________________________________
| | linux-pm mailing list
| | linux-pm@lists.osdl.org
| | https://lists.osdl.org/mailman/listinfo/linux-pm
| | 
| 
| -- 
| scott preece
| motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
| e-mail:	preece@motorola.com	fax:	+1-217-384-8550
| phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com
| 
| 
| 

-- 
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il  61820  
e-mail:	preece@motorola.com	fax:	+1-217-384-8550
phone:	+1-217-384-8589	cell: +1-217-433-6114	pager: 2174336114@vtext.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-08 13:52 ` Pavel Machek
@ 2006-08-08 15:53   ` Matthew Locke
  2006-08-08 16:03     ` Matthew Locke
  2006-08-08 18:10   ` Igor Stoppa
  1 sibling, 1 reply; 45+ messages in thread
From: Matthew Locke @ 2006-08-08 15:53 UTC (permalink / raw)
  To: Pavel Machek; +Cc: daviado, linux-pm, dsingleton


On Aug 8, 2006, at 6:52 AM, Pavel Machek wrote:

> Anyway, lets get in core first, than talk about userspace interface
> for chanig operating points, ok?

Sounds like we are all in agreement on this point:)    Hopefully we can 
get agreement or some sort of conclusion on the latest interface we 
have presented (by Eugeny).  Once we get agreement there, we can start 
pushing powerop to andrew and then start working/discussing the other 
features necessary.

> 							Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-08 15:53   ` Matthew Locke
@ 2006-08-08 16:03     ` Matthew Locke
  0 siblings, 0 replies; 45+ messages in thread
From: Matthew Locke @ 2006-08-08 16:03 UTC (permalink / raw)
  To: Matthew Locke; +Cc: daviado, linux-pm, dsingleton, Pavel Machek

Just to be clear.... I think the most important part to review and 
agree to is the powerop interface and implementation.  We can always 
add the right userspace interface for creation (if agreed to), 
suspend/resume integration, etc later.
On Aug 8, 2006, at 8:53 AM, Matthew Locke wrote:

>
> On Aug 8, 2006, at 6:52 AM, Pavel Machek wrote:
>
>> Anyway, lets get in core first, than talk about userspace interface
>> for chanig operating points, ok?
>
> Sounds like we are all in agreement on this point:)    Hopefully we can
> get agreement or some sort of conclusion on the latest interface we
> have presented (by Eugeny).  Once we get agreement there, we can start
> pushing powerop to andrew and then start working/discussing the other
> features necessary.
>
>> 							Pavel
>> -- 
>> (english) http://www.livejournal.com/~pavelmachek
>> (cesky, pictures)
>> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>> _______________________________________________
>> linux-pm mailing list
>> linux-pm@lists.osdl.org
>> https://lists.osdl.org/mailman/listinfo/linux-pm
>>
>
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-08 11:33                               ` Pavel Machek
@ 2006-08-08 16:43                                 ` Tim Bird
  0 siblings, 0 replies; 45+ messages in thread
From: Tim Bird @ 2006-08-08 16:43 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm, David Singleton, david singleton

Pavel Machek wrote:
> But for now please submit patch with only basic functionality, and
> keep sysfs interface outside. Getting interfaces right is hard and you
> do not want to block core functionality because of that.

I strongly concur with this.  Mainlining is an art.
Just getting powerop integrated without user-space point definition
would be a great accomplishment.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-08 13:54 Scott E. Preece
@ 2006-08-08 16:49 ` Tim Bird
  0 siblings, 0 replies; 45+ messages in thread
From: Tim Bird @ 2006-08-08 16:49 UTC (permalink / raw)
  To: Scott E. Preece; +Cc: daviado, linux-pm, dsingleton, pavel

Scott E. Preece wrote (in a previous post):
> Putting configuration information
> in places that can be changed independently of compilation is very
> important to us.

> Note that I concur with Pavel and Tim that it would be better
> to get the functionality working first, and worry about this after
> we're sure the basic mechanisms work. However, it would make sense for
> people reviewing the code to have in their minds the idea that it needs
> to be dynamic in the future.

That's fair.  Knowing the requirements, and the background
rationale for the requirements is always helpful, so thanks
for the previous post.  Often, you have to break things into
chunks to get them mainlined. Only mainlining a part of what
you want is still pretty good. I think we're in basic agreement.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: PowerOp Design and working patch
  2006-08-08 13:52 ` Pavel Machek
  2006-08-08 15:53   ` Matthew Locke
@ 2006-08-08 18:10   ` Igor Stoppa
  1 sibling, 0 replies; 45+ messages in thread
From: Igor Stoppa @ 2006-08-08 18:10 UTC (permalink / raw)
  To: ext Pavel Machek; +Cc: daviado, linux-pm, dsingleton

On Tue, 2006-08-08 at 16:52 +0300, ext Pavel Machek wrote:
> Hi!
> 
> A: No.
> Q: Should I top-post?
> 
> > If you're a device manufacturer, and you build, say 50 devices that
> all
> > use the same hardware but, because they are optimized for different
> > functional use cases, have different preferred operating points and
> > DVFS policies, it's *highly* desirable to not have to maintain 50
> > separate builds for those devices.  Putting configuration
> information
> > in places that can be changed independently of compilation is very
> > important to us.
> 
> You'll still need to maintain 50 different userlands, so I do not
> think that issue is _so_ important.

Hi,
the complexity/burden of maintenance depends on how the data to be
flashed is partitioned and the flashing stages used in production.

Mantaining 50 different kernel versions _and_ 50 different rootfs images
is more complex than having just a single kernel.

Of course a module would live in intfs or rootfs, so that would mantain
a single package, but it would introduce extra dependancies between
kernel version and rootfs version.
Especially if module support has to be enabled just because of it.

> Anyway, lets get in core first, than talk about userspace interface
> for chanig operating points, ok?
:D I think everybody has agreed on that, but it's important that
production-related issues are understood.

-- 
Cheers,
           Igor

Igor Stoppa (Nokia M - OSSO / Tampere)

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2006-08-08 18:10 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-28 22:31 PowerOp Design and working patch david singleton
2006-07-28 23:38 ` Greg KH
2006-07-29  0:26   ` david singleton
2006-07-29  0:38   ` david singleton
2006-07-29  0:45     ` Greg KH
2006-07-29  5:12       ` david singleton
2006-07-29 19:07         ` Eugeny S. Mints
2006-07-30  4:43           ` david singleton
2006-07-30 11:02             ` Vitaly Wool
2006-08-01  0:59               ` david singleton
2006-08-01 10:09                 ` Matthew Locke
2006-08-01 10:22                   ` Matthew Locke
2006-08-01 18:31                   ` david singleton
2006-08-01 18:52                     ` Tim Bird
2006-08-01 18:59                       ` david singleton
2006-08-01 19:17                         ` Vitaly Wool
2006-08-01 19:28                       ` Vitaly Wool
2006-08-06 22:11                       ` Pavel Machek
2006-08-07 10:34                         ` Igor Stoppa
2006-08-07 19:45                         ` Tim Bird
2006-08-08 10:07                           ` Pavel Machek
2006-08-08 11:12                             ` Igor Stoppa
2006-08-08 11:33                               ` Pavel Machek
2006-08-08 16:43                                 ` Tim Bird
2006-08-01 12:23                 ` Vitaly Wool
2006-08-01 18:25                   ` Tim Bird
2006-08-01 18:02               ` Tim Bird
2006-08-06 22:05             ` Pavel Machek
2006-08-07  3:52               ` david singleton
2006-08-07  4:17                 ` Greg KH
2006-08-07  4:32                   ` Vitaly Wool
2006-07-29 22:09         ` Greg KH
2006-08-01  0:36           ` david singleton
2006-08-01  1:27           ` david singleton
2006-08-07 22:06     ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2006-08-01 10:09 Matthew Locke
2006-08-07 14:12 Scott E. Preece
2006-08-07 16:58 ` Greg KH
2006-08-08 13:44 Scott E. Preece
2006-08-08 13:52 ` Pavel Machek
2006-08-08 15:53   ` Matthew Locke
2006-08-08 16:03     ` Matthew Locke
2006-08-08 18:10   ` Igor Stoppa
2006-08-08 13:54 Scott E. Preece
2006-08-08 16:49 ` Tim Bird

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox