* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-14 14:37 Eugeny S. Mints
@ 2006-09-18 10:44 ` Pavel Machek
2006-09-18 11:32 ` Eugeny S. Mints
0 siblings, 1 reply; 19+ messages in thread
From: Pavel Machek @ 2006-09-18 10:44 UTC (permalink / raw)
To: Eugeny S. Mints; +Cc: pm list, kernel list
Hi!
> The PowerOP Core provides completely arch independent interface
> to create and control operating points which consist of arbitrary
> subset of power parameters available on a certain platform.
> Also, PowerOP Core provides optional SysFS interface to access
> operating point from userspace.
Please inline patches and sign them off.
Also if you are providing new userland interface, describe it... in
Documentation/ABI.
> +struct powerop_driver {
> + char *name;
> + void *(*create_point) (const char *pwr_params, va_list args);
> + int (*set_point) (void *md_opt);
> + int (*get_point) (void *md_opt, const char *pwr_params, va_list args);
> +};
We can certainly get better interface than va_list, right?
> +
> +#
> +# powerop
> +#
> +
> +menu "PowerOP (Power Management)"
> +
> +config POWEROP
> + tristate "PowerOP Core"
> + help
Hohum, this is certainly going to be clear to confused user...
> + list_add_tail(&opt->node, &named_opt_list);
> + strcpy(registered_names[registered_opt_number], id);
> + registered_opt_number++;
> + up(&named_opt_list_mutex);
> +
> + blocking_notifier_call_chain(&powerop_notifier_list,
> + POWEROP_REGISTER_EVENT, id);
> + return 0;
> +
> + fail_set_name:
> + kfree(opt->md_opt);
> +
> + fail_opt_create:
> + kfree(registered_names[registered_opt_number]);
> +
> + fail_name_nomem:
> + kfree(opt);
> + return err;
> +}
Careful about spaces vs. tabs...
...so, you got support for 20 operating points... And this should
include devices, too, right? How is it going to work on 8cpu box? will
you have states like cpu1_800MHz_cpu2_1600MHz_cpu3_800MHz_... ?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-18 10:44 ` [linux-pm] " Pavel Machek
@ 2006-09-18 11:32 ` Eugeny S. Mints
2006-09-18 19:58 ` Eugeny S. Mints
2006-09-19 18:22 ` Pavel Machek
0 siblings, 2 replies; 19+ messages in thread
From: Eugeny S. Mints @ 2006-09-18 11:32 UTC (permalink / raw)
To: Pavel Machek; +Cc: pm list, kernel list
Pavel Machek wrote:
> Hi!
>
>> The PowerOP Core provides completely arch independent interface
>> to create and control operating points which consist of arbitrary
>> subset of power parameters available on a certain platform.
>> Also, PowerOP Core provides optional SysFS interface to access
>> operating point from userspace.
>
> Please inline patches and sign them off.
hmm, seems my bad. will double check.
>
> Also if you are providing new userland interface, describe it... in
> Documentation/ABI.
seems already discussed
>> +struct powerop_driver {
>> + char *name;
>> + void *(*create_point) (const char *pwr_params, va_list args);
>> + int (*set_point) (void *md_opt);
>> + int (*get_point) (void *md_opt, const char *pwr_params, va_list args);
>> +};
>
> We can certainly get better interface than va_list, right?
Please elaborate.
>
>> +
>> +#
>> +# powerop
>> +#
>> +
>> +menu "PowerOP (Power Management)"
>> +
>> +config POWEROP
>> + tristate "PowerOP Core"
>> + help
>
> Hohum, this is certainly going to be clear to confused user...
please elaborate.
>> + list_add_tail(&opt->node, &named_opt_list);
>> + strcpy(registered_names[registered_opt_number], id);
>> + registered_opt_number++;
>> + up(&named_opt_list_mutex);
>> +
>> + blocking_notifier_call_chain(&powerop_notifier_list,
>> + POWEROP_REGISTER_EVENT, id);
>> + return 0;
>> +
>> + fail_set_name:
>> + kfree(opt->md_opt);
>> +
>> + fail_opt_create:
>> + kfree(registered_names[registered_opt_number]);
>> +
>> + fail_name_nomem:
>> + kfree(opt);
>> + return err;
>> +}
>
> Careful about spaces vs. tabs...
will double check but I'm pretty sure I ran all files thru lindent.
> ...so, you got support for 20 operating points... And this should
> include devices, too, right?
sorry, don't understand the question. an operating point is a set of platform
power parameters.
>How is it going to work on 8cpu box? will
> you have states like cpu1_800MHz_cpu2_1600MHz_cpu3_800MHz_... ?
i do not operate with term 'state' so I don't understand what it means here.
Eugeny
> Pavel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-18 11:32 ` Eugeny S. Mints
@ 2006-09-18 19:58 ` Eugeny S. Mints
2006-09-18 20:07 ` Vitaly Wool
2006-09-19 18:22 ` Pavel Machek
1 sibling, 1 reply; 19+ messages in thread
From: Eugeny S. Mints @ 2006-09-18 19:58 UTC (permalink / raw)
To: Pavel Machek; +Cc: Eugeny S. Mints, pm list, kernel list
Eugeny S. Mints wrote:
> Pavel Machek wrote:
>> Hi!
>>
[skip]
>> How is it going to work on 8cpu box? will
>> you have states like cpu1_800MHz_cpu2_1600MHz_cpu3_800MHz_... ?
basically I guess you are asking about what the names of operating points are
and how to distinguish between operating points from userspace on 8cpu box.
An advantage of PowerOP approach is that operating point name is used as a
_handle_ and may or may not be meaningful. The idea is that if a policy manager
needs to make a decision and needs to distinguish between operating points it
can check value of any power parameters of operating points in question. Power
parameter values may be obtained under <op_name> dir name.
With such approach a policy manger may compare operating points at runtime and
should not rely on compile time knowledge about what name corresponds to what
set of power parameter values. It uses name as a handle.
Eugeny
>
> i do not operate with term 'state' so I don't understand what it means
> here.
>
> Eugeny
>
>> Pavel
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-18 19:58 ` Eugeny S. Mints
@ 2006-09-18 20:07 ` Vitaly Wool
0 siblings, 0 replies; 19+ messages in thread
From: Vitaly Wool @ 2006-09-18 20:07 UTC (permalink / raw)
To: Eugeny S. Mints; +Cc: Pavel Machek, pm list, kernel list
Eugeny,
On 9/18/06, Eugeny S. Mints <eugeny.mints@gmail.com> wrote:
> With such approach a policy manger may compare operating points at runtime and
> should not rely on compile time knowledge about what name corresponds to what
> set of power parameter values. It uses name as a handle.
I suggest that you supply a meaningful yet simple example for SMP case.
Thanks,
Vitaly
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-18 11:32 ` Eugeny S. Mints
2006-09-18 19:58 ` Eugeny S. Mints
@ 2006-09-19 18:22 ` Pavel Machek
1 sibling, 0 replies; 19+ messages in thread
From: Pavel Machek @ 2006-09-19 18:22 UTC (permalink / raw)
To: Eugeny S. Mints; +Cc: pm list, kernel list
Hi!
> >>+struct powerop_driver {
> >>+ char *name;
> >>+ void *(*create_point) (const char *pwr_params, va_list args);
> >>+ int (*set_point) (void *md_opt);
> >>+ int (*get_point) (void *md_opt, const char *pwr_params, va_list
> >>args);
> >>+};
> >
> >We can certainly get better interface than va_list, right?
>
> Please elaborate.
va_list does not provide adequate type checking. I do not think it
suitable in driver<->core interface.
> >>+#
> >>+# powerop
> >>+#
> >>+
> >>+menu "PowerOP (Power Management)"
> >>+
> >>+config POWEROP
> >>+ tristate "PowerOP Core"
> >>+ help
> >
> >Hohum, this is certainly going to be clear to confused user...
>
> please elaborate.
You probably want to include some help text.
> >How is it going to work on 8cpu box? will
> >you have states like cpu1_800MHz_cpu2_1600MHz_cpu3_800MHz_... ?
>
> i do not operate with term 'state' so I don't understand what it means here.
Okay, state here means "operating point". How will operating points
look on 8cpu box? That's 256 states if cpus only support "low" and
"high". How do you name them?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
@ 2006-09-19 19:46 Scott E. Preece
2006-09-19 20:06 ` Eugeny S. Mints
2006-09-22 14:09 ` Pavel Machek
0 siblings, 2 replies; 19+ messages in thread
From: Scott E. Preece @ 2006-09-19 19:46 UTC (permalink / raw)
To: pavel; +Cc: eugeny.mints, linux-pm, linux-kernel
| From: Pavel Machek<pavel@ucw.cz>
|
| > >>+struct powerop_driver {
| > >>+ char *name;
| > >>+ void *(*create_point) (const char *pwr_params, va_list args);
| > >>+ int (*set_point) (void *md_opt);
| > >>+ int (*get_point) (void *md_opt, const char *pwr_params, va_list
| > >>args);
| > >>+};
| > >
| > >We can certainly get better interface than va_list, right?
| >
| > Please elaborate.
|
| va_list does not provide adequate type checking. I do not think it
| suitable in driver<->core interface.
---
Well, in this particular case the typing probably has to be weak, one
way or another. The meaning of the parameters is arguably opaque at
the interface - the attributes may be meaningful to specific components
of the system, but are not defined in the standardized interface (which
would otherwise have to know about all possible kinds of power
attributes and be changed every time a new one is added).
So, if you'd rather have an array of char* or void* values, that would
probably also meet the need, but my guess is that the typing is
intentionally opaque.
---
| ...
| > >How is it going to work on 8cpu box? will
| > >you have states like cpu1_800MHz_cpu2_1600MHz_cpu3_800MHz_... ?
| >
| > i do not operate with term 'state' so I don't understand what it means here.
|
| Okay, state here means "operating point". How will operating points
| look on 8cpu box? That's 256 states if cpus only support "low" and
| "high". How do you name them?
---
I don't think you would name the compounded states. Each CPU would need
to have its own defined set of operating points (since the capabilities
of the CPUs can reasonably be different).
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-19 19:46 [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2 Scott E. Preece
@ 2006-09-19 20:06 ` Eugeny S. Mints
2006-09-22 14:09 ` Pavel Machek
1 sibling, 0 replies; 19+ messages in thread
From: Eugeny S. Mints @ 2006-09-19 20:06 UTC (permalink / raw)
To: Scott E. Preece; +Cc: pavel, linux-pm, linux-kernel
Scott E. Preece wrote:
> | From: Pavel Machek<pavel@ucw.cz>
> |
> | > >>+struct powerop_driver {
> | > >>+ char *name;
> | > >>+ void *(*create_point) (const char *pwr_params, va_list args);
> | > >>+ int (*set_point) (void *md_opt);
> | > >>+ int (*get_point) (void *md_opt, const char *pwr_params, va_list
> | > >>args);
> | > >>+};
> | > >
> | > >We can certainly get better interface than va_list, right?
> | >
> | > Please elaborate.
> |
> | va_list does not provide adequate type checking. I do not think it
> | suitable in driver<->core interface.
> ---
>
> Well, in this particular case the typing probably has to be weak, one
> way or another. The meaning of the parameters is arguably opaque at
> the interface - the attributes may be meaningful to specific components
> of the system, but are not defined in the standardized interface (which
> would otherwise have to know about all possible kinds of power
> attributes and be changed every time a new one is added).
>
> So, if you'd rather have an array of char* or void* values, that would
> probably also meet the need, but my guess is that the typing is
> intentionally opaque.
exactly. Thanks Scott, I don't have anything meaningful to add here.
> ---
> | ...
> | > >How is it going to work on 8cpu box? will
> | > >you have states like cpu1_800MHz_cpu2_1600MHz_cpu3_800MHz_... ?
> | >
> | > i do not operate with term 'state' so I don't understand what it means here.
> |
> | Okay, state here means "operating point". How will operating points
> | look on 8cpu box? That's 256 states if cpus only support "low" and
> | "high". How do you name them?
> ---
>
> I don't think you would name the compounded states. Each CPU would need
> to have its own defined set of operating points (since the capabilities
> of the CPUs can reasonably be different).
i'm not sure - may be it's the mail list issues but in fact I sent out
my additional comment on this question in a separate letter on this thread
a while ago.
Eugeny
> scott
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
@ 2006-09-19 21:37 Scott E. Preece
2006-09-22 14:11 ` Pavel Machek
0 siblings, 1 reply; 19+ messages in thread
From: Scott E. Preece @ 2006-09-19 21:37 UTC (permalink / raw)
To: eugeny.mints; +Cc: pavel, linux-pm, linux-kernel
From: "Eugeny S. Mints" <eugeny.mints@gmail.com>
| Eugeny S. Mints wrote:
| > Pavel Machek wrote:
| >> Hi!
| >>
| [skip]
| >> How is it going to work on 8cpu box? will
| >> you have states like cpu1_800MHz_cpu2_1600MHz_cpu3_800MHz_... ?
|
| basically I guess you are asking about what the names of operating
| points are and how to distinguish between operating points from
| userspace on 8cpu box.
|
| An advantage of PowerOP approach is that operating point name is used as
| a _handle_ and may or may not be meaningful. The idea is that if a
| policy manager needs to make a decision and needs to distinguish between
| operating points it can check value of any power parameters of operating
| points in question. Power parameter values may be obtained under
| <op_name> dir name.
|
| With such approach a policy manger may compare operating points at
| runtime and should not rely on compile time knowledge about what name
| corresponds to what set of power parameter values. It uses name as a
| handle.
---
Hmm. If you assume the CPUs in an SMP system can be in different
operating points, this would (as Pavel pointed out) result in an
explosion of operating points.
I see several possible responses to this problem:
(1) Make operating points a CPU-level abstraction, rather than a
system-level abstraction, with the set of OPs for each CPU defined
individually. This allows for non-symmetric CPUs. Each CPU would have
its own policy manager driving OP selection for that CPU (the set of
policy managers could be shared, as I believe cpufreq shares governors
among CPU).
(2) Create CPU-group operating points that captured the power
parameters for each of a set of CPUs (that is, a group OP would be a
vector of CPU OPs).
(3) Create CPU-group operating points that varied a group of CPUs
symetrically (that is, one set of CPU-level power parameters shared
across a set of CPUs), with group-level policy managers that control
transitions for the group in lockstep.
(4) Create CPU-group operating points that varied a group of CPUs
symetrically (as in (3)), and have both group-level policy managers and
a system-level policy manager that moves CPUs from one group to
another.
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-19 19:46 [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2 Scott E. Preece
2006-09-19 20:06 ` Eugeny S. Mints
@ 2006-09-22 14:09 ` Pavel Machek
2006-09-26 21:45 ` Matthew Locke
1 sibling, 1 reply; 19+ messages in thread
From: Pavel Machek @ 2006-09-22 14:09 UTC (permalink / raw)
To: Scott E. Preece; +Cc: eugeny.mints, linux-pm, linux-kernel
Hi!
> | > >>+struct powerop_driver {
> | > >>+ char *name;
> | > >>+ void *(*create_point) (const char *pwr_params, va_list args);
> | > >>+ int (*set_point) (void *md_opt);
> | > >>+ int (*get_point) (void *md_opt, const char *pwr_params, va_list
> | > >>args);
> | > >>+};
> | > >
> | > >We can certainly get better interface than va_list, right?
> | >
> | > Please elaborate.
> |
> | va_list does not provide adequate type checking. I do not think it
> | suitable in driver<->core interface.
> ---
>
> Well, in this particular case the typing probably has to be weak, one
> way or another. The meaning of the parameters is arguably opaque at
Why not have struct powerop_parameters, defined in machine-specific
header somewhere, but used everywhere?
> the interface - the attributes may be meaningful to specific components
> of the system, but are not defined in the standardized interface (which
> would otherwise have to know about all possible kinds of power
> attributes and be changed every time a new one is added).
>
> So, if you'd rather have an array of char* or void* values, that would
> probably also meet the need, but my guess is that the typing is
> intentionally opaque.
Actually array of integers would be better than this.
> | > >How is it going to work on 8cpu box? will
> | > >you have states like cpu1_800MHz_cpu2_1600MHz_cpu3_800MHz_... ?
> | >
> | > i do not operate with term 'state' so I don't understand what it means here.
> |
> | Okay, state here means "operating point". How will operating points
> | look on 8cpu box? That's 256 states if cpus only support "low" and
> | "high". How do you name them?
>
> I don't think you would name the compounded states. Each CPU would need
> to have its own defined set of operating points (since the capabilities
> of the CPUs can reasonably be different).
Well, having few "powerop domains" per system would likely solve that
problem... and problem of 20 devices on my PC. Can we get that?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-19 21:37 Scott E. Preece
@ 2006-09-22 14:11 ` Pavel Machek
2006-09-22 14:48 ` Igor Stoppa
0 siblings, 1 reply; 19+ messages in thread
From: Pavel Machek @ 2006-09-22 14:11 UTC (permalink / raw)
To: Scott E. Preece; +Cc: eugeny.mints, linux-pm, linux-kernel
Hi!
> Hmm. If you assume the CPUs in an SMP system can be in different
> operating points, this would (as Pavel pointed out) result in an
> explosion of operating points.
Problem is not only CPUs, devices are mostly independent in PC
case... it would be nice to solve that problem with same approach.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-22 14:11 ` Pavel Machek
@ 2006-09-22 14:48 ` Igor Stoppa
0 siblings, 0 replies; 19+ messages in thread
From: Igor Stoppa @ 2006-09-22 14:48 UTC (permalink / raw)
To: ext Pavel Machek; +Cc: Scott E. Preece, linux-pm, linux-kernel
On Fri, 2006-09-22 at 16:11 +0200, ext Pavel Machek wrote:
> Hi!
>
> > Hmm. If you assume the CPUs in an SMP system can be in different
> > operating points, this would (as Pavel pointed out) result in an
> > explosion of operating points.
>
> Problem is not only CPUs, devices are mostly independent in PC
> case... it would be nice to solve that problem with same approach.
>
> Pavel
This whole discussion is, imho, very misleading.
The number of CPU in a box or the number of cores in a chip is not a
significant element, per se.
What is really important is how interdependent they are.
In the case of a board with 2, 4 or 8 CPU, the decisione if their states
are tied together or not is not based on the packaging, but rather on
the (possibly suboptimal) HW design: shared clock or power sources
impose constraints and correlations.
Correlations lead to the multiplication of subsystem states, while
constraints curb the number, because if CPU1 and CPU2 share the same
voltage source, then of all the possible states, only those where this
constraint is satisfied are possible.
Remember what an OP is:
a set of values that exaustively and uniquely define the state of a
system.
So if your box has 256 CPUs, I bet that they are not all on the same
board and probably you also have several independently programmable
power sources.
If every power source feeds say 8 CPUs, then the box contains 16
independent subsystems.
Of course one probably would like to orchestrate all of them, but that's
a 2nd level problem, that could be addressed by a power/workload
manager.
However, even starting with localised dynamic power management would
yeld a significant improvement.
About other device within a PC: SW design cannot really change whatever
constraints the HW design is imposing: if 2 devices are sharing a
programmable v/f source, the source is generating a system which
comprises both devices and it has to be addressed as such.
The innterdipendency could be masked at some high abstract level, but
then going down, close to HW, it has to be explicitly dealt with.
--
Cheers,
Igor
Igor Stoppa (Nokia M - OSSO / Tampere)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
@ 2006-09-22 20:28 Scott E. Preece
2006-09-22 21:18 ` Vitaly Wool
0 siblings, 1 reply; 19+ messages in thread
From: Scott E. Preece @ 2006-09-22 20:28 UTC (permalink / raw)
To: pavel; +Cc: eugeny.mints, linux-pm, linux-kernel
| From pavel@ucw.cz Fri Sep 22 09:12:25 2006
Hi, Pavel,
| ...
|
| > | Okay, state here means "operating point". How will operating points
| > | look on 8cpu box? That's 256 states if cpus only support "low" and
| > | "high". How do you name them?
| >
| > I don't think you would name the compounded states. Each CPU would need
| > to have its own defined set of operating points (since the capabilities
| > of the CPUs can reasonably be different).
|
| Well, having few "powerop domains" per system would likely solve that
| problem... and problem of 20 devices on my PC. Can we get that?
---
I've been hoping somebody would send a good description of exactly what
they mean by "power domain" or "voltage domain", so we could talk about
it...
scott
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
@ 2006-09-22 20:34 Scott E. Preece
2006-09-23 11:18 ` Pavel Machek
0 siblings, 1 reply; 19+ messages in thread
From: Scott E. Preece @ 2006-09-22 20:34 UTC (permalink / raw)
To: pavel; +Cc: eugeny.mints, linux-pm, linux-kernel
Note that I don't think PowerOp would cover all devices. In fact, I
think most devices would remain autonomous or controlled as part of
specific subsystems. The only things that PowerOp would bundle together
would be things that aren't independent (and may not even be visible as
"devices" in the usual Linux sense), but that have to be managed
together in changing frequency/voltage. At least, that's the way I
imagined it would work.
scott
| From pavel@ucw.cz Fri Sep 22 09:13:53 2006
|
| Hi!
|
| > Hmm. If you assume the CPUs in an SMP system can be in different
| > operating points, this would (as Pavel pointed out) result in an
| > explosion of operating points.
|
| Problem is not only CPUs, devices are mostly independent in PC
| case... it would be nice to solve that problem with same approach.
|
| Pavel
| --
| (english) http://www.livejournal.com/~pavelmachek
| (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
|
--
scott preece
motorola mobile devices, il67, 1800 s. oak st., champaign, il 61820
e-mail: preece@motorola.com fax: +1-217-384-8550
phone: +1-217-384-8589 cell: +1-217-433-6114 pager: 2174336114@vtext.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
@ 2006-09-22 21:10 Woodruff, Richard
0 siblings, 0 replies; 19+ messages in thread
From: Woodruff, Richard @ 2006-09-22 21:10 UTC (permalink / raw)
To: Scott E. Preece, pavel; +Cc: linux-pm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 368 bytes --]
> I've been hoping somebody would send a good description of exactly
what
> they mean by "power domain" or "voltage domain", so we could talk
about
> it...
>
> scott
Here are a couple early mails which illustrate it a bit (sorry if they
mess up any ones mail reader). I try an place a device inside of some
of the abstractions.
Regards,
Richard W.
[-- Attachment #2: Type: message/rfc822, Size: 4577 bytes --]
From: "Woodruff, Richard" <r-woodruff2@ti.com>
To: "David Brownell" <david-b@pacbell.net>, "Gross, Mark" <mark.gross@intel.com>
Cc: "Rhoads, Rob" <rob.rhoads@intel.com>, <alb@google.com>, <amit.kucheria@nokia.com>, <mail@dominikbrodowski.de>, <k.stewart@freescale.com>, <matthew.a.locke@comcast.net>, <sampsa.fabritius@nokia.com>, "Sven-Thorsten Dietrich" <sven@mvista.com>, <ext-tuukka.tikkanen@nokia.com>, "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>, "Mochel, Patrick" <patrick.mochel@intel.com>, <mochel@digitalimplant.org>, <toddpoynor@gmail.com>, <abelay@mit.edu>, "Griffiths, Richard" <Richard.Griffiths@windriver.com>, "Loeliger Jon-LOELIGER" <jdl@freescale.com>, "deVries, Alex" <Alex.deVries@windriver.com>, <dsingleton@mvista.com>, "Cai, Stanley" <stanley.cai@intel.com>
Subject: RE: PM framework
Date: Sun, 28 May 2006 09:55:50 -0500
> > We want to
> > support multiple parents, and the notion of dependencies and
> > constraints. Is a core capability that is needed to accommodate the
> > hierarchical and overlapping areas of influence for clock and
voltage
> > domains.
>
> It supports reparenting, which is how I read "multiple parents".
> If that's not what you mean by "multiple parents", please give an
> example of what you mean. As a rule, once a base clock is generated
> (say, by a PLL) it gets gated into children (and maybe subdivided).
>
> Dependencies of the parent/child type are explicit, while any others
> (as well as constraints, if any) are platform-specific. And right
> now there's no generic code that needs to know about **either** ...
I need to do a little catch up reading... I'm not sure this bit is
complete on target, but...
As far as clocks go a functional module can have multiple independent
clocks required for operation. Say for MMC on OMAP2, the DPLL derived
clock might supply the interface clock, a fixed APLL might provide the
functional clock, and a 32KHz clock might provide denounce. All clocks
are necessary for proper operation. And in the case of OMAP2 all 3 are
controllable at the module level.
In a more general sense for a module's functional operation there may be
many dependencies. This kind of thing goes beyond just clocks. This
can be inter-driver, LCD needs I2C to turn on a backlight.
If you look at any block of functionality you create a 'driver' for
there can be a lot of external dependencies. How to best express these
dependencies is where things get unclear. Currently things like sysfs
and the clk_api handle part of them. They are not complete.
It all gets less and less clear as you layer on more intra-soc-chip
levels. A chip has multiple voltage domains (VD), within each VD there
are several power domains (PD), across several of the PDs (or fully
within a PD) there are multiple clock domains (CD). Each domain level
has a number of sub levels with certain functionality states. Finally
you drop the end module function say MMC into this. Software writes a
single end driver for this which is likely part of some subsystem stack.
This functional block (say MMC) will be part of a VD, PD, multiple CD's
(each domain has multiple levels of operation). To achieve power
management at the end module you get somewhat simple controls
(clk_enable), to achieve domain power controls, you must act on entire
groups with-in each domain type. Domains are contained with-in domains
(VD contains PD) so in order to make a VD change all internal PD need to
be at a certain state, and for this to happen all CD in a PD need to be
in at a certain state. For this to happen end functions (MMC) must have
each managed their end clocks and idles properly.
If you handle each layer from the bottom up, you can achieve massive
power savings. There are many explicit relationships in this picture
which need to be expressed. You can't look at devices individually and
get a domain state, you must look at them together.
OMAP1 which you are very familiar with does not provide this type of
domain partitioning nor the intermediate states (and control of each)
with in each domain. You get much more of the whole chip in sleep
states, but not all the partial states which are useable during active
mode.
Regards,
Richard W.
[-- Attachment #3: Type: message/rfc822, Size: 7779 bytes --]
From: "David Brownell" <david-b@pacbell.net>
To: "Woodruff, Richard" <r-woodruff2@ti.com>
Cc: "Gross, Mark" <mark.gross@intel.com>, "Rhoads, Rob" <rob.rhoads@intel.com>, <alb@google.com>, <amit.kucheria@nokia.com>, <mail@dominikbrodowski.de>, <k.stewart@freescale.com>, <matthew.a.locke@comcast.net>, <sampsa.fabritius@nokia.com>, "Sven-Thorsten Dietrich" <sven@mvista.com>, <ext-tuukka.tikkanen@nokia.com>, "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>, "Mochel, Patrick" <patrick.mochel@intel.com>, <mochel@digitalimplant.org>, <toddpoynor@gmail.com>, <abelay@mit.edu>, "Griffiths, Richard" <Richard.Griffiths@windriver.com>, "Loeliger Jon-LOELIGER" <jdl@freescale.com>, "deVries, Alex" <Alex.deVries@windriver.com>, <dsingleton@mvista.com>, "Cai, Stanley" <stanley.cai@intel.com>
Subject: Re: PM framework
Date: Mon, 29 May 2006 10:20:48 -0500
Message-ID: <200605290820.51778.david-b@pacbell.net>
On Sunday 28 May 2006 7:55 am, Woodruff, Richard wrote:
> I need to do a little catch up reading... I'm not sure this bit is
> complete on target, but...
In terms of explaining clock, power, and voltage domains I think it's a
good explanation, likely worth putting into the framework writeup. The
concepts apply both within chips and across larger systems (like boards),
but they're not always apparent to folks who don't build computers that
run for a long time on itty bitty batteries. ;)
Plus it didn't disagree with that point I was making: the clock API has
no major issues handling some of the most complex real-world clock trees
that Linux deals with today, except the missing clk_must_disable() call.
> As far as clocks go a functional module can have multiple independent
> clocks required for operation. Say for MMC on OMAP2, the DPLL derived
> clock might supply the interface clock, a fixed APLL might provide the
> functional clock, and a 32KHz clock might provide denounce. All clocks
> are necessary for proper operation. And in the case of OMAP2 all 3 are
> controllable at the module level.
And I'm sure Intel and Freescale can come up with similar examples using
their own most power-efficient chips: devices that use multiple clock
domains, each clock having a single parent.
> In a more general sense for a module's functional operation there may be
> many dependencies. This kind of thing goes beyond just clocks. This
> can be inter-driver, LCD needs I2C to turn on a backlight.
>
> If you look at any block of functionality you create a 'driver' for
> there can be a lot of external dependencies. How to best express these
> dependencies is where things get unclear. Currently things like sysfs
> and the clk_api handle part of them. They are not complete.
Well, sysfs is just a way to present kernel data structures, emphasis
currently being on device and driver related information. Using that
kobject framework also helps solve some memory management nightmares.
One example of something that's missing would be an entity explicitly
representing a power or voltage domain. While "struct clk" exposes
clock (sub)domains so drivers can work with them, and associates them
with device model tree nodes (which tend to reflect bus hierarchies not
power or clock hierarchies...) the corresponding notions for switching
power domains on/off/higher/lower are platform-specific rarities.
Any driver needing to use one adds lots of platform-specific code...
which is contrary to the goal of maintainability.
> It all gets less and less clear as you layer on more intra-soc-chip
> levels. A chip has multiple voltage domains (VD), within each VD there
> are several power domains (PD), across several of the PDs (or fully
> within a PD) there are multiple clock domains (CD). Each domain level
> has a number of sub levels with certain functionality states. Finally
> you drop the end module function say MMC into this. Software writes a
> single end driver for this which is likely part of some subsystem stack.
> This functional block (say MMC) will be part of a VD, PD, multiple CD's
> (each domain has multiple levels of operation). To achieve power
> management at the end module you get somewhat simple controls
> (clk_enable), to achieve domain power controls, you must act on entire
> groups with-in each domain type. Domains are contained with-in domains
> (VD contains PD) so in order to make a VD change all internal PD need to
> be at a certain state, and for this to happen all CD in a PD need to be
> in at a certain state. For this to happen end functions (MMC) must have
> each managed their end clocks and idles properly.
Most of that also applies at the board integration level. MMC is a good
example to build on here, because "supply 1.8V to slot 2" is likely to
have a variety of board-specific implementations, and a given MMC card
might well need a voltage that is not currently available.
> If you handle each layer from the bottom up, you can achieve massive
> power savings. There are many explicit relationships in this picture
> which need to be expressed. You can't look at devices individually and
> get a domain state, you must look at them together.
Does "need to be expressed" need to be much more than "here's the
API that drivers use"? That is, what's the point of expressing
that stuff? It's easy to understand "writing power-aware drivers".
Also "monitoring power-aware systems". Neither of those goals would
seem to need much in terms of infrastructure, beyond adopting some
lightweight APIs, analagous to the current clock API.
> OMAP1 which you are very familiar with does not provide this type of
> domain partitioning nor the intermediate states (and control of each)
> with in each domain. You get much more of the whole chip in sleep
> states, but not all the partial states which are useable during active
> mode.
So who defines the states a given Linux system will use ... in terms
of the scope this new "framework" will address?
SOC/chip vendor docs more or less define what the SOC is expected to handle.
Board designers add to that, including constraints that reduce scope, as
does that platform's Linux support. The application software layer(s) do
the same. And most end users won't care about many of the details.
- Dave
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-22 20:28 Scott E. Preece
@ 2006-09-22 21:18 ` Vitaly Wool
0 siblings, 0 replies; 19+ messages in thread
From: Vitaly Wool @ 2006-09-22 21:18 UTC (permalink / raw)
To: Scott E. Preece; +Cc: pavel, linux-pm, linux-kernel
On 9/23/06, Scott E. Preece <preece@motorola.com> wrote:
>
> I've been hoping somebody would send a good description of exactly what
> they mean by "power domain" or "voltage domain", so we could talk about
> it...
I understand it as an arbitrary set of devices grouped according to a
certain rule. An operating point will then deal with set of power
domains not with set of devices.
Vitaly
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-22 20:34 Scott E. Preece
@ 2006-09-23 11:18 ` Pavel Machek
2006-09-24 21:33 ` Matthew Locke
0 siblings, 1 reply; 19+ messages in thread
From: Pavel Machek @ 2006-09-23 11:18 UTC (permalink / raw)
To: Scott E. Preece; +Cc: eugeny.mints, linux-pm, linux-kernel
Hi!
> Note that I don't think PowerOp would cover all devices. In fact, I
> think most devices would remain autonomous or controlled as part of
> specific subsystems. The only things that PowerOp would bundle together
> would be things that aren't independent (and may not even be visible as
> "devices" in the usual Linux sense), but that have to be managed
> together in changing frequency/voltage. At least, that's the way I
> imagined it would work.
Well, two objections to that
a) current powerop code does not handle 256 CPU machine, because that
would need 256 independend bundles, and powerop has hardcoded "only
one bundle" rule.
b) having some devices controlled by powerop and some by specific
subsystem is indeed ugly. I'd hope powerop would cover all the
devices. (Or maybe cover _no_ devices). Userland should not need to
know if touchscreen is part of SoC or if it happens to be independend
on given machine.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-23 11:18 ` Pavel Machek
@ 2006-09-24 21:33 ` Matthew Locke
2006-09-24 21:45 ` Pavel Machek
0 siblings, 1 reply; 19+ messages in thread
From: Matthew Locke @ 2006-09-24 21:33 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm, Scott E. Preece, linux-kernel
On Sep 23, 2006, at 4:18 AM, Pavel Machek wrote:
> Hi!
>
>> Note that I don't think PowerOp would cover all devices. In fact, I
>> think most devices would remain autonomous or controlled as part of
>> specific subsystems. The only things that PowerOp would bundle
>> together
>> would be things that aren't independent (and may not even be visible
>> as
>> "devices" in the usual Linux sense), but that have to be managed
>> together in changing frequency/voltage. At least, that's the way I
>> imagined it would work.
>
> Well, two objections to that
>
> a) current powerop code does not handle 256 CPU machine, because that
> would need 256 independend bundles, and powerop has hardcoded "only
> one bundle" rule.
The 256 is only a temporary implementation limitation.
>
> b) having some devices controlled by powerop and some by specific
> subsystem is indeed ugly. I'd hope powerop would cover all the
> devices. (Or maybe cover _no_ devices). Userland should not need to
> know if touchscreen is part of SoC or if it happens to be independend
> on given machine.
PowerOP does *not* cover devices. It covers system level parameters
such clocks, buses, voltages.
>
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-24 21:33 ` Matthew Locke
@ 2006-09-24 21:45 ` Pavel Machek
0 siblings, 0 replies; 19+ messages in thread
From: Pavel Machek @ 2006-09-24 21:45 UTC (permalink / raw)
To: Matthew Locke; +Cc: linux-pm, Scott E. Preece, linux-kernel
Hi!
> >Well, two objections to that
> >
> >a) current powerop code does not handle 256 CPU machine, because that
> >would need 256 independend bundles, and powerop has hardcoded "only
> >one bundle" rule.
>
> The 256 is only a temporary implementation limitation.
Really? 256 CPUs mean 2^256 states. How do you handle that without
introducing vectors?
> >b) having some devices controlled by powerop and some by specific
> >subsystem is indeed ugly. I'd hope powerop would cover all the
> >devices. (Or maybe cover _no_ devices). Userland should not need to
> >know if touchscreen is part of SoC or if it happens to be independend
> >on given machine.
>
> PowerOP does *not* cover devices. It covers system level parameters
> such clocks, buses, voltages.
I've seen "usb enabled" in one of examples.. and that sure seems like
device to me.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2
2006-09-22 14:09 ` Pavel Machek
@ 2006-09-26 21:45 ` Matthew Locke
0 siblings, 0 replies; 19+ messages in thread
From: Matthew Locke @ 2006-09-26 21:45 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-pm, Scott E. Preece, linux-kernel
On Sep 22, 2006, at 7:09 AM, Pavel Machek wrote:
> Hi!
>
>> | > >>+struct powerop_driver {
>> | > >>+ char *name;
>> | > >>+ void *(*create_point) (const char *pwr_params, va_list args);
>> | > >>+ int (*set_point) (void *md_opt);
>> | > >>+ int (*get_point) (void *md_opt, const char *pwr_params,
>> va_list
>> | > >>args);
>> | > >>+};
>> | > >
>> | > >We can certainly get better interface than va_list, right?
>> | >
>> | > Please elaborate.
>> |
>> | va_list does not provide adequate type checking. I do not think it
>> | suitable in driver<->core interface.
>> ---
>>
>> Well, in this particular case the typing probably has to be weak, one
>> way or another. The meaning of the parameters is arguably opaque at
>
> Why not have struct powerop_parameters, defined in machine-specific
> header somewhere, but used everywhere?
We started out with a machine-specific header. The general feedback on
the linux-pm list was that a complete machine independent interface was
preferred. At first Eugeny and I were against that but the resulting
interface is much more flexible. Users of the PowerOP API do not have
to include an asm/powerop.h to use it. Instead you can get all the
information you need from the interface. Also, using the current
implementation you can get/set any arbitrary number and order of the
power parameters. For example, a platform has parameters p0-p5. An
operating point could be registered with values for p0,p1 and another
with p0,p1,p2,p3,p4 and p5. The code that registers operating points
and accesses existing operating points don't have to know the correct
order of parameters, provide values for the entire parameter list or
have a machine specific header. In addition to being machine
independent, the current implementation seems to works nicely for
integrating with cpufreq.
We are not completely satisfied with the string parsing that results
from the current interface but we can improve it over time.
>
>> the interface - the attributes may be meaningful to specific
>> components
>> of the system, but are not defined in the standardized interface
>> (which
>> would otherwise have to know about all possible kinds of power
>> attributes and be changed every time a new one is added).
>>
>> So, if you'd rather have an array of char* or void* values, that would
>> probably also meet the need, but my guess is that the typing is
>> intentionally opaque.
>
> Actually array of integers would be better than this.
>
>> | > >How is it going to work on 8cpu box? will
>> | > >you have states like cpu1_800MHz_cpu2_1600MHz_cpu3_800MHz_... ?
>> | >
>> | > i do not operate with term 'state' so I don't understand what it
>> means here.
>> |
>> | Okay, state here means "operating point". How will operating points
>> | look on 8cpu box? That's 256 states if cpus only support "low" and
>> | "high". How do you name them?
>>
>> I don't think you would name the compounded states. Each CPU would
>> need
>> to have its own defined set of operating points (since the
>> capabilities
>> of the CPUs can reasonably be different).
>
> Well, having few "powerop domains" per system would likely solve that
> problem... and problem of 20 devices on my PC. Can we get that?
>
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/linux-pm
>
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2006-09-26 21:45 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-19 19:46 [linux-pm] [PATCH] PowerOP, PowerOP Core, 1/2 Scott E. Preece
2006-09-19 20:06 ` Eugeny S. Mints
2006-09-22 14:09 ` Pavel Machek
2006-09-26 21:45 ` Matthew Locke
-- strict thread matches above, loose matches on Subject: below --
2006-09-22 21:10 Woodruff, Richard
2006-09-22 20:34 Scott E. Preece
2006-09-23 11:18 ` Pavel Machek
2006-09-24 21:33 ` Matthew Locke
2006-09-24 21:45 ` Pavel Machek
2006-09-22 20:28 Scott E. Preece
2006-09-22 21:18 ` Vitaly Wool
2006-09-19 21:37 Scott E. Preece
2006-09-22 14:11 ` Pavel Machek
2006-09-22 14:48 ` Igor Stoppa
2006-09-14 14:37 Eugeny S. Mints
2006-09-18 10:44 ` [linux-pm] " Pavel Machek
2006-09-18 11:32 ` Eugeny S. Mints
2006-09-18 19:58 ` Eugeny S. Mints
2006-09-18 20:07 ` Vitaly Wool
2006-09-19 18:22 ` Pavel Machek
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox