[RFC 0/3] Runtime PM support for named power states

public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC 0/3] Runtime PM support for named power states
@ 2005-09-30 20:14 Alan Stern
  2005-10-02 20:43 ` Pavel Machek
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Alan Stern @ 2005-09-30 20:14 UTC (permalink / raw)
  To: Linux-pm mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3159 bytes --]

This series of patches presents the beginnings of serious runtime PM
support.  It's not meant to be applied as is; the code is more like a
proof-of-principle.  It would be surprising if people didn't have a number
of suggestions for improvements.  And in any case, one portion of it is
far from complete.

What these patches do is introduce support for named power states.  The
first patch contains the core changes, and the other two contain updates
for PCI and USB respectively.  Note that the PCI update is barely
functional; the only PCI driver I changed is the one for the PCI USB
controllers.  Other PCI drivers won't understand the new data fields, so
you shouldn't try to change their power levels using the new named states!

Updated documentation, comments, and the code itself explain pretty well
how this works.  In brief, the dev_pm_info structure has a new field
containing a pointer to a null-terminated list of state names.  There's
also a pointer to the name of the state to be used for a generic runtime
suspend.  The sysfs routines will display these names and match input
against them.  They will also continue to recognize the old state numbers
(0 - 3) and will use them for devices without support for named power
states.

There are at least two more patches on the way, to implement state change 
notifications and auto-suspend/resume.  Stay tuned...

Although it may not show up very clearly, these patches -- especially the 
first -- are running up against the limitations and bad design choices 
originally made for the PM core.

Particularly vexing was the use of a pm_message_t to store a device's
current state.  The event field in pm_message_t doesn't represent a power
state; it represents a _reason_ for a state transition.  Nevertheless, the
current code insists on using it to identify states.  This means that even
though a runtime change may leave a device in the state needed for system
suspend, the core won't realize it because the state was entered for the
wrong reason!  This is so widespread I didn't try to change it.  Instead I
decided that PM_EVENT_ON would be synonymous with the default (i.e., first
or full-power) entry in the state name array.

A consequence of this folly shows up when a device has been put in a
low-power state using sysfs and then suspend-to-disk is done.  The device
will first be woken up, just so it can be frozen, then woken up again, and
finally suspended -- ending up in the very state it started from!

The runtime PM routines insist on storing values in
dev->power.power_state.  This seems foolish since the system PM routines 
don't do it.  In any case, what gets stored there should be decided by the 
driver, not by the core.  Presumably these stores could be removed, but I 
wasn't sure so I left them in.

Finally, there's no obvious mapping between the PM_EVENT_FROZEN and 
PM_EVENT_SUSPENDED values and the new named states.  Obviously devices 
will be left in _some_ state, but the PM core can't tell which.  This will 
show up in the sysfs file if you write a number from 1 to 3; instead of 
displaying a named state the file will now just display the number.

Alan Stern

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 0/3] Runtime PM support for named power states
  2005-09-30 20:14 [RFC 0/3] Runtime PM support for named power states Alan Stern
@ 2005-10-02 20:43 ` Pavel Machek
  2005-10-03 15:21   ` Alan Stern
  2005-10-05  6:04 ` [RFC 0/3] " David Brownell
  2005-10-19  0:34 ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 10+ messages in thread
From: Pavel Machek @ 2005-10-02 20:43 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux-pm mailing list

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

Hi!

> This series of patches presents the beginnings of serious runtime PM
> support.  It's not meant to be applied as is; the code is more like a
> proof-of-principle.  It would be surprising if people didn't have a number
> of suggestions for improvements.  And in any case, one portion of it is
> far from complete.

Looks good to me. It converts useless .../power/state files into
something usable.
								Pavel
-- 
if you have sharp zaurus hardware you don't need... you know my address

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 0/3] Runtime PM support for named power states
  2005-10-02 20:43 ` Pavel Machek
@ 2005-10-03 15:21   ` Alan Stern
  2005-10-03 16:08     ` Jordan Crouse
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2005-10-03 15:21 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Linux-pm mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 565 bytes --]

On Sun, 2 Oct 2005, Pavel Machek wrote:

> Hi!
> 
> > This series of patches presents the beginnings of serious runtime PM
> > support.  It's not meant to be applied as is; the code is more like a
> > proof-of-principle.  It would be surprising if people didn't have a number
> > of suggestions for improvements.  And in any case, one portion of it is
> > far from complete.
> 
> Looks good to me. It converts useless .../power/state files into
> something usable.

Thank you.  It's a start, anyway.  My real question is: Is this the right 
way to go?

Alan Stern


[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Runtime PM support for named power states
  2005-10-03 15:21   ` Alan Stern
@ 2005-10-03 16:08     ` Jordan Crouse
  0 siblings, 0 replies; 10+ messages in thread
From: Jordan Crouse @ 2005-10-03 16:08 UTC (permalink / raw)
  To: Linux-pm mailing list

[-- Attachment #1: Type: text/plain, Size: 565 bytes --]

> Thank you.  It's a start, anyway.  My real question is: Is this the right 
> way to go?

My two cents are that its definately a thrust in the right direction.  The
important thing is that it makes the devices start thinking about how to
transition between power states outside of a system level event.  This is
mandatory infrastructure, regardless of the higher level policy management 
(or lack thereof) that we eventually agree upon.

Jordan

-- 
Jordan Crouse
Senior Linux Engineer
AMD - Personal Connectivity Solutions Group
<www.amd.com/embeddedprocessors>

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 0/3] Runtime PM support for named power states
  2005-09-30 20:14 [RFC 0/3] Runtime PM support for named power states Alan Stern
  2005-10-02 20:43 ` Pavel Machek
@ 2005-10-05  6:04 ` David Brownell
  2005-10-05  7:44   ` Adam Belay
  2005-10-19  0:34 ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 10+ messages in thread
From: David Brownell @ 2005-10-05  6:04 UTC (permalink / raw)
  To: stern, linux-pm

[-- Attachment #1: Type: text/plain, Size: 1482 bytes --]

> Although it may not show up very clearly, these patches -- especially the 
> first -- are running up against the limitations and bad design choices 
> originally made for the PM core.

I think some parts just "grew"; not everything was "designed".  :)

> Particularly vexing was the use of a pm_message_t to store a device's
> current state.  The event field in pm_message_t doesn't represent a power
> state; it represents a _reason_ for a state transition.  Nevertheless, the
> current code insists on using it to identify states.

I suspect only driver code should ever care about device states.

The PM core should just tell drivers to become compatible with some
new constraint (like ACPI S3, generally implying devices in PCI D2
or D3; while S1 doesn't) ... and not worry about whether that involves
a state change or not.

Maybe they're already _in_ that state for example.

> Finally, there's no obvious mapping between the PM_EVENT_FROZEN and 
> PM_EVENT_SUSPENDED values and the new named states.  Obviously devices 
> will be left in _some_ state, but the PM core can't tell which.

The PM core shouldn't need to care.  If the driver is in a state
compatible with that message, all is fine; the driver should be
able to ignore the message.

That said, there's still confusion between FREEZE (which seems
more like a "standby" with focus on quick time to full-ON) and
SUSPEND (which for spinning up a disk would take part of a minute
to get back to full-ON).

- Dave

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 0/3] Runtime PM support for named power states
  2005-10-05  6:04 ` [RFC 0/3] " David Brownell
@ 2005-10-05  7:44   ` Adam Belay
  2005-10-05 23:24     ` David Brownell
  0 siblings, 1 reply; 10+ messages in thread
From: Adam Belay @ 2005-10-05  7:44 UTC (permalink / raw)
  To: David Brownell; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 3028 bytes --]

On Tue, Oct 04, 2005 at 11:04:35PM -0700, David Brownell wrote:
> > Although it may not show up very clearly, these patches -- especially the 
> > first -- are running up against the limitations and bad design choices 
> > originally made for the PM core.
> 
> I think some parts just "grew"; not everything was "designed".  :)
> 
> 
> > Particularly vexing was the use of a pm_message_t to store a device's
> > current state.  The event field in pm_message_t doesn't represent a power
> > state; it represents a _reason_ for a state transition.  Nevertheless, the
> > current code insists on using it to identify states.
> 
> I suspect only driver code should ever care about device states.

I agree.  I'm not sure if we should represent physical device states
specifically.  For most PCI devices, the device is either on (D0) or
off (D1-D3).  D1-D3 are often determined by which wake capabilities have
been enabled.  Therefore, it seems reasonable to tell the driver to turn
off the device and let it figure out which state would meet all of the
wake constraints.  In other words, the ability to specify the exact D-state
isn't especially useful.

Some drivers may want to allow userspace to have more direct control of
their power states.  They can create sysfs attributes to allow for this
sort of device-specific control.

> 
> The PM core should just tell drivers to become compatible with some
> new constraint (like ACPI S3, generally implying devices in PCI D2
> or D3; while S1 doesn't) ... and not worry about whether that involves
> a state change or not.
> 
> Maybe they're already _in_ that state for example.

Yeah, the thought occurred to me.  One way of approaching the problem is to
have the PM core use logical states such as the following:

1.) ON
2.) FREEZE
3.) SUSPEND

others?

This is similar to our current implementation.

Device drivers would be responsible for watching physical device states like
D0, D1, etc.  Some devices might have rather complex power management options
and may contain several physical states.  This doesn't mean the standard PM
code can't provide libraries to make this device state tracking easier for
most drivers.

> 
> 
> > Finally, there's no obvious mapping between the PM_EVENT_FROZEN and 
> > PM_EVENT_SUSPENDED values and the new named states.  Obviously devices 
> > will be left in _some_ state, but the PM core can't tell which.
> 
> The PM core shouldn't need to care.  If the driver is in a state
> compatible with that message, all is fine; the driver should be
> able to ignore the message.
> 
> That said, there's still confusion between FREEZE (which seems
> more like a "standby" with focus on quick time to full-ON) and
> SUSPEND (which for spinning up a disk would take part of a minute
> to get back to full-ON).

My understanding is as follows:

FREEZE:
-stop device operation
-free interrupts, stop timers, etc.

SUSPEND:
-call FREEZE if not frozen.
-physically power down the device

In other words, FREEZE avoids changing power states.

Thanks,
Adam

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 0/3] Runtime PM support for named power states
  2005-10-05  7:44   ` Adam Belay
@ 2005-10-05 23:24     ` David Brownell
  2005-10-06 17:00       ` Alan Stern
  0 siblings, 1 reply; 10+ messages in thread
From: David Brownell @ 2005-10-05 23:24 UTC (permalink / raw)
  To: Adam Belay; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 1237 bytes --]

On Wednesday 05 October 2005 12:44 am, Adam Belay wrote:
> On Tue, Oct 04, 2005 at 11:04:35PM -0700, David Brownell wrote: 
> > 
> > The PM core should just tell drivers to become compatible with some
> > new constraint (like ACPI S3, generally implying devices in PCI D2
> > or D3; while S1 doesn't) ... and not worry about whether that involves
> > a state change or not.
> > 
> > Maybe they're already _in_ that state for example.
> 
> Yeah, the thought occurred to me.  One way of approaching the problem is to
> have the PM core use logical states such as the following:
> 
> 1.) ON
> 2.) FREEZE
> 3.) SUSPEND
> 
> others?
> 
> This is similar to our current implementation.

Which is a problem, as I've described before.  The logical states
must be system-specific.  Some systems have more states than just
those three ... and in particular, the states have ** SOLID **
and ** PRECISE ** definitions.  Unlike those three.

At the risk of sounding like a broken record, I'll give the
same example I've given before:  some states may not support
particular clocks, like the 48 MHz USB clock.  So the driver
needs to know which ** PRECISE ** system-specific state is
being entered ... so it can turn off the clock if need be.

- Dave

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 0/3] Runtime PM support for named power states
  2005-10-05 23:24     ` David Brownell
@ 2005-10-06 17:00       ` Alan Stern
  0 siblings, 0 replies; 10+ messages in thread
From: Alan Stern @ 2005-10-06 17:00 UTC (permalink / raw)
  To: David Brownell; +Cc: linux-pm

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1647 bytes --]

On Wed, 5 Oct 2005, David Brownell wrote:

> On Wednesday 05 October 2005 12:44 am, Adam Belay wrote:
> > On Tue, Oct 04, 2005 at 11:04:35PM -0700, David Brownell wrote: 
> > > 
> > > The PM core should just tell drivers to become compatible with some
> > > new constraint (like ACPI S3, generally implying devices in PCI D2
> > > or D3; while S1 doesn't) ... and not worry about whether that involves
> > > a state change or not.
> > > 
> > > Maybe they're already _in_ that state for example.
> > 
> > Yeah, the thought occurred to me.  One way of approaching the problem is to
> > have the PM core use logical states such as the following:
> > 
> > 1.) ON
> > 2.) FREEZE
> > 3.) SUSPEND
> > 
> > others?
> > 
> > This is similar to our current implementation.
> 
> Which is a problem, as I've described before.  The logical states
> must be system-specific.  Some systems have more states than just
> those three ... and in particular, the states have ** SOLID **
> and ** PRECISE ** definitions.  Unlike those three.
> 
> At the risk of sounding like a broken record, I'll give the
> same example I've given before:  some states may not support
> particular clocks, like the 48 MHz USB clock.  So the driver
> needs to know which ** PRECISE ** system-specific state is
> being entered ... so it can turn off the clock if need be.

We can make it easy for a platform to define new or extra system states.  
Things like ON, FREEZE, and SUSPEND can be represented by new PM_EVENT_xxx
codes, which the relevant drivers would understand.  Other drivers would
have to be smart enough to ignore the any event codes they don't
recognize.

Alan Stern


[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 0/3] Runtime PM support for named power states
  2005-09-30 20:14 [RFC 0/3] Runtime PM support for named power states Alan Stern
  2005-10-02 20:43 ` Pavel Machek
  2005-10-05  6:04 ` [RFC 0/3] " David Brownell
@ 2005-10-19  0:34 ` Benjamin Herrenschmidt
  2005-10-19 15:17   ` Alan Stern
  2 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2005-10-19  0:34 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux-pm mailing list

[-- Attachment #1: Type: text/plain, Size: 3010 bytes --]

On Fri, 2005-09-30 at 16:14 -0400, Alan Stern wrote:

> Updated documentation, comments, and the code itself explain pretty well
> how this works.  In brief, the dev_pm_info structure has a new field
> containing a pointer to a null-terminated list of state names.  There's
> also a pointer to the name of the state to be used for a generic runtime
> suspend.  The sysfs routines will display these names and match input
> against them.  They will also continue to recognize the old state numbers
> (0 - 3) and will use them for devices without support for named power
> states.

A couple of things: Is it worth adding some void* with every state to be
used by the device ? (Or maybe to be made bus specific) that provides
additional infos that can be defined per bus ?

Things like operational power consumption per state, stuff like that ? 

Also, you don't expose any state dependency here. How do you plan to
handle that ?

> Particularly vexing was the use of a pm_message_t to store a device's
> current state.  The event field in pm_message_t doesn't represent a power
> state; it represents a _reason_ for a state transition. 

Yes. One of the reason it was changed to "int state" to "pm_message_t"
was to make the type more explicitely matching what it really is. As I
wrote several times, I woudl prefer _different_ callbacks for the new
names state mecanism.

>  Nevertheless, the
> current code insists on using it to identify states.  This means that even
> though a runtime change may leave a device in the state needed for system
> suspend, the core won't realize it because the state was entered for the
> wrong reason!  This is so widespread I didn't try to change it.  Instead I
> decided that PM_EVENT_ON would be synonymous with the default (i.e., first
> or full-power) entry in the state name array.

No, I'd rather fix it all :)

> A consequence of this folly shows up when a device has been put in a
> low-power state using sysfs and then suspend-to-disk is done.  The device
> will first be woken up, just so it can be frozen, then woken up again, and
> finally suspended -- ending up in the very state it started from!
> 
> The runtime PM routines insist on storing values in
> dev->power.power_state.  This seems foolish since the system PM routines 
> don't do it.  In any case, what gets stored there should be decided by the 
> driver, not by the core.  Presumably these stores could be removed, but I 
> wasn't sure so I left them in.

Yes, there is some inconsistency there

> Finally, there's no obvious mapping between the PM_EVENT_FROZEN and 
> PM_EVENT_SUSPENDED values and the new named states.  Obviously devices 
> will be left in _some_ state, but the PM core can't tell which.  This will 
> show up in the sysfs file if you write a number from 1 to 3; instead of 
> displaying a named state the file will now just display the number.

That's why I'd like to have the old routines do the mapping (or do
nothing) and separate new routines to handle named states.

Ben.



[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC 0/3] Runtime PM support for named power states
  2005-10-19  0:34 ` Benjamin Herrenschmidt
@ 2005-10-19 15:17   ` Alan Stern
  0 siblings, 0 replies; 10+ messages in thread
From: Alan Stern @ 2005-10-19 15:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linux-pm mailing list

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4916 bytes --]

On Wed, 19 Oct 2005, Benjamin Herrenschmidt wrote:

> On Fri, 2005-09-30 at 16:14 -0400, Alan Stern wrote:
> 
> > Updated documentation, comments, and the code itself explain pretty well
> > how this works.  In brief, the dev_pm_info structure has a new field
> > containing a pointer to a null-terminated list of state names.  There's
> > also a pointer to the name of the state to be used for a generic runtime
> > suspend.  The sysfs routines will display these names and match input
> > against them.  They will also continue to recognize the old state numbers
> > (0 - 3) and will use them for devices without support for named power
> > states.
> 
> A couple of things: Is it worth adding some void* with every state to be
> used by the device ? (Or maybe to be made bus specific) that provides
> additional infos that can be defined per bus ?
> 
> Things like operational power consumption per state, stuff like that ? 

Remember that this was just a preliminary patch.  It made minimal changes; 
the stuff you're talking about can always be added on later.

On the other hand, I'm not sure where this additional information should
be displayed.  The sysfs power/state file wouldn't be a good place; I'm
already pushing the limits of what you're supposed to put in these files
by listing all the available states with aliases and a "current-state"  
indicator.  Maybe you'd like to add a new attribute file under the power/
directory for this?

> Also, you don't expose any state dependency here. How do you plan to
> handle that ?

The PM core can't handle it, since the core doesn't know what the
requirements are for each kind of device.  Drivers will have to handle
intra-device dependencies themselves, by returning an error code when
asked to make a state transition they can't perform.

Inter-device dependencies will be handled by a mechanism I haven't coded 
up yet.  It will amount to a bus- or device-specific way for a driver to 
receive notifications from the drivers of its device's children.  The 
driver will then be able to change the device's state to meet the 
requirements of the children.

> > Particularly vexing was the use of a pm_message_t to store a device's
> > current state.  The event field in pm_message_t doesn't represent a power
> > state; it represents a _reason_ for a state transition. 
> 
> Yes. One of the reason it was changed to "int state" to "pm_message_t"
> was to make the type more explicitely matching what it really is. As I
> wrote several times, I woudl prefer _different_ callbacks for the new
> names state mecanism.

I don't see why.  Most of the work these callbacks have to do would end up
being the same.  Besides, do you then want to add even more callbacks for
suspend caused by inactivity timeout and resume caused by I/O requests?  
No, I think we're better off with minimal callbacks and the "event" code
in pm_message_t indicating the reason for the callback.

> >  Nevertheless, the
> > current code insists on using it to identify states.  This means that even
> > though a runtime change may leave a device in the state needed for system
> > suspend, the core won't realize it because the state was entered for the
> > wrong reason!  This is so widespread I didn't try to change it.  Instead I
> > decided that PM_EVENT_ON would be synonymous with the default (i.e., first
> > or full-power) entry in the state name array.
> 
> No, I'd rather fix it all :)

That will be hard to do until all the drivers are converted.  My proposal 
at least lets us implement the changes a piece at a time.

> > The runtime PM routines insist on storing values in
> > dev->power.power_state.  This seems foolish since the system PM routines 
> > don't do it.  In any case, what gets stored there should be decided by the 
> > driver, not by the core.  Presumably these stores could be removed, but I 
> > wasn't sure so I left them in.
> 
> Yes, there is some inconsistency there

I've decided to take out the lines where the runtime PM routines overwrite 
the value in dev->power.power_state.  Drivers alone should maintain that 
field.

> > Finally, there's no obvious mapping between the PM_EVENT_FROZEN and 
> > PM_EVENT_SUSPENDED values and the new named states.  Obviously devices 
> > will be left in _some_ state, but the PM core can't tell which.  This will 
> > show up in the sysfs file if you write a number from 1 to 3; instead of 
> > displaying a named state the file will now just display the number.
> 
> That's why I'd like to have the old routines do the mapping (or do
> nothing) and separate new routines to handle named states.

There's no reason the old routines can't do the mapping when the event
code is PM_EVENT_SUSPEND or PM_EVENT_FREEZE, while still using the
explicit named state when the event is PM_EVENT_RUNTIME.  This can even be
handled at the bus-level, so that individual drivers won't have to worry
about it.

Alan Stern

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2005-10-19 15:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-09-30 20:14 [RFC 0/3] Runtime PM support for named power states Alan Stern
2005-10-02 20:43 ` Pavel Machek
2005-10-03 15:21   ` Alan Stern
2005-10-03 16:08     ` Jordan Crouse
2005-10-05  6:04 ` [RFC 0/3] " David Brownell
2005-10-05  7:44   ` Adam Belay
2005-10-05 23:24     ` David Brownell
2005-10-06 17:00       ` Alan Stern
2005-10-19  0:34 ` Benjamin Herrenschmidt
2005-10-19 15:17   ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox