[RFC] Linux Power Management

public inbox for linux-pm@vger.kernel.org

* [RFC] Linux Power Management
@ 2005-05-03  4:32 Adam Belay
  2005-05-03  6:06 ` Nigel Cunningham
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Adam Belay @ 2005-05-03  4:32 UTC (permalink / raw)
  To: linux-pm

Hi all,

I've been putting together some documentation for my proposed power
management changes.  In some areas it may be different or more detailed
than what I originally posted.  I look forward to any comments or
suggestions.

Thanks,
Adam

Improving Linux Power Management (DRAFT)
Adam Belay
05/02/05

Terminology
===========

power state - the qualities of a device's power configuration
minimum state - the highest power consumption, most on, state
maximum state - the lowest power consumption, most off, state
power domain - a device with a group of child devices that depend on its
state

Problems with current Linux PM
==============================

Although the existing model is sufficient for suspend and resume, modern
hardware often has more sophisticated power management features.  This
includes runtime power management and wake events.  Also, the current
model doesn't support power domains, a key concept in most bus hardware.

Design Goals
============

This project aims to provide a more useful Linux power management
infrastructure.  Because of the wide array of power management capable
platforms, each with its own unique protocols, it's important to have a
flexible design.  Therefore, simplicity and a solid framework are
favored over platform-specific quirks.

In this model, power management is not limited to sleep and suspend
operations.  Instead, each device has the option of managing its power
dynamically while the system is running.  Parent devices must be aware
of the power requirements of their children.

Userspace interaction with power management policy is a key goal.  While
policy configuration values may be specified by the user, policy
execution should occur in kernel-space whenever possible.  Userspace
will be notified of power events (including device state changes) via
kevents.

Power States
============

Every "power device" or "power resource" has its own unique set of
supported power states.  The characteristics of each state are specified
in a "struct power_state".  This structure is intended primarily for
gathering information; a typical use would be in power management
policy decisions.

struct power_state {
	char * name;			/* a human-readable name */

	unsigned int state;		/* the state index number */
	unsigned int flags;		/* some flags that describe the state */
	unsigned int power_consumption; /* in mW */

	struct list_head state_list;
};

#define PM_DEVICE_STATE_USABLE			0x00000001
#define PM_DEVICE_STATE_SLEEPING		0x00000002
#define PM_DEVICE_STATE_OFF			0x00000004

#define PM_DEVICE_STATE_MASK			0xffff0000 /* controller-specific values */

It's likely that more flags will be added as they become necessary.
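
As an illustration of how a policy might consume this information, here
is a minimal userspace sketch.  It is not part of the proposal: the
"_sketch" struct replaces the list_head with a plain array, and the
state table, its mW figures, and pick_lowest_power() are all made up
for the example.

```c
#include <stddef.h>

/* Flag values copied from the draft. */
#define PM_DEVICE_STATE_USABLE   0x00000001
#define PM_DEVICE_STATE_SLEEPING 0x00000002
#define PM_DEVICE_STATE_OFF      0x00000004

/* Simplified stand-in for struct power_state: an array replaces the
 * kernel's struct list_head so the sketch builds in userspace. */
struct power_state_sketch {
	const char *name;               /* a human-readable name */
	unsigned int state;             /* the state index number */
	unsigned int flags;             /* PM_DEVICE_STATE_* */
	unsigned int power_consumption; /* in mW */
};

/* Hypothetical table for a PCI-like device; the mW values are invented. */
static const struct power_state_sketch example_states[] = {
	{ "D0", 0, PM_DEVICE_STATE_USABLE,   1500 },
	{ "D1", 1, PM_DEVICE_STATE_SLEEPING,  400 },
	{ "D3", 3, PM_DEVICE_STATE_OFF,         0 },
};

/* Return the lowest-consumption state that carries every flag in
 * 'required', or NULL if no state qualifies. */
static const struct power_state_sketch *
pick_lowest_power(const struct power_state_sketch *tbl, size_t n,
		  unsigned int required)
{
	const struct power_state_sketch *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((tbl[i].flags & required) != required)
			continue;
		if (!best || tbl[i].power_consumption < best->power_consumption)
			best = &tbl[i];
	}
	return best;
}
```

A policy that needs the device to stay usable would pass
PM_DEVICE_STATE_USABLE and get D0 back; with no required flags the
lookup falls through to D3.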

Power Devices
=============

The base object of this power management implementation is referred to
as a "power device".  Power devices are represented by kobjects, each
with its own children and parent.  A power device may or may not
belong to a "struct device" in the physical device tree.

Every power device can be considered a power domain.  Each domain has
its own power states, but also acts as a container for child power
devices.  These children can specify what they require from the parent
domain.  When the requirements of all children have lowered below a
domain's current state, the parent may choose to also lower its state.

struct pm_device {
	char			* name;		/* a human-readable name for the device */
	struct kobject		kobj;

	pm_state_t		state;		/* the current power state index value */
	pm_state_t		min_state;	/* the minimum supported power state */
	pm_state_t		max_domain_state; /* the maximum possible state of the parent */
	struct list_head	states;		/* a list of "struct power_state" */

	struct list_head	child_list;
	struct list_head	children;	/* a list of child power devices */
	struct pm_device	* domain;	/* the parent power device */

	struct device		* dev;		/* the optional driver model device */

	struct pm_driver	* controller;	/* the power controller driver */
	struct pm_policy	* policy;	/* the policy driver */

	void 			* policy_data;
};

extern int pm_register_device(struct pm_device * dev);
extern void pm_unregister_device(struct pm_device * dev);

extern int pm_set_state(struct pm_device * dev, pm_state_t state);
extern int pm_set_state_force(struct pm_device * dev, pm_state_t state);

extern struct power_state *
pm_get_state_data(struct pm_device * dev, pm_state_t state);
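
To make the parent/child rule above concrete, here is a rough sketch of
how a domain could work out how far it is allowed to lower its state.
It assumes that pm_state_t is an unsigned integer and that a larger
index means a lower-power state (per the Terminology section); the
"_sketch" struct, its sibling pointer, and pm_domain_limit() are
hypothetical and only stand in for the list-based layout in the draft.

```c
#include <stddef.h>

typedef unsigned int pm_state_t; /* assumed: the draft never defines it */

/* Reduced stand-in for struct pm_device with a singly linked child list. */
struct pm_device_sketch {
	pm_state_t state;             /* the current power state index */
	pm_state_t max_domain_state;  /* deepest parent state this device tolerates */
	struct pm_device_sketch *children; /* first child, or NULL */
	struct pm_device_sketch *sibling;  /* next child of the same parent */
};

/* Return the deepest state 'domain' may enter: the smallest
 * max_domain_state among its children, capped by 'deepest', the
 * deepest state the domain itself supports. */
static pm_state_t pm_domain_limit(const struct pm_device_sketch *domain,
				  pm_state_t deepest)
{
	pm_state_t limit = deepest;
	const struct pm_device_sketch *c;

	for (c = domain->children; c; c = c->sibling)
		if (c->max_domain_state < limit)
			limit = c->max_domain_state;
	return limit;
}
```

A domain with no children is limited only by its own deepest state;
once any child still needs the parent in a shallower state, that child
sets the limit.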

Power Drivers
=============

Power drivers are specialized drivers with knowledge of a specific power
management protocol.  They provide a mechanism for changing the power
state, and update the "struct pm_device" to reflect which states are
available during a global system state transition.

Legacy or ISA devices may choose to implement their own power driver.
Most bus technologies (e.g. PCI) will provide a more general power
driver.

Power state index values are specific to the power driver.

struct pm_driver {
	char * name;

	int  (*update)	 (struct pm_device * dev,
			  struct pm_sys_state * state);

	int  (*get_state)(struct pm_device * dev);
	int  (*set_state)(struct pm_device * dev, pm_state_t state);
};
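
For illustration only, a trivial power driver might look like the
following userspace sketch.  The "_sketch" types, the "deepest" field,
and the dummy_* functions are hypothetical; ->update is left out
because "struct pm_sys_state" is not defined in this draft.

```c
#include <errno.h>

typedef unsigned int pm_state_t; /* assumed: the draft never defines it */

struct pm_device_sketch;

/* Reduced stand-in for struct pm_driver (no ->update). */
struct pm_driver_sketch {
	const char *name;
	int (*get_state)(struct pm_device_sketch *dev);
	int (*set_state)(struct pm_device_sketch *dev, pm_state_t state);
};

/* Reduced stand-in for struct pm_device. */
struct pm_device_sketch {
	pm_state_t state;   /* the current power state index */
	pm_state_t deepest; /* hypothetical: highest valid state index */
	const struct pm_driver_sketch *controller;
};

/* A real driver would read the state back from hardware here. */
static int dummy_get_state(struct pm_device_sketch *dev)
{
	return (int)dev->state;
}

/* Reject indices the device does not support, otherwise record the
 * new state.  A real driver would program the hardware first. */
static int dummy_set_state(struct pm_device_sketch *dev, pm_state_t state)
{
	if (state > dev->deepest)
		return -EINVAL;
	dev->state = state;
	return 0;
}

static const struct pm_driver_sketch dummy_driver = {
	"dummy", dummy_get_state, dummy_set_state,
};
```

Because state indices are private to the power driver, the range check
belongs in ->set_state rather than in the caller.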

Power Resources
===============

Generally speaking, "power resources" are power planes, clocks, etc.
that can be individually controlled.

Not every power management object fits into the power domain model,
especially in embedded systems and for ACPI.  Therefore, this
abstraction is needed to complement power domains and fill in any gaps
in the power management object topology.

Power resources are independent of power domains.  Like power devices,
they may have their own list of power states.  However, their
representation is simpler than that of power devices.  The power
management subsystem does not attempt to determine how power devices
depend on power resources or when power resources should be configured,
as this is implementation specific.

The main goal behind power resource objects is to provide a framework
for some standardization, export this information to sysfs for
debugging, and act as a stub for future expansion.

struct pm_resource_ops {
	int (*update) (struct pm_resource * res,
		       struct pm_sys_state * state);

	int (*get_state) (struct pm_resource * res);
	int (*set_state) (struct pm_resource * res, pm_state_t state);
};

struct pm_resource {
	char * name;
	struct kobject kobj;

	pm_state_t		state;		/* the current power state index value */
	struct list_head	states;		/* a list of "struct power_state" */

	struct pm_resource_ops	*ops;		/* operations for controlling the power resource */
};

extern int pm_register_resource(struct pm_resource * res);
extern void pm_unregister_resource(struct pm_resource * res);

extern int pm_set_resource(struct pm_resource * res, pm_state_t state);

Power Management Policy
=======================

Each power device will have a policy manager.  Policy managers make
power management decisions based on user configurable settings and data
gathered from device drivers.  Generally this will include activity
timers and other methods of determining device idleness.

Most of the power policy manager implementation is device specific, but
a few basic notifications are provided by the power management
subsystem.  This includes when the system state is about to change or
when the net requirements of child devices have changed.

struct pm_policy {
	int (*requirements_changed)	(struct pm_device * dev,
					 pm_state_t new_max_state);

	int (*prepare)			(struct pm_device * dev,
					 struct pm_sys_state * new);
	int (*enter)			(struct pm_device * dev,
					 struct pm_sys_state * new);
};

"prepare" is called to stop dynamic power management and prepare for a
global system state change.  "enter" is called to make the actual
state change.  The policy manager will then call, at its discretion,
"pm_set_state".

In the case of resuming, "enter" will actually enable dynamic power
management if it's available.

"enter" is required; "requirements_changed" and "prepare" are optional.

Standard policies will be provided.  As an example, most PCI devices
have simple power management requirements, so they will use a generic
PCI policy manager.  The PCI policy manager might then have its own
hooks (e.g. state selection for wake).
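
As a rough sketch of what the decision logic in a simple policy manager
could look like, consider an idle-timeout policy.  Everything here is
hypothetical (the struct, its fields, and idle_policy_target()); it only
illustrates a user-configurable timeout plus the rule that dynamic power
management stops between "prepare" and "enter".

```c
typedef unsigned int pm_state_t; /* assumed: the draft never defines it */

/* Hypothetical per-device policy data (what policy_data might point to). */
struct idle_policy_sketch {
	unsigned long timeout_ms; /* user-configurable idle timeout */
	pm_state_t active_state;  /* state to use while the device is busy */
	pm_state_t idle_state;    /* state to drop to once idle long enough */
	int frozen;               /* nonzero between "prepare" and "enter" */
};

/* Decide which state the device should be in, given its current state
 * and how long it has been idle.  While the policy is frozen for a
 * global transition, the current state is left alone. */
static pm_state_t idle_policy_target(const struct idle_policy_sketch *p,
				     pm_state_t current,
				     unsigned long idle_ms)
{
	if (p->frozen)
		return current;
	return idle_ms >= p->timeout_ms ? p->idle_state : p->active_state;
}
```

The policy manager would pass the result to "pm_set_state" whenever it
differs from the current state.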

Device Drivers
==============

Linux device drivers must often save and restore state during power
transitions.  The following API is proposed:

->prepare_state(struct device * dev, pm_state_t state,
                unsigned int reason);
->complete_state(struct device * dev, pm_state_t state,
                unsigned int reason);

The following would be an example of a typical transition:

1.) The policy manager decides to put a PCI ethernet card into D3 from
D0.
2.) ->prepare_state is called; the ethernet driver saves its state
information and disables the hardware.
3.) The power driver's ->set_state function is called, and power is
actually removed.
4.) ->complete_state is called to clean up and make any final
adjustments.

* In the case of D3->D0, ->complete_state would restore state.

Possible "reasons" might include DYNAMIC_PM, HALT, REBOOT, SUSPEND,
RESUME, etc.

This API is different from the current ->suspend and ->resume because it
applies to situations outside of system suspend (e.g. runtime power
management) and has an emphasis on specific device power states. 
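
The call order for a single device transition can be sketched as
follows.  This is a userspace illustration under stated assumptions,
not the proposed implementation: the "_sketch" types, the reason enum
values, and the demo_* callbacks are invented, and stopping at the
first error is an assumption, since the draft does not specify error
handling.

```c
#include <stddef.h>

typedef unsigned int pm_state_t; /* assumed: the draft never defines it */

/* Hypothetical encodings for some of the "reasons" named above. */
enum pm_reason_sketch { PM_REASON_DYNAMIC_PM, PM_REASON_SUSPEND, PM_REASON_RESUME };

struct dev_sketch {
	pm_state_t state;
	int (*prepare_state)(struct dev_sketch *dev, pm_state_t state,
			     unsigned int reason);
	int (*set_state)(struct dev_sketch *dev, pm_state_t state);
	int (*complete_state)(struct dev_sketch *dev, pm_state_t state,
			      unsigned int reason);
};

/* Run one transition: driver prepares, power driver switches state,
 * driver completes.  Stops at the first failing step (assumption). */
static int pm_transition_sketch(struct dev_sketch *dev, pm_state_t target,
				unsigned int reason)
{
	int err;

	if (dev->prepare_state) {
		err = dev->prepare_state(dev, target, reason);
		if (err)
			return err;
	}
	err = dev->set_state(dev, target);
	if (err)
		return err;
	dev->state = target;
	if (dev->complete_state)
		return dev->complete_state(dev, target, reason);
	return 0;
}

/* Demo callbacks that only record the order in which they ran. */
static char trace[8];
static size_t trace_len;

static void trace_add(char c)
{
	if (trace_len + 1 < sizeof(trace)) {
		trace[trace_len++] = c;
		trace[trace_len] = '\0';
	}
}

static int demo_prepare(struct dev_sketch *dev, pm_state_t s, unsigned int r)
{
	(void)dev; (void)s; (void)r;
	trace_add('P');
	return 0;
}

static int demo_set(struct dev_sketch *dev, pm_state_t s)
{
	(void)dev; (void)s;
	trace_add('S');
	return 0;
}

static int demo_complete(struct dev_sketch *dev, pm_state_t s, unsigned int r)
{
	(void)dev; (void)s; (void)r;
	trace_add('C');
	return 0;
}
```

Wiring the three demo callbacks into a dev_sketch and requesting state 3
with PM_REASON_DYNAMIC_PM leaves the trace reading "PSC", matching
steps 2 to 4 above.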

System Suspend
==============

The following would be a typical flow of execution when transitioning to
a sleep state.  (Note: this covers only the device aspect; firmware
issues, process freezing, etc. are outside its scope.)

1.) ->prepare is called for each policy manager from the leaves of the
tree to the root, preventing existing states from changing.
2.) ->update is called for each power device, from the root of the tree
to the leaves.  Each power device then reflects the new available states.
3.) ->enter is called for each policy manager from the leaves of the tree
to the root, resulting in actual state changes.

So each device does the following as the tree is walked:
->prepare_state
->set_state
->complete_state
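
The three passes above amount to a post-order walk, a pre-order walk,
and another post-order walk over the power device tree.  The sketch
below shows only that ordering; the node layout and all function names
are hypothetical, and the callbacks merely record which phase visited
which node.

```c
#include <stddef.h>

/* Hypothetical tree node: first-child / next-sibling links. */
struct node_sketch {
	char id;
	struct node_sketch *child;   /* first child, or NULL */
	struct node_sketch *sibling; /* next child of the same parent */
};

typedef void (*visit_fn)(struct node_sketch *n);

/* Post-order: every child is visited before its parent. */
static void walk_leaves_to_root(struct node_sketch *n, visit_fn fn)
{
	struct node_sketch *c;

	for (c = n->child; c; c = c->sibling)
		walk_leaves_to_root(c, fn);
	fn(n);
}

/* Pre-order: a parent is visited before any of its children. */
static void walk_root_to_leaves(struct node_sketch *n, visit_fn fn)
{
	struct node_sketch *c;

	fn(n);
	for (c = n->child; c; c = c->sibling)
		walk_root_to_leaves(c, fn);
}

/* Record "<phase><node id>" pairs so the visiting order can be inspected. */
static char order[32];
static size_t order_len;

static void record_step(char phase, char id)
{
	if (order_len + 2 < sizeof(order)) {
		order[order_len++] = phase;
		order[order_len++] = id;
		order[order_len] = '\0';
	}
}

static void do_prepare(struct node_sketch *n) { record_step('p', n->id); }
static void do_update(struct node_sketch *n)  { record_step('u', n->id); }
static void do_enter(struct node_sketch *n)   { record_step('e', n->id); }

/* The three passes, in the order described above. */
static void suspend_sketch(struct node_sketch *root)
{
	walk_leaves_to_root(root, do_prepare); /* 1.) ->prepare */
	walk_root_to_leaves(root, do_update);  /* 2.) ->update  */
	walk_leaves_to_root(root, do_enter);   /* 3.) ->enter   */
}
```

For a root R with children A and B, the recorded order is prepare on
A, B, R; update on R, A, B; and enter on A, B, R.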

Conclusion
==========

This document provides a basic summary of a proposed power management
design plan.  It is currently a draft.  Feel free to make any comments
or suggest revisions.

* Re: [RFC] Linux Power Management
  2005-05-03  4:32 [RFC] Linux Power Management Adam Belay
@ 2005-05-03  6:06 ` Nigel Cunningham
  2005-05-03 15:52 ` Alan Stern
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Nigel Cunningham @ 2005-05-03  6:06 UTC (permalink / raw)
  To: Adam Belay; +Cc: Linux-pm mailing list

Hi Adam.

On Tue, 2005-05-03 at 14:32, Adam Belay wrote:
> Hi all,
> 
> I've been putting together some documentation for my proposed power
> management changes.  In some areas it may be different or more detailed
> than what I originally posted.  I look forward to any comments or
> suggestions.
> 
> Thanks,
> Adam
> 
> 
> 
> Improving Linux Power Management (DRAFT)
> Adam Belay
> 05/02/05

You might like to make this May 2 - my first thought was "5th of
February? That will be a bit out of date!"

> Terminology
> ===========
> 
> power state - the qualities of a device's power configuration
> minimum state - the highest power consumption, most on, state

Could highest != most on in some (rare) cases? Perhaps just put one or
the other.

> maximum state - the lowest power consumption, most off, state
> power domain - a device with a group of child devices that depend on its
> state

Before someone else queries, "its" is right. It's == it is.

> Problems with current Linux PM
> ==============================
> 
> Although the existing model is sufficient for suspend and resume, modern
> hardware often has more sophisticated power management features.  This
> includes runtime power management and wake events.  Also, the current

s/Also/In addition/

> model doesn't support power domains, a key concept in most bus hardware.
> 
> Design Goals
> ============
> 
> This project aims to provide a more useful Linux power management
> infrastructure.  Because of the wide array of power management capable
> platforms, each with its own unique protocols, it's important to have a
> flexible design.  Therefore, simplicity and a solid framework are

I would move "Therefore" to after "are".

> favored over platform-specific quirks.
> 
> In this model, power management is not limited to sleep and suspend
> operations.  Instead, each device has the option of managing its power
> dynamically while the system is running.  Parent devices must be aware
> of the power requirements of their children.
> 
> Userspace interaction with power management policy is a key goal.  While
> policy configuration values may be specified by the user, policy
> execution should occur in kernel-space whenever possible.  Userspace
> will be notified of power events (including device state changes) via

"notified of power events" implies all events. Perhaps "significant
events"? (Of course that still leaves the question as to what is
significant).

> kevents.
> 
> Power States
> ============
> 
> Every "power device" or "power resource" has its own unique set of
> supported power states.  Characteristics about each state are specified
> in a "struct power_state".  This structure is intended primarily for
> gathering information.  A typical usage would be in power management
> policy decisions.
> 
> struct power_state {
> 	char * name;			/* a human-readable name */
> 
> 	unsigned int state;		/* the state index number */
> 	unsigned int flags;		/* some flags that describe the state */

Perhaps it would be good to describe these a little more.

> 	unsigned int power_consumption; /* in mW */
> 
> 	struct list_head state_list;
> };
> 
> #define PM_DEVICE_STATE_USABLE			0x00000001
> #define PM_DEVICE_STATE_SLEEPING		0x00000002
> #define PM_DEVICE_STATE_OFF			0x00000004
> 
> #define PM_DEVICE_STATE_MASK			0xffff0000 /* controller-specific values */
> 
> It's likely that more flags will be added as they become necessary.
> 
> 
> Power Devices
> =============
> 
> The base object of this power management implementation is referred to
> as a "power device".  Power devices are represented by kobjects, each
> with their own children and parents.  A power device may or may not
> belong to a "struct device" in the physical device tree.
> 
> Every power device can be considered a power domain.  Each domain has

considered to be...

> its own power states, but also acts as a container for child power
> devices.  These children can specify what they require from the parent
> domain.  When the requirements of all children have lowered below a
> domain's current state, the parent may choose to also lower its state.
> 
> struct pm_device {
> 	char			* name;		/* a human-readable name for the device */
> 	struct kobject		kobj;
> 
> 	pm_state_t		state;		/* the current power state index value */
> 	pm_state_t		min_state;	/* the minimum supported power state */
> 	pm_state_t		max_domain_state; /* the maximum possible state of the parent */
> 	struct list_head	states;		/* a list of "struct power_state" */
> 
> 	struct list_head	child_list;
> 	struct list_head	children;	/* a list of child power devices */
> 	struct pm_device	* domain;	/* the parent power device */
> 
> 	struct device		* dev;		/* the optional driver model device */
> 
> 	struct pm_driver	* controller;	/* the power controller driver */
> 	struct pm_policy	* policy;	/* the policy driver */
> 
> 	void 			* policy_data;
> };
> 
> extern int pm_register_device(struct pm_device * dev);
> extern void pm_unregister_device(struct pm_device * dev);
> 
> extern int pm_set_state(struct pm_device * dev, pm_state_t state);
> extern int pm_set_state_force(struct pm_device * dev, pm_state_t state);
> 
> extern struct power_state *
> pm_get_state_data(struct pm_device * dev, pm_state_t state);
> 
> Power Drivers
> =============
> 
> Power drivers are specialized drivers with knowledge of a specific power
> management protocol.  They provide a mechanism for changing the power
> state, and update the "struct pm_device" to reflect which states are
> available during a global system state transition.
> 
> Legacy or ISA devices may choose to implement their own power driver.
> Most bus technologies (e.g. PCI) will provide a more general power
> driver.
> 
> Power state index values are specific to the power driver.
> 
> struct pm_driver {
> 	char * name;
> 
> 	int  (*update)	 (struct pm_device * dev,
> 			  struct pm_sys_state * state);
> 
> 	int  (*get_state)(struct pm_device * dev);
> 	int  (*set_state)(struct pm_device * dev, pm_state_t state);
> };
> 
> 
> Power Resources
> ===============
> 
> Generally speaking, "power resources" are power planes, clocks, etc.
> that can be individually controlled.
> 
> Not every power management object fits into the power domain model,
> especially in embedded systems and for ACPI.  Therefore, this
> abstraction is needed to complement power domains and fills in any gaps
> in the power management object topology.

"...and fill in..."

> Power resources are independent of power domains.  Like power devices,
> they may have their own list of power states.  However, their
> representation is more simplistic than power devices.  The power
> management subsystem does not attempt to determine how power devices
> depend on power resources or when power resources should be configured
> as this is implementation specific.
> 
> The main goal behind power resource objects is to provide a framework
> for some standardization, export this information to sysfs for
> debugging, and act as a stub for future expansion.
> 
> struct pm_resource_ops {
> 	int (*update) (struct pm_resource * res,
> 		       struct pm_sys_state * state);
> 
> 	int (*get_state) (struct pm_resource * res);
> 	int (*set_state) (struct pm_resource * res, pm_state_t state);
> };
> 
> struct pm_resource {
> 	char * name;
> 	struct kobject kobj;
> 
> 	pm_state_t		state;		/* the current power state index value */
> 	struct list_head	states;		/* a list of "struct power_state" */
> 	
> 	struct power_resource_ops *ops;		/* operations for controlling the power resource */
> };
> 
> extern int pm_register_resource(struct power_resource * res);
> extern void pm_unregister_resource(struct power_resource * res);
> 
> extern int pm_set_resource(struct pm_resource * res, pm_state_t state);
> 
> Power Management Policy
> =======================
> 
> Each power device will have a policy manager.  Policy managers make
> power management decisions based on user configurable settings and data
> gathered from device drivers.  Generally this will include activity
> timers and other methods of determining device idleness.
> 
> Most of the power policy manager implementation is device specific, but
> a few basic notifications are provided by the power management
> subsystem.  This includes when the system state is about to change or
> when the net requirements of child devices have changed.
> 
> struct power_policy {
> 	(*requirements_changed)	(struct pm_device * dev,
> 				 pm_state_t new_max_state);

I'd like to see a description of what this does too :>

> 	(*prepare)		(struct pm_device * dev,
> 				 struct pm_sys_state * new);
> 	(*enter)		(struct pm_device * dev,
> 				 struct pm_sys_state * new);
> };
> 
> "prepare" is called to stop dynamic power management and prepare for a
> global system state change.  "enter" is called to make the actually
> state change.  The policy manager will then call, at its discretion,
> "pm_set_state".
> 
> In the case of resuming, "enter" will actually enable dynamic power
> management if it's available.

Am I right in thinking this implies that one of the flags in a power
state specifies whether the device can choose to change from this state
to another?

> "enter" is required, "requirements_changed" and "prepare" are optional.
> 
> Standard policies will be provided.  As an example, most PCI devices
> have simple power management requirements, so they will use a generic
> PCI policy manager.  The PCI policy manager might then have its own
> hooks (e.g. state selection for wake).
> 
> Device Drivers
> ==============
> 
> Linux device drivers must often save and restore state during power
> transitions.  The following API is proposed:
> 
> ->prepare_state(struct device * dev, pm_state_t state,
>                 unsigned int reason);
> ->complete_state(struct device * dev, pm_state_t state,
>                 unsigned int reason);
> 
> The following would be an example of a typical transition:
> 
> 1.) the policy manager decides to put a PCI ethernet card into D3 from
> D0.
> 2.) ->prepare_state is called, the ethernet driver saves its state
> information and disables the hardware
> 3.) the power driver's ->set_state function is called, and power is
> actually removed.
> 4.) ->complete_state is called to cleanup and make any final
> adjustments.
> 
> * In the case of D3->D0 ->complete_state would restore state.
> 
> Possible "reasons" might include DYNAMIC_PM, HALT, REBOOT, SUSPEND,
> RESUME, etc.
> 
> This API is different from the current ->suspend and ->resume because it
> applies to situations outside of system suspend (e.g. runtime power
> management) and has an emphasis on specific device power states.

I wonder whether reason implies state. Eg. Hard disk driver:

DYNAMIC_PM: Not relevant?
HALT: Flush data, power down device.
REBOOT: Flush data. Don't power down device.
SUSPEND: Flush data. Save state. Power down.
RESUME: Power up if necessary (might be post SUSPEND or QUIESCE).
Restore state.
(My invention) QUIESCE: Flush data. Save state.

I'm assuming here that, independent of all this, the driver knows
whether it was actually used or not, and might therefore not power up
until it sees activity, for example.
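
The mapping above could be written down as a small table in code.  A
rough sketch, with every name invented for illustration (none of these
constants exist in the proposal):

```c
/* Hypothetical action bits for a hard disk driver. */
#define ACT_FLUSH      0x01
#define ACT_SAVE_STATE 0x02
#define ACT_POWER_DOWN 0x04
#define ACT_POWER_UP   0x08 /* only if the device is actually off */
#define ACT_RESTORE    0x10

/* Hypothetical encodings of the reasons listed above. */
enum reason_sketch {
	REASON_DYNAMIC_PM,
	REASON_HALT,
	REASON_REBOOT,
	REASON_SUSPEND,
	REASON_RESUME,
	REASON_QUIESCE,
};

/* Return the set of actions a disk driver would take for a reason. */
static unsigned int disk_actions_for(enum reason_sketch r)
{
	switch (r) {
	case REASON_HALT:
		return ACT_FLUSH | ACT_POWER_DOWN;
	case REASON_REBOOT:
		return ACT_FLUSH;
	case REASON_SUSPEND:
		return ACT_FLUSH | ACT_SAVE_STATE | ACT_POWER_DOWN;
	case REASON_RESUME:
		return ACT_POWER_UP | ACT_RESTORE;
	case REASON_QUIESCE:
		return ACT_FLUSH | ACT_SAVE_STATE;
	case REASON_DYNAMIC_PM:
	default:
		return 0; /* not relevant for this driver */
	}
}
```

If the reason alone determines the actions like this, the separate
state argument carries little extra information for such a driver.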

> System Suspend
> ==============
> 
> The following would be a typical flow of execution when transitioning to
> a sleep state: (note... this focuses on only the device aspect, there
> are firmware issues, process freezing, etc.)
> 
> 1.) ->prepare is called for each policy manager from the leafs of the
> tree to the root, preventing existing states from changing.
> 2.) ->update is called for each power device, from the root of the tree
> to the leafs.  Each power device then reflects the new available states.
> 3.) ->enter is called for each policy manager from the leafs of the tree
> to the root, resulting in actual state changes.

s/leafs/leaves/

> So each device doing the following while walking through the tree:
> ->prepare_state
> ->set_state
> ->complete_state
> 
> Conclusion
> ==========
> 
> This document provides a basic summary of a proposed power management
> design plan.  It is currently a draft.  Feel free to make any comments
> or suggest revisions.

Hope this is helpful.

Nigel
-- 
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028;  Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net


* Re: [RFC] Linux Power Management
  2005-05-03  4:32 [RFC] Linux Power Management Adam Belay
  2005-05-03  6:06 ` Nigel Cunningham
@ 2005-05-03 15:52 ` Alan Stern
  2005-05-05  4:39   ` Adam Belay
  2005-05-03 21:40 ` Pavel Machek
  2005-05-08 18:31 ` David Brownell
  3 siblings, 1 reply; 16+ messages in thread
From: Alan Stern @ 2005-05-03 15:52 UTC (permalink / raw)
  To: Adam Belay; +Cc: linux-pm

On Tue, 3 May 2005, Adam Belay wrote:

> Hi all,
> 
> I've been putting together some documentation for my proposed power
> management changes.  In some areas it may be different or more detailed
> than what I originally posted.  I look forward to any comments or
> suggestions.
> 
> Thanks,
> Adam

I think there are some good ideas here, but you've taken the development 
much too far for something that's still in the "early proposal" stages.  
Also it seems much more elaborate than we really need, and it includes 
distinctions that aren't necessary.

Let me try to summarize quickly the high points as I see them.  You've
defined power devices and power domains, where a power domain is really
just a power device that can act as a parent to other power devices.  In
other words there's no significant difference between devices and domains
-- there's just leaves and interior nodes in a power tree.

You said explicitly that power devices may or may not coincide with struct 
devices in the driver model.  Why do you want to do this?  Why not insist 
that power devices _are_ struct devices?  It would make many things a lot 
easier and allow much more code sharing plus reduction of memory usage.  
(Kobjects aren't cheap, especially for embedded systems.)

You've got power resources totally outside the power tree.  This seems
like a very good way to handle oddball configurations and other things
that don't fit the generic model.  It takes advantage of the fact that
most power constraints can be represented in terms of a device tree even
though some of them can't.

You've got both power drivers and policy managers.  Why separate the two?  
Won't a real implementation have to keep them closely interconnected 
anyway?  Or have I misunderstood?

For that matter, what sort of policy options can the kernel offer to 
userspace?  I'm not aware of many possibilities:

	System power transitions can be thought of as policy changes.

	Individual devices or subsystems can be powered down either
	immediately or after some inactivity timeout.

	Devices may or may not be set to resume on demand.

Anything else?  This list in itself doesn't require much in the way of 
policy management.

Alan Stern

* Re: [RFC] Linux Power Management
  2005-05-03 15:52 ` Alan Stern
@ 2005-05-05  4:39   ` Adam Belay
  2005-05-08 18:35     ` David Brownell
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Belay @ 2005-05-05  4:39 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-pm

On Tue, May 03, 2005 at 11:52:45AM -0400, Alan Stern wrote:
> On Tue, 3 May 2005, Adam Belay wrote:
> 
> > Hi all,
> > 
> > I've been putting together some documentation for my proposed power
> > management changes.  In some areas it may be different or more detailed
> > than what I originally posted.  I look forward to any comments or
> > suggestions.
> > 
> > Thanks,
> > Adam
> 
> I think there are some good ideas here, but you've taken the development 
> much too far for something that's still in the "early proposal" stages.  
> Also it seems much more elaborate than we really need, and it includes 
> distinctions that aren't necessary.
> 
> Let me try to summarize quickly the high points as I see them.  You've
> defined power devices and power domains, where a power domain is really
> just a power device that can act as a parent to other power devices.  In
> other words there's no significant difference between devices and domains
> -- there's just leaves and interior nodes in a power tree.
> 
> You said explicitly that power devices may or may not coincide with struct 
> devices in the driver model.  Why do you want to do this?  Why not insist 
> that power devices _are_ struct devices?  It would make many things a lot 
> easier and allow much more code sharing plus reduction of memory usage.  
> (Kobjects aren't cheap, especially for embedded systems.)

I think you make a good point.  My original intention here was to support
strange power relationships that do not match the device tree.  However, I
don't think this will be common, and not every layer in between the tree
has to participate.  So, pseudo-logical device layers like we're seeing in the
PCI express bus driver could just be skipped past.  I'll change my plans to
reflect this.

> You've got power resources totally outside the power tree.  This seems
> like a very good way to handle oddball configurations and other things
> that don't fit the generic model.  It takes advantage of the fact that
> most power constraints can be represented in terms of a device tree even
> though some of them can't.

I'm going to try to develop this idea further.

> 
> You've got both power drivers and policy managers.  Why separate the two?  
> Won't a real implementation have to keep them closely interconnected 
> anyway?  Or have I misunderstood?

My original intention was to have "power drivers" that transition device
power state and "policy managers" that say when to make those state
transitions.  However, after giving it some thought, I don't think the
"policy manager" distinction is so important.  It could just be a component
of the device driver.

> 
> For that matter, what sort of policy options can the kernel offer to 
> userspace?  I'm not aware of many possibilities:

Mostly enabling dynamic power management features and tuning idle timeout
values.

> 
> 	System power transitions can be thought of as policy changes.
> 
> 	Individual devices or subsystems can be powered down either
> 	immediately or after some inactivity timeout.
> 
> 	Devices may or may not be set to resume on demand.
> 
> Anything else?  This list in itself doesn't require much in the way of 
> policy management.

Wake devices would also require userspace configuration of policy.

I appreciate your comments.  To summarize, I'd like to introduce power
resources into our current PM model and add a concept of device power states.
Also, I'd like to improve power domain handling so we can better support dynamic
power management.  However, these power domains will live inside the normal
device tree.  Finally, I'd like to implement a PCI bus power management
driver to demonstrate these concepts.  I'll have more details on these changes
soon.

Thanks,
Adam

* Re: [RFC] Linux Power Management
  2005-05-05  4:39   ` Adam Belay
@ 2005-05-08 18:35     ` David Brownell
  2005-05-09  9:49       ` Pavel Machek
  0 siblings, 1 reply; 16+ messages in thread
From: David Brownell @ 2005-05-08 18:35 UTC (permalink / raw)
  To: linux-pm

On Wednesday 04 May 2005 9:39 pm, Adam Belay wrote:
> On Tue, May 03, 2005 at 11:52:45AM -0400, Alan Stern wrote:
> > On Tue, 3 May 2005, Adam Belay wrote:
> > 
> > You said explicitly that power devices may or may not coincide with struct 
> > devices in the driver model.  Why do you want to do this?  Why not insist 
> > that power devices _are_ struct devices?  It would make many things a lot 
> > easier and allow much more code sharing plus reduction of memory usage.  
> > (Kobjects aren't cheap, especially for embedded systems.)
> 
> I think you make a good point.  My original intention here was to support
> strange power relationships that do not match the device tree.

In fact, that's why "struct dev_pm_info" has a "pm_parent".  Today.
And device_pm_set_parent() ... which I don't necessarily think would
work, if anyone were to use it.  That information is invisible through
sysfs, note!

Hmm, I think I'll post a patch I've had sitting around for a while,
to at least report violation of the relevant integrity constraints.
It doesn't solve the problem that a pm_parent won't automatically
know about its children though, for example.  That's something I
liked about Adam's direction:  formalizing that relationship.

> However, I 
> don't think this will be common, and not every layer in between the tree
> has to participate.  So, pseudo-logical device layers like we're seeing in the
> PCI express bus driver could just be skipped past.  I'll change my plans to
> reflect this.

When you "don't think this will be common" you're basically assuming
Linux won't be used on embedded hardware.   Unhealthy assumption!!

It's **really easy** and in fact natural for hardware designers to
provide power management relationships that don't directly match
the logical bus structure that software likes to use.  Chips to
switch power are in widespread use, and the GPIO lines used to drive
them don't always come from a CPU (where they'd probably be always
available unless the CPU itself suspends).

The way those systems manage to work at all on 2.6 relies on several
things, in the systems I've seen.

   * System initialization order changes like subsys_initcall for
     I2C not device_initcall, also for the power switching device.
     That way, power can be turned on by board-specific logic in
     drivers for either (a) the power switching devices, or even
     better (b) the device whose power is switched knows how
     to ask the power-switch driver to give it power.  (Since
     they can rely on the switching device to be "live" by then.)

   * Despite the fact that sysfs exposes power/state, nobody is
     actually using that for anything except very selective
     testing ... so the deep brokenness that _could_ happen is
     quite unlikely; we don't expect userspace to selectively
     suspend a parent without having suspended its child.
     (The device drivers use selective suspend all the time,
     but they know not to do such stuff.)

You'll observe that this entirely bypasses the driver model
PM structure, using init sequencing instead.  So until someone
tries to use sysfs for selective suspend, it can't matter
that some I2C device is actually controlling power for every
device in the system (CPU included!) yet isn't the driver
model parent of any one of them...

The fact that these systems can even boot under Linux relies on
board-specific logic to handle the things that the pmcore model
was allegedly "designed to handle".  Arguably it's better to have
code in driver X that just says "driver Y, please set GPIO 3 high"
since anything less direct is just wasted code ... but if so, then
what's the relevance of pmcore here?  What _should_ pmcore do for
normal board configurations like that?
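
To make the "driver X asks driver Y" pattern concrete, here is a minimal
userspace sketch, assuming a board where an ethernet device's supply hangs
off GPIO 3 of a separate power-switch chip.  Every name in it
(power_switch_init, power_switch_set, eth_probe, ETH_POWER_GPIO) is
hypothetical and made up for illustration; none of this is kernel API.  The
only point is that the scheme works purely because the switch is initialized
before its users, with no driver-model PM relationship involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical power-switch "driver": owns a bank of GPIO-driven switches. */
#define NUM_SWITCHES 8

struct power_switch {
    bool ready;               /* set once the switch driver has initialized */
    bool line[NUM_SWITCHES];  /* current level of each GPIO line */
};

static struct power_switch board_switch;

/* Runs early (think subsys_initcall), before ordinary device drivers. */
static void power_switch_init(void)
{
    for (size_t i = 0; i < NUM_SWITCHES; i++)
        board_switch.line[i] = false;
    board_switch.ready = true;
}

/* "Driver Y, please set GPIO n high/low."  Returns 0 on success, -1 on error. */
static int power_switch_set(unsigned int gpio, bool on)
{
    if (!board_switch.ready || gpio >= NUM_SWITCHES)
        return -1;
    board_switch.line[gpio] = on;
    return 0;
}

/* Hypothetical device whose supply is wired to GPIO 3 on this board. */
#define ETH_POWER_GPIO 3

/* Runs later (think device_initcall); relies on the switch being live. */
static int eth_probe(void)
{
    return power_switch_set(ETH_POWER_GPIO, true);
}
```

If eth_probe() runs before power_switch_init(), it simply fails, which is
the ordering dependency the initcall tweaks paper over.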

> > Anything else?  This list in itself doesn't require much in the way of 
> > policy management.
> 
> Wake devices would also require userspace configuration of policy.

I'll definitely repost my patch adding that policy support; I've got
it updated, after all!  And the PCI-E changes to access PCI config
space "early" seem to have laid to rest concerns about whether that's
safe to do, so that simplifies the PCI stuff too.

> I appreciate your comments.  To summarize, I'd like to introduce power
> resources into our current PM model and add a concept of device power states.

That is, distinct from dev_pm_info.power_state?  That notion is rather
bogus, and AFAICT is there _only_ to support sysfs selective suspend.

And the sysfs support for selective suspend is pretty bogus -- it doesn't
do recursion, dev_pm_info.power_state updates are iffy, and it obviously
can't preserve the pm parent/child relationships -- that whole area does
need much work/replacement.

- Dave

* Re: [RFC] Linux Power Management
  2005-05-08 18:35     ` David Brownell
@ 2005-05-09  9:49       ` Pavel Machek
  2005-05-09 16:41         ` David Brownell
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Machek @ 2005-05-09  9:49 UTC (permalink / raw)
  To: David Brownell; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 1180 bytes --]

Hi!

> > However, I 
> > don't think this will be common, and not every layer in between the tree
> > has to participate.  So, pseudo-logical device layers like we're seeing in the
> > PCI express bus driver could just be skipped past.  I'll change my plans to
> > reflect this.
> 
> When you "don't think this will be common" you're basically assuming
> Linux won't be used on embedded hardware.   Unhealthy assumption!!
> 
> It's **really easy** and in fact natural for hardware designers to
> provide power management relationships that don't directly match
> the logical bus structure that software likes to use.  Chips to
> switch power are in widespread use, and the GPIO lines used to drive
> them don't always come from a CPU (where they'd probably be always
> available unless the CPU itself suspends).

Well, people can mess up hardware any way they like, but don't expect
us to mess up linux to match...

If they use GPIO lines on chip that eats so much power that it must be
power-managed... yes, I'd call that broken hardware.

Of course, current pm core *is* inadequate for even simple uses...
								Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.

* Re: [RFC] Linux Power Management
  2005-05-09  9:49       ` Pavel Machek
@ 2005-05-09 16:41         ` David Brownell
  2005-05-09 19:30           ` Pavel Machek
  0 siblings, 1 reply; 16+ messages in thread
From: David Brownell @ 2005-05-09 16:41 UTC (permalink / raw)
  To: linux-pm; +Cc: Pavel Machek

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On Monday 09 May 2005 2:49 am, Pavel Machek wrote:
> > > ... However, I don't think this will be common, ...
> > 
> > When you "don't think this will be common" you're basically assuming
> > Linux won't be used on embedded hardware.   Unhealthy assumption!!
> > 
> > It's **really easy** and in fact natural for hardware designers to
> > provide power management relationships that don't directly match
> > the logical bus structure that software likes to use. ...
> 
> Well, people can mess up hardware any way they like, but don't expect
> us to mess up linux to match...

Hardly.  On the other hand, if Linux can't handle such simple
and common hardware models, it's already been messed up.

> If they use GPIO lines on chip that eats so much power that it must be
> power-managed... yes, I'd call that broken hardware.

By virtue of being in the driver model, it _is_ managed no
matter how much power it uses (or doesn't use).  The issue
isn't how much power it uses.  It's the expectation that all
power signals only flow parallel to data and control busses,
as they would on daughtercard based designs.

I'd tend to agree with hardware guys on this one:  this is
such a simple case that only broken _software_ should have
trouble handling it.

> Of course, current pm core *is* inadequate for even simple uses...

Right, and GPIO based power switching is a "simple use".
Luckily the init sequence tweaks I described make Linux
work with it, until selective suspend kicks in.

- Dave

* Re: [RFC] Linux Power Management
  2005-05-09 16:41         ` David Brownell
@ 2005-05-09 19:30           ` Pavel Machek
  0 siblings, 0 replies; 16+ messages in thread
From: Pavel Machek @ 2005-05-09 19:30 UTC (permalink / raw)
  To: David Brownell; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 1144 bytes --]

Hi!

> > If they use GPIO lines on chip that eats so much power that it must be
> > power-managed... yes, I'd call that broken hardware.
> 
> By virtue of being in the driver model, it _is_ managed no
> matter how much power it uses (or doesn't use).  The issue
> isn't how much power it uses.  It's the expectation that all
> power signals only flow parallel to data and control busses,
> as they would on daughtercard based designs.
> 
> I'd tend to agree with hardware guys on this one:  this is
> such a simple case that only broken _software_ should have
> trouble handling it.
> 
> 
> > Of course, current pm core *is* inadequate for even simple uses...
> 
> Right, and GPIO based power switching is a "simple use".
> Luckily the init sequence tweaks I described make Linux
> work with it, until selective suspend kicks in.

Well, GPIO power switching should be very easy to handle ... as long
as path between CPU and GPIO-based-power-switches takes little
power. When we don't have to power-manage the path between CPU and
GPIO-based-power-switches, we are fine...
								Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.

* Re: [RFC] Linux Power Management
  2005-05-03  4:32 [RFC] Linux Power Management Adam Belay
  2005-05-03  6:06 ` Nigel Cunningham
  2005-05-03 15:52 ` Alan Stern
@ 2005-05-03 21:40 ` Pavel Machek
  2005-05-05  4:12   ` Adam Belay
  2005-05-08 18:31 ` David Brownell
  3 siblings, 1 reply; 16+ messages in thread
From: Pavel Machek @ 2005-05-03 21:40 UTC (permalink / raw)
  To: Adam Belay; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 1344 bytes --]

Hi!

> Device Drivers
> ==============
> 
> Linux device drivers must often save and restore state during power
> transitions.  The following API is proposed:
> 
> ->prepare_state(struct device * dev, pm_state_t state,
>                 unsigned int reason);
> ->complete_state(struct device * dev, pm_state_t state,
>                 unsigned int reason);
> 
> The following would be an example of a typical transition:
> 
> 1.) the policy manager decides to put a PCI ethernet card into D3 from
> D0.
> 2.) ->prepare_state is called, the ethernet driver saves its state
> information and disables the hardware
> 3.) the power driver's ->set_state function is called, and power is
> actually removed.
> 4.) ->complete_state is called to cleanup and make any final
> adjustments.
> 
> * In the case of D3->D0 ->complete_state would restore state.
> 
> Possible "reasons" might include DYNAMIC_PM, HALT, REBOOT, SUSPEND,
> RESUME, etc.
> 
> This API is different from the current ->suspend and ->resume because it
> applies to situations outside of system suspend (e.g. runtime power
> management) and has an emphasis on specific device power states. 

No. It took 2+ years to add at least system power states. You want
to build on that, not scratch it and start over.

								Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.

* Re: Re: [RFC] Linux Power Management
  2005-05-03 21:40 ` Pavel Machek
@ 2005-05-05  4:12   ` Adam Belay
  2005-05-05  9:38     ` Pavel Machek
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Belay @ 2005-05-05  4:12 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 1695 bytes --]

On Tue, May 03, 2005 at 11:40:09PM +0200, Pavel Machek wrote:
> Hi!
> 
> > Device Drivers
> > ==============
> > 
> > Linux device drivers must often save and restore state during power
> > transitions.  The following API is proposed:
> > 
> > ->prepare_state(struct device * dev, pm_state_t state,
> >                 unsigned int reason);
> > ->complete_state(struct device * dev, pm_state_t state,
> >                 unsigned int reason);
> > 
> > The following would be an example of a typical transition:
> > 
> > 1.) the policy manager decides to put a PCI ethernet card into D3 from
> > D0.
> > 2.) ->prepare_state is called, the ethernet driver saves its state
> > information and disables the hardware
> > 3.) the power driver's ->set_state function is called, and power is
> > actually removed.
> > 4.) ->complete_state is called to cleanup and make any final
> > adjustments.
> > 
> > * In the case of D3->D0 ->complete_state would restore state.
> > 
> > Possible "reasons" might include DYNAMIC_PM, HALT, REBOOT, SUSPEND,
> > RESUME, etc.
> > 
> > This API is different from the current ->suspend and ->resume because it
> > applies to situations outside of system suspend (e.g. runtime power
> > management) and has an emphasis on specific device power states. 
> 
> > No. It took 2+ years to add at least system power states. You want
> to build on that, not scratch it and start over.
> 
> 								Pavel

Hi Pavel,

After giving it some serious thought, I've decided that I agree.  I'm
reworking my plans to reflect this.

I was wondering, however, what do you have in mind for adding to pm_message_t?
Also, are you going to use "PMSG_HALT" and/or "PMSG_REBOOT"?

Thanks,
Adam

* Re: Re: [RFC] Linux Power Management
  2005-05-05  4:12   ` Adam Belay
@ 2005-05-05  9:38     ` Pavel Machek
  2005-05-08 18:39       ` David Brownell
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Machek @ 2005-05-05  9:38 UTC (permalink / raw)
  To: Adam Belay; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 1056 bytes --]

Hi!

> > > This API is different from the current ->suspend and ->resume because it
> > > applies to situations outside of system suspend (e.g. runtime power
> > > management) and has an emphasis on specific device power states. 
> > 
> > > No. It took 2+ years to add at least system power states. You want
> > to build on that, not scratch it and start over.
> 
> After giving it some serious thought, I've decided that I agree.  I'm
> reworking my plans to reflect this.
> 
> I was wondering, however, what do you have in mind for adding to pm_message_t?
> Also, are you going to use "PMSG_HALT" and/or "PMSG_REBOOT"?

If I added PMSG_REBOOT, I'd have to modify all the drivers to support
it. Incompatible change, bad.

We already have something close enough, PMSG_FREEZE. That means that
drivers will do right thing by default. If someone really needs to
tell between normal freeze and reboot, we can add a flag; but I'm not
100% convinced it is necessary.

Same for PMSG_HALT.
								Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.

* Re: Re: [RFC] Linux Power Management
  2005-05-05  9:38     ` Pavel Machek
@ 2005-05-08 18:39       ` David Brownell
  2005-05-09  8:35         ` Pavel Machek
  0 siblings, 1 reply; 16+ messages in thread
From: David Brownell @ 2005-05-08 18:39 UTC (permalink / raw)
  To: linux-pm; +Cc: Pavel Machek

[-- Attachment #1: Type: text/plain, Size: 3507 bytes --]

On Thursday 05 May 2005 2:38 am, Pavel Machek wrote:
> > > No. It took 2+ years to add at least system power states. You want
> > > to build on that, not scratch it and start over.

But Pavel, since you started to push all those pm_message_t patches,
you've effectively REMOVED the entire visibility of system power states
to drivers ... they no longer have any information they can use to choose
the right device power state based on the upcoming system state.

This whole pm_message_t thing was pretty darn close to a "start over".
(Though that wasn't the original goal.)

Before those patches, pretty much everything **EXCEPT SWSUSP** (I had
to examine the whole kernel tree, and study how the PM framework had
evolved since 2.4...) was good about the integer suspend() parameter being
an ACPI S-number for the target system state.  And there were drivers
that used that information ... the primary problem was that those numbers
weren't documented as such, so confusion came when swsusp created its
own new/incompatible theory about what the integers meant.  (Rather than
adopting and/or formalizing the pre-existing model, with its enums.)

What you've done with those patches is (a) force all drivers and subsystems
to change, while (b) removing even the possibility for drivers to choose
target device states based on knowledge of the target system state, and
(c) breaking the drivers that DID choose states intelligently.

In short, the problem is that you _removed_ the previous driver-visible
notion of system power.  Now here you're complaining about Adam noticing
that the status quo ante was really a better direction.  Basically giving
him the same treatment you've given me for making that same point.  Forgive
me if I remain un-persuaded.

> > I was wondering, however, what do you have in mind for adding to pm_message_t?
> > Also, are you going to use "PMSG_HALT" and/or "PMSG_REBOOT"?
> 
> If I added PMSG_REBOOT, I'd have to modify all the drivers to support
> it. Incompatible change, bad.

You're going to have to do that to support PMSG_FREEZE already... as
you note.  What was your _previous_ plan for getting FREEZE to behave,
then?  Given what's happened so far, I have to conclude you never
really expected (or wanted) that suspend() parameter to
be used for anything except the magic number "3".  (Which is used
for both FREEZE and SUSPEND now...)

If you really had believed incompatible changes were bad, you'd
not have chosen the approach you did for pm_message_t.  You would
have instead just formalized the existing model, using the existing
enum (used inside pmcore, except by swsusp) for a "strong" type
that sparse would check.

> We already have something close enough, PMSG_FREEZE. That means that
> drivers will do right thing by default. If someone really needs to
> tell between normal freeze and reboot, we can add a flag; but I'm not
> 100% convinced it is necessary.

So what's your current rationale for FREEZE then?  The original
motivation was to have a "quiesced" state for drivers, distinct
from a full fledged SUSPEND, to avoid pointless suspend/resume
cycles in the middle of swsusp.

What you're arguing here is effectively that the parameter to
all suspend methods can never change.  Ergo it's useless, and
we might as well just have removed it in the first place!!!

Of course, one flaw in that argument is that there _are_ in fact
different system power states, and some drivers do need to choose
device states accordingly...
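
As a rough sketch of what "choosing a device state from the target system
state" could look like, assuming ACPI-style S-states and PCI-style D-states:
the enums, the dev_caps structure and choose_dev_state() below are all
hypothetical stand-ins, and the policy itself is just one plausible example,
not something any existing driver is known to implement.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for ACPI-style system states and PCI-style device states. */
enum sys_state { SYS_S0, SYS_S1, SYS_S2, SYS_S3, SYS_S4, SYS_S5 };
enum dev_state { DEV_D0, DEV_D1, DEV_D2, DEV_D3 };

/* Hypothetical per-device capabilities a driver might know about. */
struct dev_caps {
    bool has_d1;        /* device implements D1 */
    bool has_d2;        /* device implements D2 */
    bool wake_enabled;  /* device is expected to wake the system */
    bool wake_from_d3;  /* device can signal wakeup from D3 */
};

/* Pick a device state given where the whole system is headed. */
static enum dev_state choose_dev_state(enum sys_state target,
                                       const struct dev_caps *caps)
{
    if (target == SYS_S0)
        return DEV_D0;

    /* Light sleep: prefer the shallowest low-power state for fast resume. */
    if (target == SYS_S1) {
        if (caps->has_d1)
            return DEV_D1;
        if (caps->has_d2)
            return DEV_D2;
        return DEV_D3;
    }

    /* Deeper sleep: stay shallow only if wakeup would otherwise be lost. */
    if (target <= SYS_S3 && caps->wake_enabled && !caps->wake_from_d3) {
        if (caps->has_d2)
            return DEV_D2;
        if (caps->has_d1)
            return DEV_D1;
    }
    return DEV_D3;
}
```

None of this is possible if the suspend() parameter carries no information
about the target system state, which is the complaint above.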

- Dave

* Re: Re: [RFC] Linux Power Management
  2005-05-08 18:39       ` David Brownell
@ 2005-05-09  8:35         ` Pavel Machek
  0 siblings, 0 replies; 16+ messages in thread
From: Pavel Machek @ 2005-05-09  8:35 UTC (permalink / raw)
  To: David Brownell; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 2110 bytes --]

Hi!

> > > I was wondering, however, what do you have in mind for adding to pm_message_t?
> > > Also, are you going to use "PMSG_HALT" and/or "PMSG_REBOOT"?
> > 
> > If I added PMSG_REBOOT, I'd have to modify all the drivers to support
> > it. Incompatible change, bad.
> 
> You're going to have to do that to support PMSG_FREEZE already... as
> you note.  What was your _previous_ plan for getting FREEZE to behave,
> then?  Given what's happened so far, I have to conclude you never
> really expected (or wanted) that suspend() parameter to
> be used for anything except the magic number "3".  (Which is used
> for both FREEZE and SUSPEND now...)
> 
> If you really had believed incompatible changes were bad, you'd
> not have chosen the approach you did for pm_message_t.  You would
> have instead just formalized the existing model, using the existing
> enum (used inside pmcore, except by swsusp) for a "strong" type
> that sparse would check.

PMSG_REBOOT is not good enough reason to go through all the
drivers. PMSG_FREEZE can be used, instead.

> > We already have something close enough, PMSG_FREEZE. That means that
> > drivers will do right thing by default. If someone really needs to
> > tell between normal freeze and reboot, we can add a flag; but I'm not
> > 100% convinced it is necessary.
> 
> So what's your current rationale for FREEZE then?  The original
> motivation was to have a "quiesced" state for drivers, distinct
> from a full fledged SUSPEND, to avoid pointless suspend/resume
> cycles in the middle of swsusp.

It still holds. PMSG_FREEZE is quiesced state, useful for at least
swsusp and reboot.

> What you're arguing here is effectively that the parameter to
> all suspend methods can never change.  Ergo it's useless, and
> we might as well just have removed it in the first place!!!

No, I'm not arguing that the parameter to suspend methods can never
change. It will change in 2.6.12. It will have 3 values, PMSG_FREEZE
and PMSG_SUSPEND (and PMSG_ON, if you want to store state somewhere).
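
A minimal sketch of how a driver's suspend method might tell those three
values apart, assuming "freeze" means quiesce only and "suspend" means
quiesce, save state and power down.  The enum msg_event, struct fake_dev and
fake_suspend() are hypothetical stand-ins written for userspace; they are
not the kernel's pm_message_t definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the three message values discussed above. */
enum msg_event { MSG_ON, MSG_FREEZE, MSG_SUSPEND };

/* Hypothetical device bookkeeping. */
struct fake_dev {
    bool dma_running;  /* device is actively doing I/O */
    bool state_saved;  /* register state has been saved for resume */
    bool powered;      /* device is in its full-power state */
};

/* Returns 0 on success, -1 for an unknown message. */
static int fake_suspend(struct fake_dev *dev, enum msg_event ev)
{
    switch (ev) {
    case MSG_ON:
        /* Nothing to do; the device stays as it is. */
        return 0;
    case MSG_FREEZE:
        /* Quiesce only: stop I/O, but keep power and skip the state save. */
        dev->dma_running = false;
        return 0;
    case MSG_SUSPEND:
        /* Full suspend: stop I/O, save state, drop to low power. */
        dev->dma_running = false;
        dev->state_saved = true;
        dev->powered = false;
        return 0;
    }
    return -1;
}
```

A driver that does not care about the difference can treat both non-ON
values identically, which is the "right thing by default" behaviour.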

								Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.

* Re: [RFC] Linux Power Management
  2005-05-03  4:32 [RFC] Linux Power Management Adam Belay
                   ` (2 preceding siblings ...)
  2005-05-03 21:40 ` Pavel Machek
@ 2005-05-08 18:31 ` David Brownell
  2005-05-09  3:26   ` Adam Belay
  3 siblings, 1 reply; 16+ messages in thread
From: David Brownell @ 2005-05-08 18:31 UTC (permalink / raw)
  To: linux-pm; +Cc: Adam Belay

[-- Attachment #1: Type: text/plain, Size: 9776 bytes --]

I like the overall goals and direction, but most details are probably
premature.  Especially since the real issue is how to evolve the current
mess while not breaking things too badly (again) ...

On Monday 02 May 2005 9:32 pm, Adam Belay wrote:
> Problems with current Linux PM
> ==============================
> 
> Although the existing model is sufficient for suspend and resume, modern

That is, system-wide suspend/resume.  For suspend/resume of individual
devices, at run-time, it's quite weak.

> hardware often has more sophisticated power management features.  This
> includes runtime power management and wake events.  Also, the current
> model doesn't support power domains, a key concept in most bus hardware.
> 
> Design Goals
> ============
> 
> This project aims to provide a more useful Linux power management
> infrastructure.  Because of the wide array of power management capable
> platforms, each with its own unique protocols, it's important to have a
> flexible design.  Therefore, simplicity and a solid framework are
> favored over platform-specific quirks.

Nobody has _ever_ advocated platform-specific quirks in the core.  :)

Folk have however advocated serious design limitations in that core
which would prevent reasonable support of some platforms.  I think
that's an anti-goal.  The point of flexibility is to let such common
platform models work well ... avoiding such limitations, rather
than labeling all other platforms (e.g. non-PC ones) as "quirky"
and predestined to not working well with Linux.

> In this model, power management is not limited to sleep and suspend
> operations.  Instead, each device has the option of managing its power
> dynamically while the system is running.  Parent devices must be aware
> of the power requirements of their children.

Yes, though the parent/child statement seems a bit too strong.

Devices commonly have multiple sorts of parents; clocks, power
control, and multiple busses (such as one for control and one
for DMA) and bridges.  It probably works better for devices to
know about those parents, and only require the PM core to
accommodate those multiple relationships (rather than getting
in the way by for example insisting the hardware may only have
one such relationship, called "parent").

> Userspace interaction with power management policy is a key goal.  While
> policy configuration values may be specified by the user, policy
> execution should occur in kernel-space whenever possible.  Userspace
> will be notified of power events (including device state changes) via
> kevents.

I don't agree about userspace interaction as a goal, beyond the
ability to pass general policy inputs to drivers.  It's fair that
some devices might support policies like "off" and "on"; but
that's not something to expect (or require!) from all drivers.

And when the drivers do choose to export such policies, it's not
clear that the export/import is ever naturally part of some "power
management" framework.  Counter-examples include "hdparm -S" to
control hard drive power usage (drive spindown), and "xset dpms"
for displays.  (Remember that disk drive and display power usage
are classically the major drains, though current generation PCs
often push CPU or GPU usage into that category too.)

In fact I still like the idea of just removing all the sysfs
power support entirely; ripping it out since it's never worked
well, and doesn't do what's needed.  The main counter-argument
is that there'd need to be some better way to test selective
suspend in drivers, if "echo -n 3 > power/state" vanishes.
(Like, hmm, something sitting in debugfs...)

> Power States
> ============
> 
> Every "power device" or "power resource" has its own unique set of
> supported power states.  Characteristics about each state are specified
> in a "struct power_state".  This structure is intended primarily for
> gathering information.  A typical usage would be in power management
> policy decisions.

Nobody's yet answered my question about why we'd need to formalize
such a state ... other than for sysfs support.  If a component is
managing power for several others, such states would be consequences
of agreements between those components.

> Power Devices
> =============
> 
> The base object of this power management implementation is referred to
> as a "power device".  Power devices are represented by kobjects, each
> with their own children and parents.  A power device may or may not
> belong to a "struct device" in the physical device tree.
> 
> Every power device can be considered a power domain.  Each domain has
> its own power states, but also acts as a container for child power
> devices.  These children can specify what they require from the parent
> domain.  When the requirements of all children have lowered below a
> domain's current state, the parent may choose to also lower its state.

As Alan observed, this doesn't necessarily seem to require a new
kind of data structure.  The problems with the existing framework
are more at the level of imposing too much policy (and the wrong
kind!) about the power relationships of devices.

And for example the current pm_parent seems like it could help to
manage such a "power domain"...
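
For what it's worth, the bookkeeping such a domain needs is small.  Here's a
minimal sketch, assuming states are plain indices where a larger number
means lower power (as in the terminology of the draft), and each child
reports the deepest parent state it can tolerate.  struct power_domain,
domain_allowed_state() and domain_try_lower() are hypothetical names; this
is not the existing pm_parent code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical power domain: state 0 is most "on", larger is lower power. */
struct power_domain {
    unsigned int deepest;           /* lowest-power state the domain supports */
    unsigned int state;             /* state the domain is in right now */
    const unsigned int *child_req;  /* deepest parent state each child tolerates */
    size_t nchildren;
};

/* Deepest state the domain may enter without violating any child. */
static unsigned int domain_allowed_state(const struct power_domain *d)
{
    unsigned int allowed = d->deepest;

    for (size_t i = 0; i < d->nchildren; i++)
        if (d->child_req[i] < allowed)
            allowed = d->child_req[i];
    return allowed;
}

/* Lower the domain's power if the children now permit it; true if changed. */
static bool domain_try_lower(struct power_domain *d)
{
    unsigned int allowed = domain_allowed_state(d);

    if (allowed <= d->state)
        return false;
    d->state = allowed;
    return true;
}
```

The parent only ever moves deeper here; raising power again when a child
needs it would be the mirror-image check.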

> Power Drivers
> =============
> 
> Power drivers are specialized drivers with knowledge of a specific power
> management protocol.  They provide a mechanism for changing the power
> state, and update the "struct pm_device" to reflect which states are
> available during a global system state transition.
> 
> Legacy or ISA devices may choose to implement their own power driver.
> Most bus technologies (e.g. PCI) will provide a more general power
> driver.
> 
> Power state index values are specific to the power driver.

What is a "power management protocol"?  And what "power state" is
being changed?  I don't quite see a need for such a thing; and if
it were to exist, it should have "protocol" specific identifiers
rather than "power state index values" to abuse (by offering the
ability to pass them between "protocols").

> Power Resources
> ===============
> 
> Generally speaking, "power resources" are power planes, clocks, etc.
> that can be individually controlled.
> 
> Not every power management object fits into the power domain model,
> especially in embedded systems and for ACPI.  Therefore, this
> abstraction is needed to complement power domains and fills in any gaps
> in the power management object topology.
> 
> Power resources are independent of power domains.  Like power devices,
> they may have their own list of power states.  However, their
> representation is more simplistic than power devices.  The power
> management subsystem does not attempt to determine how power devices
> depend on power resources or when power resources should be configured
> as this is implementation specific.
> 
> The main goal behind power resource objects is to provide a framework
> for some standardization, export this information to sysfs for
> debugging, and act as a stub for future expansion.

If it's for debugging, it should be exported with debugfs!!

The other arguments aren't convincing to me, in terms of having
any sort of standardized API.  The notion is fine, but the
examples you gave don't seem to need "generic" APIs.  Clocks
demonstrably don't; I've pointed out the one ARM uses there,
it can't be at all generic.

I could believe it'd be good to have a semi-generic API to
switch power though ... and maybe even a way for platform
device resources to include power switch resources, so the
drivers would get rid of related board-specific knowledge.

> Power Management Policy
> =======================
> 
> Each power device will have a policy manager.  Policy managers make
> power management decisions based on user configurable settings and data
> gathered from device drivers.  Generally this will include activity
> timers and other methods of determining device idleness.
> 
> Most of the power policy manager implementation is device specific, but
> a few basic notifications are provided by the power management
> subsystem.  This includes when the system state is about to change or
> when the net requirements of child devices have changed.
> 
> ...
> 
> Standard policies will be provided.  As an example, most PCI devices
> have simple power management requirements, so they will use a generic
> PCI policy manager.  The PCI policy manager might then have its own
> hooks (e.g. state selection for wake).

Again, given that I don't see a strong need for a separate "power device"
(vs normal "device"), I don't see a need for separate drivers here.

I think the methods you sketched are a bit overly complex.  Two phase
protocols have tended to not work well -- nobody implements them right,
they're hard to test -- so "prepare" worries me.  The call reporting
changed requirements doesn't attract me, since it seems like it should
be subsumed by the "enter" call.

And that "enter" call looks like the "suspend" call used to look ... but
instead of "pm_message_t" it's got something that's actually useful.
And yes, I think pm_message_t was broken-as-designed, and still
needs fixing.

> Device Drivers
> ==============
> 
> Linux device drivers must often save and restore state during power
> transitions.  

Sure, but that doesn't mean there need to be APIs that every driver
would have to handle ... or that they couldn't just save/restore
that state automatically during suspend/resume calls.

> Conclusion
> ==========
> 
> This document provides a basic summary of a proposed power management
> design plan.  It is currently a draft.  Feel free to make any comments
> or suggest revisions.

Slim it down, and work on having incremental updates to the existing
infrastructure.

- Dave

* Re: [RFC] Linux Power Management
  2005-05-08 18:31 ` David Brownell
@ 2005-05-09  3:26   ` Adam Belay
  2005-05-09 16:02     ` David Brownell
  0 siblings, 1 reply; 16+ messages in thread
From: Adam Belay @ 2005-05-09  3:26 UTC (permalink / raw)
  To: David Brownell; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 11496 bytes --]

On Sun, 2005-05-08 at 11:31 -0700, David Brownell wrote:
> I like the overall goals and direction, but most details are probably
> premature.  Especially since the real issue is how to evolve the current
> mess while not breaking things too badly (again) ...
> 
> 
> On Monday 02 May 2005 9:32 pm, Adam Belay wrote:
> > Problems with current Linux PM
> > ==============================
> > 
> > Although the existing model is sufficient for suspend and resume, modern
> 
> That is, system-wide suspend/resume.  For suspend/resume of individual
> devices, at run-time, it's quite weak.

Agreed, but the PM model should generally be less involved with
suspend/resume of individual devices.

> 
> > hardware often has more sophisticated power management features.  This
> > includes runtime power management and wake events.  Also, the current
> > model doesn't support power domains, a key concept in most bus hardware.
> > 
> > Design Goals
> > ============
> > 
> > This project aims to provide a more useful Linux power management
> > infrastructure.  Because of the wide array of power management capable
> > platforms, each with its own unique protocols, it's important to have a
> > flexible design.  Therefore, simplicity and a solid framework are
> > favored over platform-specific quirks.
> 
> Nobody has _ever_ advocated platform-specific quirks in the core.  :)
> 
> Folk have however advocated serious design limitations in that core
> which would prevent reasonable support of some platforms.  I think
> that's an anti-goal.  The point of flexibility is to let such common
> platform models work well ... avoiding such limitations, rather
> than labeling all other platforms (e.g. non-PC ones) as "quirky"
> and predestined to not working well with Linux.

Right, and what you are referring to here is PC platform specific
qualities. :)

> 
> 
> > In this model, power management is not limited to sleep and suspend
> > operations.  Instead, each device has the option of managing its power
> > dynamically while the system is running.  Parent devices must be aware
> > of the power requirements of their children.
> 
> Yes, though the parent/child statement seems a bit too strong.
> 
> Devices commonly have multiple sorts of parents; clocks, power
> control, and multiple busses (such as one for control and one
> for DMA) and bridges.  It probably works better for devices to
> know about those parents, and only require the PM core to
> accommodate those multiple relationships (rather than getting
> in the way by for example insisting the hardware may only have
> one such relationship, called "parent").

I'm aware of this.  I think it's impossible for any PM model to handle
these multiple relationships.  They're too non-standardized.  In most
cases, I think we should just stay out of the way.

However, many standards and platforms accustomed to expansion follow a
power-domain model.  Although the power domain support shouldn't get in
the way of those who don't need it, it's important that we provide this
functionality.  For most devices it will just make things easier.

Power resources are my attempt to model everything else.  For things any
weirder, the PM core doesn't need to know about them at all.

> 
>  
> > Userspace interaction with power management policy is a key goal.  While
> > policy configuration values may be specified by the user, policy
> > execution should occur in kernel-space whenever possible.  Userspace
> > will be notified of power events (including device state changes) via
> > kevents.
> 
> I don't agree about userspace interaction as a goal, beyond the
> ability to pass general policy inputs to drivers.  It's fair that
> some devices might support policies like "off" and "on"; but
> that's not something to expect (or require!) from all drivers.

I think this is really something that varies between device and
platform.  As I have said numerous times, I'd like to have policy
variables be configurable, but enforcement to occur in the kernel.

> 
> And when the drivers do choose to export such policies, it's not
> clear that the export/import is ever naturally part of some "power
> management" framework.  Counter-examples include "hdparm -S" to
> control hard drive power usage (drive spindown), and "xset dpms"
> for displays.  (Remember that disk drive and display power usage
> are classically the major drains, though current generation PCs
> often push CPU or GPU usage into that category too.)

Right.

> 
> In fact I still like the idea of just removing all the sysfs
> power support entirely; ripping it out since it's never worked
> well, and doesn't do what's needed.  The main counter-argument
> is that there'd need to be some better way to test selective
> suspend in drivers, if "echo -n 3 > power/state" vanishes.
> (Like, hmm, something sitting in debugfs...)

It depends on what we want to do with power management.  However, one of
the original reasons sysfs was created was to provide a power dependency
tree.  You can't just say that you don't like sysfs.  Perhaps you have
another interface in mind (e.g. netlink/D-BUS).  I disagree about debugfs.
It just isn't for this sort of thing and will only lead to confusion.
Sysfs provides structure and organization and is designed to show
hardware information.

> 
> 
> > Power States
> > ============
> > 
> > Every "power device" or "power resource" has its own unique set of
> > supported power states.  Characteristics about each state are specified
> > in a "struct power_state".  This structure is intended primarily for
> > gathering information.  A typical usage would be in power management
> > policy decisions.
> 
> Nobody's yet answered my question about why we'd need to formalize
> such a state ... other than for sysfs support.  If a component is
> managing power for several others, such states would be consequences
> of agreements between those components.

Well, I think various people have mentioned them in the past.  My idea
was to include power consumption information by state.  

Anyone else have a reason why we would need a device state list?
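
As a rough illustration of the "power consumption information by state"
idea, here is a minimal sketch of how a policy could use such a list.
Everything in it is hypothetical: struct power_state is only named in the
draft, the power_mw and wake_latency_us fields and the pick_state() helper
are assumptions made for the example, and none of this is an existing
kernel interface.

```c
#include <stddef.h>

/* Hypothetical per-state description; field names are assumptions. */
struct power_state {
	const char *name;
	unsigned int power_mw;        /* estimated draw while in this state */
	unsigned int wake_latency_us; /* assumed cost of returning to full-on */
};

/*
 * Return the index of the lowest-consumption state whose wake latency
 * does not exceed max_latency_us, or -1 if no state qualifies.
 */
int pick_state(const struct power_state *states, size_t n,
	       unsigned int max_latency_us)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (states[i].wake_latency_us > max_latency_us)
			continue;
		if (best < 0 || states[i].power_mw < states[best].power_mw)
			best = (int)i;
	}
	return best;
}
```

The point of the sketch is only that a state list becomes useful once
each entry carries data a policy can compare; without such data the list
adds little beyond what the driver already knows.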

> 
> 
> 
> > Power Devices
> > =============
> > 
> > The base object of this power management implementation is referred to
> > as a "power device".  Power devices are represented by kobjects, each
> > with their own children and parents.  A power device may or may not
> > belong to a "struct device" in the physical device tree.
> > 
> > Every power device can be considered a power domain.  Each domain has
> > its own power states, but also acts as a container for child power
> > devices.  These children can specify what they require from the parent
> > domain.  When the requirements of all children have lowered below a
> > domain's current state, the parent may choose to also lower its state.
> 
> As Alan observed, this doesn't necessarily seem to require a new
> kind of data structure.  The problems with the existing framework
> are more at the level of imposing too much policy (and the wrong
> kind!) about the power relationships of devices.

Originally I was trying to work toward supporting the multiple
dependency cases you mentioned earlier.  However, it's just too
difficult to be handled by the PM core, so I agree with Alan. 

> 
> And for example the current pm_parent seems like it could help to
> manage such a "power domain"...

Perhaps.  I like the checks you added for this.
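
The domain rule from the draft (a parent may lower its state once the
requirements of all children have dropped) can be sketched as below.  This
is a sketch under stated assumptions, not the pm_parent code: struct
pm_child and domain_allowed_state() are hypothetical names, and states are
assumed to be plain indices where 0 is the most-on state and larger values
are progressively more "off".

```c
#include <stddef.h>

/* Hypothetical: what one child requires from its parent domain. */
struct pm_child {
	/* Deepest (most off) parent state this child can tolerate. */
	unsigned int deepest_parent_state;
};

/*
 * Return the deepest state the domain may enter.  The domain is limited
 * by its own deepest state and by the most demanding child.
 */
unsigned int domain_allowed_state(const struct pm_child *children, size_t n,
				  unsigned int domain_deepest)
{
	unsigned int allowed = domain_deepest;
	size_t i;

	for (i = 0; i < n; i++) {
		if (children[i].deepest_parent_state < allowed)
			allowed = children[i].deepest_parent_state;
	}
	return allowed;
}
```

With no children the domain is constrained only by itself, which matches
the idea that domain support should not get in the way of devices that do
not need it.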

> 
> 
> > Power Drivers
> > =============
> > 
> > Power drivers are specialized drivers with knowledge of a specific power
> > management protocol.  They provide a mechanism for changing the power
> > state, and update the "struct pm_device" to reflect which states are
> > available during a global system state transition.
> > 
> > Legacy or ISA devices may choose to implement their own power driver.
> > Most bus technologies (e.g. PCI) will provide a more general power
> > driver.
> > 
> > Power state index values are specific to the power driver.
> 
> What is a "power management protocol"?  And what "power state" is
> being changed?  I don't quite see a need for such a thing; and if
> it were to exist, it should have "protocol" specific identifiers
> rather than "power state index values" to abuse (by offering the
> ability to pass them between "protocols").

Fair enough, I think my idea here was too power-domain centric.

> 
> 
> > Power Resources
> > ===============
> > 
> > Generally speaking, "power resources" are power planes, clocks, etc.
> > that can be individually controlled.
> > 
> > Not every power management object fits into the power domain model,
> > especially in embedded systems and for ACPI.  Therefore, this
> > abstraction is needed to complement power domains and fills in any gaps
> > in the power management object topology.
> > 
> > Power resources are independent of power domains.  Like power devices,
> > they may have their own list of power states.  However, their
> > representation is more simplistic than power devices.  The power
> > management subsystem does not attempt to determine how power devices
> > depend on power resources or when power resources should be configured
> > as this is implementation specific.
> > 
> > The main goal behind power resource objects is to provide a framework
> > for some standardization, export this information to sysfs for
> > debugging, and act as a stub for future expansion.
>
> If it's for debugging, it should be exported with debugfs!!

This isn't just debugging.  It's "current status" type information.  And
no, nothing like this belongs in debugfs as I said earlier.

> 
> The other arguments aren't convincing to me, in terms of having
> any sort of standardized API.  The notion is fine, but the
> examples you gave don't seem to need "generic" APIs.  Clocks
> demonstrably don't; I've pointed out the one ARM uses there,
> it can't be at all generic.
> 
> I could believe it'd be good to have a semi-generic API to
> switch power though ... and maybe even a way for platform
> device resources to include power switch resources, so the
> drivers would get rid of related board-specific knowledge.
> 

Right, ACPI has these.

So here's another argument.  Each power resource could have its own
->suspend and ->resume hook for when we transition system states.  I
think this would be useful in some cases, and if not then just don't
provide them.  Also, these power resources might have their own policy
configuration variables.
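
A minimal sketch of the optional-hook idea, assuming a hypothetical
struct power_resource with ->suspend and ->resume callbacks.  The
structure layout, the "suspended" flag, the demo_* hooks and
power_resources_suspend() are all made up for illustration; a resource
that provides no hooks is simply skipped.

```c
#include <stddef.h>

/* Hypothetical power resource; both hooks are optional (may be NULL). */
struct power_resource {
	const char *name;
	int (*suspend)(struct power_resource *res);
	int (*resume)(struct power_resource *res);
	int suspended; /* bookkeeping for the example hooks below */
};

/* Example hooks standing in for real clock or power-plane code. */
int demo_suspend(struct power_resource *res) { res->suspended = 1; return 0; }
int demo_resume(struct power_resource *res)  { res->suspended = 0; return 0; }
int demo_fail(struct power_resource *res)    { (void)res; return -1; }

/*
 * Suspend every resource that provides a ->suspend hook.  On failure,
 * resume the ones already suspended, in reverse order, and return the
 * error.
 */
int power_resources_suspend(struct power_resource **res, size_t n)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		if (!res[i]->suspend)
			continue;
		err = res[i]->suspend(res[i]);
		if (err) {
			while (i-- > 0) {
				if (res[i]->suspend && res[i]->resume)
					res[i]->resume(res[i]);
			}
			return err;
		}
	}
	return 0;
}
```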

--> snip
(I agree with your power policy comments)

> 
> 
> > Device Drivers
> > ==============
> > 
> > Linux device drivers must often save and restore state during power
> > transitions.  
> 
> Sure, but that doesn't mean there need to be APIs that every driver
> would have to handle ... or that they couldn't just save/restore
> that state automatically during suspend/resume calls.

Right, I'm not going with that model anyway.  The idea was just to
experiment with a setState approach instead of ->suspend and ->resume.
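
For reference, a setState-style interface could look roughly like the
sketch below, with ->suspend and ->resume reduced to thin wrappers around
a single entry point.  All names here (sketch_dev, sketch_set_state and
the three states) are hypothetical, and the "reg" field merely stands in
for hardware context that is lost when the device is off.

```c
/* Hypothetical device states, most-on first. */
enum sketch_state { SKETCH_ON = 0, SKETCH_STANDBY = 1, SKETCH_OFF = 2 };

struct sketch_dev {
	enum sketch_state state;
	unsigned int reg;       /* stand-in for a hardware register */
	unsigned int saved_reg; /* copy kept while the device is not on */
};

/* Single entry point: save on leaving ON, restore on returning to ON. */
int sketch_set_state(struct sketch_dev *dev, enum sketch_state target)
{
	if (target == dev->state)
		return 0;
	if (dev->state == SKETCH_ON)
		dev->saved_reg = dev->reg; /* leaving full-on: save context */
	if (target == SKETCH_OFF)
		dev->reg = 0;              /* context is lost when off */
	if (target == SKETCH_ON)
		dev->reg = dev->saved_reg; /* back to full-on: restore */
	dev->state = target;
	return 0;
}

/* The familiar hooks become wrappers around the one call. */
int sketch_suspend(struct sketch_dev *dev)
{
	return sketch_set_state(dev, SKETCH_OFF);
}

int sketch_resume(struct sketch_dev *dev)
{
	return sketch_set_state(dev, SKETCH_ON);
}
```

Here the save/restore happens inside the transition itself, so a driver
would not need a separate API for it.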

> 
> 
> > Conclusion
> > ==========
> > 
> > This document provides a basic summary of a proposed power management
> > design plan.  It is currently a draft.  Feel free to make any comments
> > or suggest revisions.
> 
> Slim it down, and work on having incremental updates to the existing
> infrastructure.

Yes, that is my intention.  This document gave me a chance to experiment
with various ideas outside the current implementation.  I think doing so
can be useful sometimes.  I appreciate the comments.

Thanks,
Adam




* Re: [RFC] Linux Power Management
  2005-05-09  3:26   ` Adam Belay
@ 2005-05-09 16:02     ` David Brownell
  0 siblings, 0 replies; 16+ messages in thread
From: David Brownell @ 2005-05-09 16:02 UTC (permalink / raw)
  To: Adam Belay; +Cc: linux-pm

[-- Attachment #1: Type: text/plain, Size: 5356 bytes --]

On Sunday 08 May 2005 8:26 pm, Adam Belay wrote:
> > > Problems with current Linux PM
> > > ==============================
> > > 
> > > Although the existing model is sufficient for suspend and resume, modern
> > 
> > That is, system-wide suspend/resume.  For suspend/resume of individual
> > devices, at run-time, it's quite weak.
> 
> Agreed, but the PM model should generally be less involved with
> suspend/resume of individual devices.

But right now it _is_ involved.  And regardless, it's hard to
argue in favor of different suspend/resume calls for system
sleep state transitions and selective suspend/resume actions;
to the driver, there's no real difference.

Though as I'm sure you've picked up, I'm starting to believe that
there should ONLY be system-wide PM operations through sysfs,
with any device-specific ones going through application-specific
requests (again using "hdparm -S" and "xset dpms" as examples).


> > > In this model, power management is not limited to sleep and suspend
> > > operations.  Instead, each device has the option of managing its power
> > > dynamically while the system is running.  Parent devices must be aware
> > > of the power requirements of their children.
> > 
> > Yes, though the parent/child statement seems a bit too strong.
> > 
> > Devices commonly have multiple sorts of parents; clocks, power
> > control, and multiple busses (such as one for control and one
> > for DMA) and bridges.  It probably works better for devices to
> > know about those parents, and only require the PM core to
> > accommodate those multiple relationships (rather than getting
> > in the way by for example insisting the hardware may only have
> > one such relationship, called "parent").
> 
> I'm aware of this.  I think it's impossible for any PM model to handle
> these multiple relationships.  They're too non-standardized.  In most
> cases, I think we should just stay out of the way.

Staying out of the way should work, if it's done right.  That may
mean adding a few mechanisms that are missing, or changing things
that get in the way now.


> However, many standards and platforms accustomed to expansion follow a
> power-domain model.  Although the power domain support shouldn't get in
> the way of those who don't need it, it's important that we provide this
> functionality.  For most devices it will just make things easier.
> 
> Power resources are my attempt to model everything else.  For things any
> weirder, the PM core doesn't need to know about them at all.

It's probably best to make sure essential concrete cases work well,
generalizing later instead of earlier.  (I still don't have such
examples for "power resources".)

Right now, I'm comfortable saying we don't have a very good way
to model even common abstractions like USB hubs (each of which
is a power domain).  And that's not really different from any
other kind of bridge, or (generally) internal node in the device
tree.


> > > Userspace interaction with power management policy is a key goal.  While
> > > policy configuration values may be specified by the user, policy
> > > execution should occur in kernel-space whenever possible.  Userspace
> > > will be notified of power events (including device state changes) via
> > > kevents.
> > 
> > I don't agree about userspace interaction as a goal, beyond the
> > ability to pass general policy inputs to drivers.  It's fair that
> > some devices might support policies like "off" and "on"; but
> > that's not something to expect (or require!) from all drivers.
> 
> I think this is really something that varies between device and
> platform.  As I have said numerous times, I'd like to have policy
> variables be configurable, but enforcement to occur in the kernel.

Depending on how "policy" is defined, that can mean a lot of things... ;)


> > In fact I still like the idea of just removing all the sysfs
> > power support entirely; ripping it out since it's never worked
> > well, and doesn't do what's needed.  The main counter-argument
> > is that there'd need to be some better way to test selective
> > suspend in drivers, if "echo -n 3 > power/state" vanishes.
> > (Like, hmm, something sitting in debugfs...)
> 
> It depends on what we want to do with power management.  However, one of
> the original reasons sysfs was created was to provide a power dependency
> tree.  You can't just say that you don't like sysfs.  

I didn't ... and neither does providing such a tree require
exposing per-device controls.  Taking away those (broken)
controls is simple.  In fact, configure without CONFIG_PM
and you'll see a useful sysfs that doesn't have them ...


> > And for example the current pm_parent seems like it could help to
> > manage such a "power domain"...
> 
> Perhaps.  I like the checks you added for this.

Which FWIW I forwarded to Greg to merge, ideally "soon" though
the checks aren't critical in normal usage.


> So here's another argument.  Each power resource could have its own
> ->suspend and ->resume hook for when we transition system states.  I
> think this would be useful in some cases, and if not then just don't
> provide them.  Also, these power resources might have their own policy
> configuration variables.

Can you maybe provide concrete examples of a "power resource"?
If we just talk about abstract notions, they don't seem useful.

- Dave



end of thread, other threads:[~2005-05-09 19:30 UTC | newest]

Thread overview: 16+ messages
2005-05-03  4:32 [RFC] Linux Power Management Adam Belay
2005-05-03  6:06 ` Nigel Cunningham
2005-05-03 15:52 ` Alan Stern
2005-05-05  4:39   ` Adam Belay
2005-05-08 18:35     ` David Brownell
2005-05-09  9:49       ` Pavel Machek
2005-05-09 16:41         ` David Brownell
2005-05-09 19:30           ` Pavel Machek
2005-05-03 21:40 ` Pavel Machek
2005-05-05  4:12   ` Adam Belay
2005-05-05  9:38     ` Pavel Machek
2005-05-08 18:39       ` David Brownell
2005-05-09  8:35         ` Pavel Machek
2005-05-08 18:31 ` David Brownell
2005-05-09  3:26   ` Adam Belay
2005-05-09 16:02     ` David Brownell
