public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
@ 2008-03-09  1:20 Rafael J. Wysocki
  2008-03-09  4:13 ` Alan Stern
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2008-03-09  1:20 UTC (permalink / raw)
  To: pm list; +Cc: ACPI Devel Maling List, Alexey Starikovskiy

Hi,

Below is a new version of the patch that introduces the separation of suspend
and hibernation callbacks on the highest level.

Relative to the previous version two additional callbacks have been added,
->prepare() and ->complete(), allowing to prepare the device for a "suspend"
transition and to finalize the "resume" respectively.

Also, all callbacks and PM_EVENT_ messages are now described in the
comments in include/linux/pm.h (the documentation in Documentation/power
will be updated separately).

As previously, it is assumed that the existing (legacy) ->suspend() and
->resume() callbacks will be used for bus types, device classes and device
types that don't implement the new callbacks.  As a result, the patch doesn't
introduce any visible functional changes.

The patch has been compilation tested on x86-64.

Comments welcome.

Thanks,
Rafael

---
Introduce 'struct pm_ops' representing a set of suspend and
hibernation operations for bus types, device classes and device
types.

Modify the PM core to use 'struct pm_ops' objects, if defined, instead
of the ->suspend(), ->resume(), ->suspend_late(), ->resume_early()
callbacks that will be considered as legacy and gradually phased out.

Not-yet-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---

 arch/x86/kernel/apm_32.c  |    4 
 drivers/base/power/main.c |  375 +++++++++++++++++++++++++++++++++++-----------
 include/linux/device.h    |    8 
 include/linux/pm.h        |  193 ++++++++++++++++++++---
 kernel/power/disk.c       |   14 -
 kernel/power/main.c       |    4 
 6 files changed, 482 insertions(+), 116 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -110,11 +110,9 @@ extern void (*pm_power_off_prepare)(void
 
 struct device;
 
-typedef struct pm_message {
-	int event;
-} pm_message_t;
-
-/*
+/**
+ * struct pm_ops - device PM callbacks
+ *
  * Several driver power state transitions are externally visible, affecting
  * the state of pending I/O queues and (for drivers that touch hardware)
  * interrupts, wakeups, DMA, and other hardware state.  There may also be
@@ -122,6 +120,169 @@ typedef struct pm_message {
  * to the rest of the driver stack (such as a driver that's ON gating off
  * clocks which are not in active use).
  *
+ * The externally visible transitions are handled with the help of the following
+ * callbacks included in this structure:
+ *
+ * @prepare: Prepare the device for the upcoming transition, but do NOT change
+ *	its hardware state.  Prevent new children of the device from being
+ *	registered after @prepare() returns.  Also, if a concurrent child
+ *	registration already in progress is detected by @prepare(), it should
+ *	wait for the registration to complete, block the subsequent
+ *	registrations of children and return -EAGAIN, so that the PM core can
+ *	execute it once again (after suspending the newly registered child) to
+ *	recover from the race condition.  This method is executed for all kinds
+ *	of suspend transitions and is immediately followed by one of the suspend
+ *	callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
+ *
+ * @complete: Undo the changes made by @prepare().  This method is executed for
+ *	all kinds of resume transitions, immediately following one of the resume
+ *	callbacks: @resume(), @thaw(), @restore(), or @recover().  Also executed
+ *	if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
+ *	immediately following a successful @prepare() fails.
+ *
+ * @suspend: Executed before putting the system into a sleep state in which the
+ *	contents of main memory are preserved.  Quiesce the device, put it into
+ *	a low power state appropriate for the upcoming system state (such as
+ *	PCI_D3hot), and enable wakeup events as appropriate.
+ *
+ * @resume: Executed after waking the system up from a sleep state in which the
+ *	contents of main memory were preserved.  Put the device into the
+ *	appropriate state, according to the information saved in memory by the
+ *	preceding @suspend().  The driver starts working again, responding to
+ *	hardware events and software requests.  The hardware may have gone
+ *	through a power-off reset, or it may have maintained state from the
+ *	previous suspend() which the driver may rely on while resuming.  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @resume().
+ *
+ * @freeze: Hibernation-specific, executed before creating a hibernation image.
+ *	Quiesce operations so that a consistent image can be created, but do NOT
+ *	otherwise put the device into a low power device state and do NOT emit
+ *	system wakeup events.  Save in main memory the device settings to be
+ *	used by @restore() during the subsequent resume from hibernation.
+ *
+ * @thaw: Hibernation-specific, executed after creating a hibernation image.
+ *	Undo the changes made by the preceding @freeze(), so the device can be
+ *	operated in the same was as immediately before the call to @freeze().
+ *
+ * @poweroff: Hibernation-specific, executed after saving a hibernation image.
+ *	Quiesce the device, put it into a low power state appropriate for the
+ *	upcoming system state (such as PCI_D3hot), and enable wakeup events as
+ *	appropriate.
+ *
+ * @quiesce: Hibernation-specific, executed after loading a hibernation image
+ *	and before restoring the contents of main memory from it.  Quiesce
+ *	operations so that the contents of main memory can be restored from the
+ *	image in a consistent way, but do NOT otherwise put the device into a
+ *	low power state and do NOT emit system wakeup events.  Do NOT save any
+ *	device settings in main memory.
+ *
+ * @restore: Hibernation-specific, executed after restoring the contents of main
+ *	memory from a hibernation image.  Driver starts working again,
+ *	responding to hardware events and software requests.  Do NOT make ANY
+ *	assumptions about the hardware state right prior to @restore().  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @restore().
+ *
+ * @recover: Hibernation-specific, executed after a failing creation of a
+ *	hibernation image OR after a failing attempt to restore the contents of
+ *	main memory from such an image.  Undo the changes made by the preceding
+ *	@freeze() or @quiesce, so the device can be operated in the same was as
+ *	immediately before the failing transition.
+ */
+
+struct pm_ops {
+#ifdef CONFIG_PM_SLEEP
+	int (*prepare)(struct device *dev);
+	void (*complete)(struct device *dev);
+#endif
+#ifdef CONFIG_SUSPEND
+	int (*suspend)(struct device *dev);
+	int (*resume)(struct device *dev);
+#endif
+#ifdef CONFIG_HIBERNATION
+	int (*freeze)(struct device *dev);
+	int (*thaw)(struct device *dev);
+	int (*poweroff)(struct device *dev);
+	int (*quiesce)(struct device *dev);
+	int (*restore)(struct device *dev);
+	int (*recover)(struct device *dev);
+#endif
+};
+
+/**
+ * PM_EVENT_ messages
+ *
+ * The following PM_EVENT_ messages are defined for the internal use of the PM
+ * core, in order to provide a mechanism allowing the high level suspend and
+ * hibernation code to convey the necessary information to the device PM core
+ * code:
+ *
+ * ON		No transition.
+ *
+ * FREEZE 	System is going to hibernate, call ->prepare() and ->freeze()
+ *		for all devices.
+ *
+ * SUSPEND	System is going to suspend, call ->prepare() and ->suspend()
+ *		for all devices.
+ *
+ * HIBERNATE	Hibernation image has been saved, call ->prepare() and
+ *		->poweroff() for all devices.
+ *
+ * QUIESCE	Contents of main memory are going to be restored from a (loaded)
+ *		hibernation image, call ->prepare() and ->quiesce() for all
+ *		devices.
+ *
+ * RESUME	System is resuming, call ->resume() and ->complete() for all
+ *		devices.
+ *
+ * THAW		Hibernation image has been created, call ->thaw() and
+ *		->complete() for all devices.
+ *
+ * RESTORE	Contents of main memory have been restored from a hibernation
+ *		image, call ->restore() and ->complete() for all devices.
+ *
+ * RECOVER	Creation of a hibernation image or restoration of the main
+ *		memory contents from a hibernation image has failed, call
+ *		->recover() and ->complete() for all devices.
+ */
+
+typedef struct pm_message {
+	int event;
+} pm_message_t;
+
+#define PM_EVENT_ON		0x0000
+#define PM_EVENT_FREEZE 	0x0001
+#define PM_EVENT_SUSPEND	0x0002
+#define PM_EVENT_HIBERNATE	0x0004
+#define PM_EVENT_QUIESCE	0x0008
+#define PM_EVENT_RESUME		0x0010
+#define PM_EVENT_THAW		0x0020
+#define PM_EVENT_RESTORE	0x0040
+#define PM_EVENT_RECOVER	0x0080
+
+#define PM_EVENT_SLEEP	(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
+
+#define PMSG_FREEZE	((struct pm_message){ .event = PM_EVENT_FREEZE, })
+#define PMSG_QUIESCE	((struct pm_message){ .event = PM_EVENT_QUIESCE, })
+#define PMSG_SUSPEND	((struct pm_message){ .event = PM_EVENT_SUSPEND, })
+#define PMSG_HIBERNATE	((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
+#define PMSG_RESUME	((struct pm_message){ .event = PM_EVENT_RESUME, })
+#define PMSG_THAW	((struct pm_message){ .event = PM_EVENT_THAW, })
+#define PMSG_RESTORE	((struct pm_message){ .event = PM_EVENT_RESTORE, })
+#define PMSG_RECOVER	((struct pm_message){ .event = PM_EVENT_RECOVER, })
+#define PMSG_ON		((struct pm_message){ .event = PM_EVENT_ON, })
+
+/*
+ * The PM_EVENT_ messages are also used by drivers implementing the legacy
+ * suspend framework, based on the ->suspend() and ->resume() callbacks common
+ * for suspend and hibernation transitions, according to the rules below.
+ */
+
+/* Necessary, because several drivers use PM_EVENT_PRETHAW */
+#define PM_EVENT_PRETHAW PM_EVENT_QUIESCE
+
+/*
  * One transition is triggered by resume(), after a suspend() call; the
  * message is implicit:
  *
@@ -166,20 +327,6 @@ typedef struct pm_message {
  * or from system low-power states such as standby or suspend-to-RAM.
  */
 
-#define PM_EVENT_ON 0
-#define PM_EVENT_FREEZE 1
-#define PM_EVENT_SUSPEND 2
-#define PM_EVENT_HIBERNATE 4
-#define PM_EVENT_PRETHAW 8
-
-#define PM_EVENT_SLEEP	(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
-
-#define PMSG_FREEZE	((struct pm_message){ .event = PM_EVENT_FREEZE, })
-#define PMSG_PRETHAW	((struct pm_message){ .event = PM_EVENT_PRETHAW, })
-#define PMSG_SUSPEND	((struct pm_message){ .event = PM_EVENT_SUSPEND, })
-#define PMSG_HIBERNATE	((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
-#define PMSG_ON		((struct pm_message){ .event = PM_EVENT_ON, })
-
 struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned		can_wakeup:1;
@@ -190,11 +337,11 @@ struct dev_pm_info {
 #endif
 };
 
-extern int device_power_down(pm_message_t state);
-extern void device_power_up(void);
-extern void device_resume(void);
-
 #ifdef CONFIG_PM_SLEEP
+extern void device_power_up(pm_message_t state);
+extern void device_resume(pm_message_t state);
+
+extern int device_power_down(pm_message_t state);
 extern int device_suspend(pm_message_t state);
 extern int device_prepare_suspend(pm_message_t state);
 
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -124,26 +124,129 @@ void device_pm_schedule_removal(struct d
 }
 EXPORT_SYMBOL_GPL(device_pm_schedule_removal);
 
+/**
+ *	pm_op - execute the PM operation appropiate for given PM event
+ *	@dev:	Device.
+ *	@ops:	PM operations to choose from.
+ *	@state:	PM event message.
+ */
+static int pm_op(struct device *dev, struct pm_ops *ops, pm_message_t state)
+{
+	int error = 0;
+
+	switch (state.event) {
+#ifdef CONFIG_SUSPEND
+	case PM_EVENT_SUSPEND:
+		if (ops->suspend) {
+			error = ops->suspend(dev);
+			suspend_report_result(ops->suspend, error);
+		}
+		break;
+	case PM_EVENT_RESUME:
+		if (ops->resume) {
+			error = ops->resume(dev);
+			suspend_report_result(ops->resume, error);
+		}
+		break;
+#endif /* CONFIG_SUSPEND */
+#ifdef CONFIG_HIBERNATION
+	case PM_EVENT_FREEZE:
+		if (ops->freeze) {
+			error = ops->freeze(dev);
+			suspend_report_result(ops->freeze, error);
+		}
+		break;
+	case PM_EVENT_QUIESCE:
+		if (ops->quiesce) {
+			error = ops->quiesce(dev);
+			suspend_report_result(ops->quiesce, error);
+		}
+		break;
+	case PM_EVENT_HIBERNATE:
+		if (ops->poweroff) {
+			error = ops->poweroff(dev);
+			suspend_report_result(ops->poweroff, error);
+		}
+		break;
+	case PM_EVENT_THAW:
+		if (ops->thaw) {
+			error = ops->thaw(dev);
+			suspend_report_result(ops->thaw, error);
+		}
+		break;
+	case PM_EVENT_RESTORE:
+		if (ops->restore) {
+			error = ops->restore(dev);
+			suspend_report_result(ops->restore, error);
+		}
+		break;
+	case PM_EVENT_RECOVER:
+		if (ops->recover) {
+			error = ops->recover(dev);
+			suspend_report_result(ops->recover, error);
+		}
+		break;
+#endif /* CONFIG_HIBERNATION */
+	default:
+		error = -EINVAL;
+	}
+	return error;
+}
+
+static char *pm_verb(int event)
+{
+	switch (event) {
+	case PM_EVENT_SUSPEND:
+		return "suspend";
+	case PM_EVENT_RESUME:
+		return "resume";
+	case PM_EVENT_FREEZE:
+		return "freeze";
+	case PM_EVENT_QUIESCE:
+		return "quiesce";
+	case PM_EVENT_HIBERNATE:
+		return "hibernate";
+	case PM_EVENT_THAW:
+		return "thaw";
+	case PM_EVENT_RESTORE:
+		return "restore";
+	default:
+		return "(unknown PM event)";
+	}
+}
+
+static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info)
+{
+	dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event),
+		((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ?
+		", may wakeup" : "");
+}
+
 /*------------------------- Resume routines -------------------------*/
 
 /**
- *	resume_device_early - Power on one device (early resume).
+ *	resume_device_noirq - Power on one device (early resume).
  *	@dev:	Device.
+ *	@state: Operation to carry out.
  *
  *	Must be called with interrupts disabled.
  */
-static int resume_device_early(struct device *dev)
+static int resume_device_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
-	if (dev->bus && dev->bus->resume_early) {
-		dev_dbg(dev, "EARLY resume\n");
-		error = dev->bus->resume_early(dev);
-	}
+	if (!dev->bus)
+		goto End;
 
+	pm_dev_dbg(dev, state, "EARLY ");
+	if (dev->bus->pm_noirq)
+		error = pm_op(dev, dev->bus->pm_noirq, state);
+	else if (dev->bus->resume_early)
+		error = dev->bus->resume_early(dev);
+ End:
 	TRACE_RESUME(error);
 	return error;
 }
@@ -158,7 +261,7 @@ static int resume_device_early(struct de
  *
  *	Must be called with interrupts disabled and only one CPU running.
  */
-static void dpm_power_up(void)
+static void dpm_power_up(pm_message_t state)
 {
 
 	while (!list_empty(&dpm_off_irq)) {
@@ -166,7 +269,7 @@ static void dpm_power_up(void)
 		struct device *dev = to_device(entry);
 
 		list_move_tail(entry, &dpm_off);
-		resume_device_early(dev);
+		resume_device_noirq(dev, state);
 	}
 }
 
@@ -178,19 +281,49 @@ static void dpm_power_up(void)
  *
  *	Must be called with interrupts disabled.
  */
-void device_power_up(void)
+void device_power_up(pm_message_t state)
 {
 	sysdev_resume();
-	dpm_power_up();
+	dpm_power_up(state);
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
 /**
+ *	complete_device - Complete a PM transition for given device
+ *	@dev:	Device.
+ *	@state:	Power transition we are completing.
+ */
+static void complete_device(struct device *dev, pm_message_t state)
+{
+	down(&dev->sem);
+
+	if (dev->bus) {
+		pm_dev_dbg(dev, state, "completing ");
+		if (dev->bus->pm && dev->bus->pm->complete)
+			dev->bus->pm->complete(dev);
+	}
+
+	if (dev->type) {
+		pm_dev_dbg(dev, state, "completing type ");
+		if (dev->type->pm && dev->type->pm->complete)
+			dev->type->pm->complete(dev);
+	}
+
+	if (dev->class) {
+		pm_dev_dbg(dev, state, "completing class ");
+		if (dev->class->pm && dev->class->pm->complete)
+			dev->class->pm->complete(dev);
+	}
+
+	up(&dev->sem);
+}
+
+/**
  *	resume_device - Restore state for one device.
  *	@dev:	Device.
- *
+ *	@state:	Operation to carry out.
  */
-static int resume_device(struct device *dev)
+static int resume_device(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
@@ -199,21 +332,34 @@ static int resume_device(struct device *
 
 	down(&dev->sem);
 
-	if (dev->bus && dev->bus->resume) {
-		dev_dbg(dev,"resuming\n");
-		error = dev->bus->resume(dev);
+	if (dev->bus) {
+		pm_dev_dbg(dev, state, "");
+		if (dev->bus->pm)
+			error = pm_op(dev, dev->bus->pm, state);
+		else if (dev->bus->resume)
+			error = dev->bus->resume(dev);
 	}
+	if (error)
+		goto End;
 
-	if (!error && dev->type && dev->type->resume) {
-		dev_dbg(dev,"resuming\n");
-		error = dev->type->resume(dev);
+	if (dev->type) {
+		pm_dev_dbg(dev, state, "type ");
+		if (dev->type->pm)
+			error = pm_op(dev, dev->type->pm, state);
+		else if (dev->type->resume)
+			error = dev->type->resume(dev);
 	}
+	if (error)
+		goto End;
 
-	if (!error && dev->class && dev->class->resume) {
-		dev_dbg(dev,"class resume\n");
-		error = dev->class->resume(dev);
+	if (dev->class) {
+		pm_dev_dbg(dev, state, "class ");
+		if (dev->class->pm)
+			error = pm_op(dev, dev->class->pm, state);
+		else if (dev->class->resume)
+			error = dev->class->resume(dev);
 	}
-
+ End:
 	up(&dev->sem);
 
 	TRACE_RESUME(error);
@@ -228,9 +374,9 @@ static int resume_device(struct device *
  *	went through the early resume.
  *
  *	Take devices from the dpm_off_list, resume them,
- *	and put them on the dpm_locked list.
+ *	and put them on the dpm_active list.
  */
-static void dpm_resume(void)
+static void dpm_resume(pm_message_t state)
 {
 	mutex_lock(&dpm_list_mtx);
 	all_sleeping = false;
@@ -241,7 +387,8 @@ static void dpm_resume(void)
 		list_move_tail(entry, &dpm_active);
 		dev->power.sleeping = false;
 		mutex_unlock(&dpm_list_mtx);
-		resume_device(dev);
+		resume_device(dev, state);
+		complete_device(dev, state);
 		mutex_lock(&dpm_list_mtx);
 	}
 	mutex_unlock(&dpm_list_mtx);
@@ -269,14 +416,15 @@ static void unregister_dropped_devices(v
 
 /**
  *	device_resume - Restore state of each device in system.
+ *	@state: Operation to carry out.
  *
  *	Resume all the devices, unlock them all, and allow new
  *	devices to be registered once again.
  */
-void device_resume(void)
+void device_resume(pm_message_t state)
 {
 	might_sleep();
-	dpm_resume();
+	dpm_resume(state);
 	unregister_dropped_devices();
 }
 EXPORT_SYMBOL_GPL(device_resume);
@@ -284,37 +432,43 @@ EXPORT_SYMBOL_GPL(device_resume);
 
 /*------------------------- Suspend routines -------------------------*/
 
-static inline char *suspend_verb(u32 event)
+/**
+ *	resume_event - return a PM message representing the resume event
+ *	               corresponding to given sleep state.
+ *	@sleep_state - PM message representing a sleep state
+ */
+static pm_message_t resume_event(pm_message_t sleep_state)
 {
-	switch (event) {
-	case PM_EVENT_SUSPEND:	return "suspend";
-	case PM_EVENT_FREEZE:	return "freeze";
-	case PM_EVENT_PRETHAW:	return "prethaw";
-	default:		return "(unknown suspend event)";
+	switch (sleep_state.event) {
+	case PM_EVENT_SUSPEND:
+		return PMSG_RESUME;
+	case PM_EVENT_FREEZE:
+	case PM_EVENT_QUIESCE:
+		return PMSG_RECOVER;
+	case PM_EVENT_HIBERNATE:
+		return PMSG_RESTORE;
 	}
-}
-
-static void
-suspend_device_dbg(struct device *dev, pm_message_t state, char *info)
-{
-	dev_dbg(dev, "%s%s%s\n", info, suspend_verb(state.event),
-		((state.event == PM_EVENT_SUSPEND) && device_may_wakeup(dev)) ?
-		", may wakeup" : "");
+	return PMSG_ON;
 }
 
 /**
- *	suspend_device_late - Shut down one device (late suspend).
+ *	suspend_device_noirq - Shut down one device (late suspend).
  *	@dev:	Device.
  *	@state:	Power state device is entering.
  *
  *	This is called with interrupts off and only a single CPU running.
  */
-static int suspend_device_late(struct device *dev, pm_message_t state)
+static int suspend_device_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
-	if (dev->bus && dev->bus->suspend_late) {
-		suspend_device_dbg(dev, state, "LATE ");
+	if (!dev->bus)
+		return 0;
+
+	pm_dev_dbg(dev, state, "LATE ");
+	if (dev->bus->pm_noirq) {
+		error = pm_op(dev, dev->bus->pm_noirq, state);
+	} else if (dev->bus->suspend_late) {
 		error = dev->bus->suspend_late(dev, state);
 		suspend_report_result(dev->bus->suspend_late, error);
 	}
@@ -339,7 +493,7 @@ int device_power_down(pm_message_t state
 		struct list_head *entry = dpm_off.prev;
 		struct device *dev = to_device(entry);
 
-		error = suspend_device_late(dev, state);
+		error = suspend_device_noirq(dev, state);
 		if (error) {
 			printk(KERN_ERR "Could not power down device %s: "
 					"error %d\n",
@@ -353,45 +507,100 @@ int device_power_down(pm_message_t state
 	if (!error)
 		error = sysdev_suspend(state);
 	if (error)
-		dpm_power_up();
+		dpm_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
 
 /**
- *	suspend_device - Save state of one device.
+ *	prepare_device - Execute the ->prepare() callback(s) for given device.
  *	@dev:	Device.
- *	@state:	Power state device is entering.
+ *	@state: Operation we are preparing for.
  */
-static int suspend_device(struct device *dev, pm_message_t state)
+static int prepare_device(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	down(&dev->sem);
 
-	if (dev->power.power_state.event) {
-		dev_dbg(dev, "PM: suspend %d-->%d\n",
-			dev->power.power_state.event, state.event);
+	if (dev->bus) {
+		pm_dev_dbg(dev, state, "preparing ");
+		if (dev->bus->pm && dev->bus->pm->prepare) {
+			error = dev->bus->pm->prepare(dev);
+			suspend_report_result(dev->bus->pm->prepare, error);
+		}
 	}
+	if (error)
+		goto End;
 
-	if (dev->class && dev->class->suspend) {
-		suspend_device_dbg(dev, state, "class ");
-		error = dev->class->suspend(dev, state);
-		suspend_report_result(dev->class->suspend, error);
+	if (dev->type) {
+		pm_dev_dbg(dev, state, "preparing type ");
+		if (dev->type->pm && dev->type->pm->prepare) {
+			error = dev->type->pm->prepare(dev);
+			suspend_report_result(dev->type->pm->prepare, error);
+		}
 	}
+	if (error)
+		goto End;
 
-	if (!error && dev->type && dev->type->suspend) {
-		suspend_device_dbg(dev, state, "type ");
-		error = dev->type->suspend(dev, state);
-		suspend_report_result(dev->type->suspend, error);
+	if (dev->class) {
+		pm_dev_dbg(dev, state, "preparing class ");
+		if (dev->class->pm && dev->class->pm->prepare) {
+			error = dev->class->pm->prepare(dev);
+			suspend_report_result(dev->class->pm->prepare, error);
+		}
 	}
+ End:
+	up(&dev->sem);
+
+	return error;
+}
 
-	if (!error && dev->bus && dev->bus->suspend) {
-		suspend_device_dbg(dev, state, "");
-		error = dev->bus->suspend(dev, state);
-		suspend_report_result(dev->bus->suspend, error);
+/**
+ *	suspend_device - Save state of one device.
+ *	@dev:	Device.
+ *	@state:	Power state device is entering.
+ */
+static int suspend_device(struct device *dev, pm_message_t state)
+{
+	int error = 0;
+
+	down(&dev->sem);
+
+	if (dev->class) {
+		pm_dev_dbg(dev, state, "class ");
+		if (dev->class->pm) {
+			error = pm_op(dev, dev->class->pm, state);
+		} else if (dev->class->suspend) {
+			error = dev->class->suspend(dev, state);
+			suspend_report_result(dev->class->suspend, error);
+		}
 	}
+	if (error)
+		goto End;
+
+	if (dev->type) {
+		pm_dev_dbg(dev, state, "type ");
+		if (dev->type->pm) {
+			error = pm_op(dev, dev->type->pm, state);
+		} else if (dev->type->suspend) {
+			error = dev->type->suspend(dev, state);
+			suspend_report_result(dev->type->suspend, error);
+		}
+	}
+	if (error)
+		goto End;
 
+	if (dev->bus) {
+		pm_dev_dbg(dev, state, "");
+		if (dev->bus->pm) {
+			error = pm_op(dev, dev->bus->pm, state);
+		} else if (dev->bus->suspend) {
+			error = dev->bus->suspend(dev, state);
+			suspend_report_result(dev->bus->suspend, error);
+		}
+	}
+ End:
 	up(&dev->sem);
 
 	return error;
@@ -401,13 +610,8 @@ static int suspend_device(struct device 
  *	dpm_suspend - Suspend every device.
  *	@state:	Power state to put each device in.
  *
- *	Walk the dpm_locked list.  Suspend each device and move it
+ *	Walk the dpm_active list.  Suspend each device and move it
  *	to the dpm_off list.
- *
- *	(For historical reasons, if it returns -EAGAIN, that used to mean
- *	that the device would be called again with interrupts disabled.
- *	These days, we use the "suspend_late()" callback for that, so we
- *	print a warning and consider it an error).
  */
 static int dpm_suspend(pm_message_t state)
 {
@@ -419,21 +623,29 @@ static int dpm_suspend(pm_message_t stat
 		struct device *dev = to_device(entry);
 
 		WARN_ON(dev->parent && dev->parent->power.sleeping);
+		mutex_unlock(&dpm_list_mtx);
 
+		error = prepare_device(dev, state);
+
+		mutex_lock(&dpm_list_mtx);
+		if (error) {
+			if (error == -EAGAIN)
+				continue;
+			else
+				break;
+		}
 		dev->power.sleeping = true;
 		mutex_unlock(&dpm_list_mtx);
+
 		error = suspend_device(dev, state);
+
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
-			printk(KERN_ERR "Could not suspend device %s: "
-					"error %d%s\n",
-					kobject_name(&dev->kobj),
-					error,
-					(error == -EAGAIN ?
-					" (please convert to suspend_late)" :
-					""));
 			dev->power.sleeping = false;
-			break;
+			mutex_unlock(&dpm_list_mtx);
+
+			complete_device(dev, resume_event(state));
+			return error;
 		}
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &dpm_off);
@@ -449,8 +661,7 @@ static int dpm_suspend(pm_message_t stat
  *	device_suspend - Save state and stop all devices in system.
  *	@state: new power management state
  *
- *	Prevent new devices from being registered, then lock all devices
- *	and suspend them.
+ *	Prepare and suspend all devices.
  */
 int device_suspend(pm_message_t state)
 {
@@ -459,7 +670,7 @@ int device_suspend(pm_message_t state)
 	might_sleep();
 	error = dpm_suspend(state);
 	if (error)
-		device_resume();
+		device_resume(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_suspend);
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -69,6 +69,9 @@ struct bus_type {
 	int (*resume_early)(struct device *dev);
 	int (*resume)(struct device *dev);
 
+	struct pm_ops *pm;
+	struct pm_ops *pm_noirq;
+
 	struct bus_type_private *p;
 };
 
@@ -202,6 +205,8 @@ struct class {
 
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
+
+	struct pm_ops *pm;
 };
 
 extern int __must_check class_register(struct class *class);
@@ -345,8 +350,11 @@ struct device_type {
 	struct attribute_group **groups;
 	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
 	void (*release)(struct device *dev);
+
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
+
+	struct pm_ops *pm;
 };
 
 /* interface for exporting device attributes */
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -224,7 +224,7 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
-	device_power_up();
+	device_power_up(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
  Enable_irqs:
 	local_irq_enable();
 	return error;
@@ -280,7 +280,7 @@ int hibernation_snapshot(int platform_mo
  Finish:
 	platform_finish(platform_mode);
  Resume_devices:
-	device_resume();
+	device_resume(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
  Resume_console:
 	resume_console();
  Close:
@@ -301,7 +301,7 @@ static int resume_target_kernel(void)
 	int error;
 
 	local_irq_disable();
-	error = device_power_down(PMSG_PRETHAW);
+	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
@@ -329,7 +329,7 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
-	device_power_up();
+	device_power_up(PMSG_THAW);
  Enable_irqs:
 	local_irq_enable();
 	return error;
@@ -350,7 +350,7 @@ int hibernation_restore(int platform_mod
 
 	pm_prepare_console();
 	suspend_console();
-	error = device_suspend(PMSG_PRETHAW);
+	error = device_suspend(PMSG_QUIESCE);
 	if (error)
 		goto Finish;
 
@@ -362,7 +362,7 @@ int hibernation_restore(int platform_mod
 		enable_nonboot_cpus();
 	}
 	platform_restore_cleanup(platform_mode);
-	device_resume();
+	device_resume(PMSG_RECOVER);
  Finish:
 	resume_console();
 	pm_restore_console();
@@ -419,7 +419,7 @@ int hibernation_platform_enter(void)
  Finish:
 	hibernation_ops->finish();
  Resume_devices:
-	device_resume();
+	device_resume(PMSG_RESTORE);
  Resume_console:
 	resume_console();
  Close:
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -239,7 +239,7 @@ static int suspend_enter(suspend_state_t
 	if (!suspend_test(TEST_CORE))
 		error = suspend_ops->enter(state);
 
-	device_power_up();
+	device_power_up(PMSG_RESUME);
  Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
@@ -291,7 +291,7 @@ int suspend_devices_and_enter(suspend_st
 	if (suspend_ops->finish)
 		suspend_ops->finish();
  Resume_devices:
-	device_resume();
+	device_resume(PMSG_RESUME);
  Resume_console:
 	resume_console();
  Close:
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1221,9 +1221,9 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
-	device_power_up();
+	device_power_up(PMSG_RESUME);
 	local_irq_enable();
-	device_resume();
+	device_resume(PMSG_RESUME);
 	pm_send_all(PM_RESUME, (void *)0);
 	queue_event(APM_NORMAL_RESUME, NULL);
  out:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
  2008-03-09  1:20 [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated Rafael J. Wysocki
@ 2008-03-09  4:13 ` Alan Stern
  2008-03-09 13:28   ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2008-03-09  4:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, ACPI Devel Maling List, Alexey Starikovskiy, Len Brown,
	Pavel Machek, David Brownell

On Sun, 9 Mar 2008, Rafael J. Wysocki wrote:

> Hi,
> 
> Below is a new version of the patch that introduces the separation of suspend
> and hibernation callbacks on the highest level.
> 
> Relative to the previous version two additional callbacks have been added,
> ->prepare() and ->complete(), allowing to prepare the device for a "suspend"
> transition and to finalize the "resume" respectively.
> 
> Also, all callbacks and PM_EVENT_ messages are now described in the
> comments in include/linux/pm.h (the documentation in Documentation/power
> will be updated separately).
> 
> As previously, it is assumed that the existing (legacy) ->suspend() and
> ->resume() callbacks will be used for bus types, device classes and device
> types that don't implement the new callbacks.  As a result, the patch doesn't
> introduce any visible functional changes.
> 
> The patch has been compilation tested on x86-64.
> 
> Comments welcome.

On the whole this looks pretty good.  A few comments are below.

Alan Stern

> @@ -122,6 +120,169 @@ typedef struct pm_message {
>   * to the rest of the driver stack (such as a driver that's ON gating off
>   * clocks which are not in active use).
>   *
> + * The externally visible transitions are handled with the help of the following
> + * callbacks included in this structure:
> + *
> + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> + *	its hardware state.  Prevent new children of the device from being
> + *	registered after @prepare() returns.  Also, if a concurrent child
> + *	registration already in progress is detected by @prepare(), it should
> + *	wait for the registration to complete, block the subsequent
> + *	registrations of children and return -EAGAIN, so that the PM core can
> + *	execute it once again (after suspending the newly registered child) to
> + *	recover from the race condition.  This method is executed for all kinds
> + *	of suspend transitions and is immediately followed by one of the suspend
> + *	callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().

Didn't we decide it would be better to call @prepare in a separate
pass?  In any case, drivers aren't in a position to detect concurrent
registrations.  The PM core shouldn't require them to return -EAGAIN; 
it can detect these registrations by itself.

It is also worth mentioning that subsystems should treat calls to probe 
methods the same as child registrations: wait for concurrent ones to 
complete and block subsequent ones.  This will of course be handled 
automatically for probes initiating in the driver core.

Finally, there needs to be a new "prepared" field added to dev_pm_info,
right next to the "sleeping" field.  Mention that it will be set prior
to calling @prepare and cleared prior to calling @complete.  Even
though it is read-only to drivers, they will find it useful.

> + * @recover: Hibernation-specific, executed after a failing creation of a
> + *	hibernation image OR after a failing attempt to restore the contents of
> + *	main memory from such an image.  Undo the changes made by the preceding
> + *	@freeze() or @quiesce, so the device can be operated in the same was as

s/was/way/

> + *	immediately before the failing transition.
> + */

Don't you mean the preceding @poweroff or @quiesce?  When recovering 
after @freeze(), you can use @thaw instead of @recover.

>  struct dev_pm_info {
>  	pm_message_t		power_state;
>  	unsigned		can_wakeup:1;

Add the new "prepared" field.

> -static int resume_device_early(struct device *dev)
> +static int resume_device_noirq(struct device *dev, pm_message_t state)
>  {
>  	int error = 0;
>  
>  	TRACE_DEVICE(dev);
>  	TRACE_RESUME(0);
>  
> -	if (dev->bus && dev->bus->resume_early) {
> -		dev_dbg(dev, "EARLY resume\n");
> -		error = dev->bus->resume_early(dev);
> -	}
> +	if (!dev->bus)
> +		goto End;
>  
> +	pm_dev_dbg(dev, state, "EARLY ");
> +	if (dev->bus->pm_noirq)
> +		error = pm_op(dev, dev->bus->pm_noirq, state);
> +	else if (dev->bus->resume_early)
> +		error = dev->bus->resume_early(dev);

Here and below, you have the pm_dev_dbg() call before the test for
whether the method exists.  This will generate a lot of useless
debugging output.  Although it makes the code a bit larger, IMO it
would be to test for the existence of either the new or the legacy
method before calling pm_dev_dbg(), like you do in the pm_op() routine.

> -static void dpm_resume(void)
> +static void dpm_resume(pm_message_t state)
>  {
>  	mutex_lock(&dpm_list_mtx);
>  	all_sleeping = false;
> @@ -241,7 +387,8 @@ static void dpm_resume(void)
>  		list_move_tail(entry, &dpm_active);
>  		dev->power.sleeping = false;
>  		mutex_unlock(&dpm_list_mtx);
> -		resume_device(dev);
> +		resume_device(dev, state);

Clear dev->power.prepared.

> +		complete_device(dev, state);
>  		mutex_lock(&dpm_list_mtx);
>  	}
>  	mutex_unlock(&dpm_list_mtx);

> @@ -419,21 +623,29 @@ static int dpm_suspend(pm_message_t stat
>  		struct device *dev = to_device(entry);
>  
>  		WARN_ON(dev->parent && dev->parent->power.sleeping);
> +		mutex_unlock(&dpm_list_mtx);
>  

If dev->power.prepared is already set, skip calling prepare_device().
Otherwise set dev->power.prepared and proceed.

> +		error = prepare_device(dev, state);
> +
> +		mutex_lock(&dpm_list_mtx);
> +		if (error) {

Clear dev->power.prepared.

> +			if (error == -EAGAIN)
> +				continue;
> +			else
> +				break;
> +		}

At this point, if dev isn't still the last entry on dpm_active then
start the next loop iteration.  As a corollary, in case a suspend
aborts partway through, after the resume code has put all the devices
back onto dpm_active you will still have to run through the devices
that were already on dpm_active, calling @complete if
dev->power.prepared is set.

>  		dev->power.sleeping = true;
>  		mutex_unlock(&dpm_list_mtx);
> +
>  		error = suspend_device(dev, state);
> +
>  		mutex_lock(&dpm_list_mtx);
>  		if (error) {


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
  2008-03-09  4:13 ` Alan Stern
@ 2008-03-09 13:28   ` Rafael J. Wysocki
  2008-03-09 19:56     ` Alan Stern
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2008-03-09 13:28 UTC (permalink / raw)
  To: Alan Stern
  Cc: pm list, ACPI Devel Maling List, Alexey Starikovskiy, Len Brown,
	Pavel Machek, David Brownell

On Sunday, 9 of March 2008, Alan Stern wrote:
> On Sun, 9 Mar 2008, Rafael J. Wysocki wrote:
> 
> > Below is a new version of the patch that introduces the separation of suspend
> > and hibernation callbacks on the highest level.
> > 
> > Relative to the previous version two additional callbacks have been added,
> > ->prepare() and ->complete(), allowing to prepare the device for a "suspend"
> > transition and to finalize the "resume" respectively.
> > 
> > Also, all callbacks and PM_EVENT_ messages are now described in the
> > comments in include/linux/pm.h (the documentation in Documentation/power
> > will be updated separately).
> > 
> > As previously, it is assumed that the existing (legacy) ->suspend() and
> > ->resume() callbacks will be used for bus types, device classes and device
> > types that don't implement the new callbacks.  As a result, the patch doesn't
> > introduce any visible functional changes.
> > 
> > The patch has been compilation tested on x86-64.
> > 
> > Comments welcome.
> 
> On the whole this looks pretty good.  A few comments are below.
> 
> > @@ -122,6 +120,169 @@ typedef struct pm_message {
> >   * to the rest of the driver stack (such as a driver that's ON gating off
> >   * clocks which are not in active use).
> >   *
> > + * The externally visible transitions are handled with the help of the following
> > + * callbacks included in this structure:
> > + *
> > + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> > + *	its hardware state.  Prevent new children of the device from being
> > + *	registered after @prepare() returns.  Also, if a concurrent child
> > + *	registration already in progress is detected by @prepare(), it should
> > + *	wait for the registration to complete, block the subsequent
> > + *	registrations of children and return -EAGAIN, so that the PM core can
> > + *	execute it once again (after suspending the newly registered child) to
> > + *	recover from the race condition.  This method is executed for all kinds
> > + *	of suspend transitions and is immediately followed by one of the suspend
> > + *	callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
> 
> Didn't we decide it would be better to call @prepare in a separate
> pass?

Afterwards I realized that it would be racy.  Namely, once a new device is
registered, it may require ->prepare() to be called before ->suspend(), so if
that happens after we've called ->prepare() for all devices, we'll be in
trouble.

> In any case, drivers aren't in a position to detect concurrent 
> registrations.  The PM core shouldn't require them to return -EAGAIN; 
> it can detect these registrations by itself.

In the appended new version of the patch, the core does it.  However, in theory
there may be situations in which the core can't detect a race condition and
recover from it, so I think it's reasonable to leave the possibility to signal
such a condition by the driver.

> It is also worth mentioning that subsystems should treat calls to probe 
> methods the same as child registrations: wait for concurrent ones to 
> complete and block subsequent ones.  This will of course be handled 
> automatically for probes initiating in the driver core.
> 
> Finally, there needs to be a new "prepared" field added to dev_pm_info,
> right next to the "sleeping" field.  Mention that it will be set prior
> to calling @prepare and cleared prior to calling @complete.  Even
> though it is read-only to drivers, they will find it useful.

Yes, I noticed that after sending the patch.  I handled it in a slightly
different way, by converting 'sleeping' into 'status' that may assum one of
the three values: DPM_ON, DPM_SUSPENDING, DPM_OFF.
 
> > + * @recover: Hibernation-specific, executed after a failing creation of a
> > + *	hibernation image OR after a failing attempt to restore the contents of
> > + *	main memory from such an image.  Undo the changes made by the preceding
> > + *	@freeze() or @quiesce, so the device can be operated in the same was as
> 
> s/was/way/
> 
> > + *	immediately before the failing transition.
> > + */
> 
> Don't you mean the preceding @poweroff or @quiesce?  When recovering 
> after @freeze(), you can use @thaw instead of @recover.

No, ->restore() is called after a failing ->poweroff(), because ->poweroff()
may put the device into a low power state which ->recover() need not be able to
handle.

Also, I'd like to reserve ->thaw() for being called only after creating the
image and before saving it (in the future we may want some devices to stay
quiescent while the image is being saved).  At the same time, the operations
necessary to recover after ->freeze() and ->quiesce() should really be the same
(I don't see a reason why it may not be the case).

> >  struct dev_pm_info {
> >  	pm_message_t		power_state;
> >  	unsigned		can_wakeup:1;
> 
> Add the new "prepared" field.

Addressed by changing 'sleeping' to 'status'.
 
> > -static int resume_device_early(struct device *dev)
> > +static int resume_device_noirq(struct device *dev, pm_message_t state)
> >  {
> >  	int error = 0;
> >  
> >  	TRACE_DEVICE(dev);
> >  	TRACE_RESUME(0);
> >  
> > -	if (dev->bus && dev->bus->resume_early) {
> > -		dev_dbg(dev, "EARLY resume\n");
> > -		error = dev->bus->resume_early(dev);
> > -	}
> > +	if (!dev->bus)
> > +		goto End;
> >  
> > +	pm_dev_dbg(dev, state, "EARLY ");
> > +	if (dev->bus->pm_noirq)
> > +		error = pm_op(dev, dev->bus->pm_noirq, state);
> > +	else if (dev->bus->resume_early)
> > +		error = dev->bus->resume_early(dev);
> 
> Here and below, you have the pm_dev_dbg() call before the test for
> whether the method exists.  This will generate a lot of useless
> debugging output.  Although it makes the code a bit larger, IMO it
> would be to test for the existence of either the new or the legacy
> method before calling pm_dev_dbg(), like you do in the pm_op() routine.

Okay, I'll do that in the future, although it's not yet implemented in the
patch below.

The remaining comments should be addressed in the appended patch.

Thanks,
Rafael

---

 arch/x86/kernel/apm_32.c  |    4 
 drivers/base/power/main.c |  421 +++++++++++++++++++++++++++++++++++-----------
 include/linux/device.h    |    8 
 include/linux/pm.h        |  216 +++++++++++++++++++++--
 kernel/power/disk.c       |   14 -
 kernel/power/main.c       |    4 
 6 files changed, 536 insertions(+), 131 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -110,11 +110,9 @@ extern void (*pm_power_off_prepare)(void
 
 struct device;
 
-typedef struct pm_message {
-	int event;
-} pm_message_t;
-
-/*
+/**
+ * struct pm_ops - device PM callbacks
+ *
  * Several driver power state transitions are externally visible, affecting
  * the state of pending I/O queues and (for drivers that touch hardware)
  * interrupts, wakeups, DMA, and other hardware state.  There may also be
@@ -122,6 +120,175 @@ typedef struct pm_message {
  * to the rest of the driver stack (such as a driver that's ON gating off
  * clocks which are not in active use).
  *
+ * The externally visible transitions are handled with the help of the following
+ * callbacks included in this structure:
+ *
+ * @prepare: Prepare the device for the upcoming transition, but do NOT change
+ *	its hardware state.  Prevent new children of the device from being
+ *	registered and prevent new calls to the probe method from being made
+ *	after @prepare() returns.  Also, if a concurrent child registration
+ *	or a call to the probe method already in progress is detected by
+ *	@prepare(), it may return -EAGAIN, so that the PM core can execute it
+ *	once again (e.g. after suspending the newly registered child) to recover
+ *	from the race condition. This method is executed for all kinds of
+ *	suspend transitions and is immediately followed by one of the suspend
+ *	callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
+ *	[NOTE: The PM core is able to detect a registration of a new child of a
+ *	device concurrent with this device's @prepare() callback, in which case
+ *	@complete() will be called for the device and the core will try again to
+ *	suspend it later.  However, the core is NOT able to handle a call to
+ *	the probe method concurrent with @prepare().]
+ *
+ * @complete: Undo the changes made by @prepare().  This method is executed for
+ *	all kinds of resume transitions, immediately following one of the resume
+ *	callbacks: @resume(), @thaw(), @restore(), or @recover().  Also executed
+ *	if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
+ *	immediately following a successful @prepare() fails OR if @prepare()
+ *	returns -EAGAIN.
+ *
+ * @suspend: Executed before putting the system into a sleep state in which the
+ *	contents of main memory are preserved.  Quiesce the device, put it into
+ *	a low power state appropriate for the upcoming system state (such as
+ *	PCI_D3hot), and enable wakeup events as appropriate.
+ *
+ * @resume: Executed after waking the system up from a sleep state in which the
+ *	contents of main memory were preserved.  Put the device into the
+ *	appropriate state, according to the information saved in memory by the
+ *	preceding @suspend().  The driver starts working again, responding to
+ *	hardware events and software requests.  The hardware may have gone
+ *	through a power-off reset, or it may have maintained state from the
+ *	previous suspend() which the driver may rely on while resuming.  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @resume().
+ *
+ * @freeze: Hibernation-specific, executed before creating a hibernation image.
+ *	Quiesce operations so that a consistent image can be created, but do NOT
+ *	otherwise put the device into a low power device state and do NOT emit
+ *	system wakeup events.  Save in main memory the device settings to be
+ *	used by @restore() during the subsequent resume from hibernation.
+ *
+ * @thaw: Hibernation-specific, executed after creating a hibernation image.
+ *	Undo the changes made by the preceding @freeze(), so the device can be
+ *	operated in the same was as immediately before the call to @freeze().
+ *
+ * @poweroff: Hibernation-specific, executed after saving a hibernation image.
+ *	Quiesce the device, put it into a low power state appropriate for the
+ *	upcoming system state (such as PCI_D3hot), and enable wakeup events as
+ *	appropriate.
+ *
+ * @quiesce: Hibernation-specific, executed after loading a hibernation image
+ *	and before restoring the contents of main memory from it.  Quiesce
+ *	operations so that the contents of main memory can be restored from the
+ *	image in a consistent way, but do NOT otherwise put the device into a
+ *	low power state and do NOT emit system wakeup events.  Do NOT save any
+ *	device settings in main memory.
+ *
+ * @restore: Hibernation-specific, executed after restoring the contents of main
+ *	memory from a hibernation image.  Driver starts working again,
+ *	responding to hardware events and software requests.  Do NOT make ANY
+ *	assumptions about the hardware state right prior to @restore().  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @restore().
+ *
+ * @recover: Hibernation-specific, executed after a failing creation of a
+ *	hibernation image OR after a failing attempt to restore the contents of
+ *	main memory from such an image.  Undo the changes made by the preceding
+ *	@freeze() or @quiesce(), so the device can be operated in the same way
+ *	as immediately before the failing transition.
+ */
+
+struct pm_ops {
+#ifdef CONFIG_PM_SLEEP
+	int (*prepare)(struct device *dev);
+	void (*complete)(struct device *dev);
+#endif
+#ifdef CONFIG_SUSPEND
+	int (*suspend)(struct device *dev);
+	int (*resume)(struct device *dev);
+#endif
+#ifdef CONFIG_HIBERNATION
+	int (*freeze)(struct device *dev);
+	int (*thaw)(struct device *dev);
+	int (*poweroff)(struct device *dev);
+	int (*quiesce)(struct device *dev);
+	int (*restore)(struct device *dev);
+	int (*recover)(struct device *dev);
+#endif
+};
+
+/**
+ * PM_EVENT_ messages
+ *
+ * The following PM_EVENT_ messages are defined for the internal use of the PM
+ * core, in order to provide a mechanism allowing the high level suspend and
+ * hibernation code to convey the necessary information to the device PM core
+ * code:
+ *
+ * ON		No transition.
+ *
+ * FREEZE 	System is going to hibernate, call ->prepare() and ->freeze()
+ *		for all devices.
+ *
+ * SUSPEND	System is going to suspend, call ->prepare() and ->suspend()
+ *		for all devices.
+ *
+ * HIBERNATE	Hibernation image has been saved, call ->prepare() and
+ *		->poweroff() for all devices.
+ *
+ * QUIESCE	Contents of main memory are going to be restored from a (loaded)
+ *		hibernation image, call ->prepare() and ->quiesce() for all
+ *		devices.
+ *
+ * RESUME	System is resuming, call ->resume() and ->complete() for all
+ *		devices.
+ *
+ * THAW		Hibernation image has been created, call ->thaw() and
+ *		->complete() for all devices.
+ *
+ * RESTORE	Contents of main memory have been restored from a hibernation
+ *		image, call ->restore() and ->complete() for all devices.
+ *
+ * RECOVER	Creation of a hibernation image or restoration of the main
+ *		memory contents from a hibernation image has failed, call
+ *		->recover() and ->complete() for all devices.
+ */
+
+typedef struct pm_message {
+	int event;
+} pm_message_t;
+
+#define PM_EVENT_ON		0x0000
+#define PM_EVENT_FREEZE 	0x0001
+#define PM_EVENT_SUSPEND	0x0002
+#define PM_EVENT_HIBERNATE	0x0004
+#define PM_EVENT_QUIESCE	0x0008
+#define PM_EVENT_RESUME		0x0010
+#define PM_EVENT_THAW		0x0020
+#define PM_EVENT_RESTORE	0x0040
+#define PM_EVENT_RECOVER	0x0080
+
+#define PM_EVENT_SLEEP	(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
+
+#define PMSG_FREEZE	((struct pm_message){ .event = PM_EVENT_FREEZE, })
+#define PMSG_QUIESCE	((struct pm_message){ .event = PM_EVENT_QUIESCE, })
+#define PMSG_SUSPEND	((struct pm_message){ .event = PM_EVENT_SUSPEND, })
+#define PMSG_HIBERNATE	((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
+#define PMSG_RESUME	((struct pm_message){ .event = PM_EVENT_RESUME, })
+#define PMSG_THAW	((struct pm_message){ .event = PM_EVENT_THAW, })
+#define PMSG_RESTORE	((struct pm_message){ .event = PM_EVENT_RESTORE, })
+#define PMSG_RECOVER	((struct pm_message){ .event = PM_EVENT_RECOVER, })
+#define PMSG_ON		((struct pm_message){ .event = PM_EVENT_ON, })
+
+/*
+ * The PM_EVENT_ messages are also used by drivers implementing the legacy
+ * suspend framework, based on the ->suspend() and ->resume() callbacks common
+ * for suspend and hibernation transitions, according to the rules below.
+ */
+
+/* Necessary, because several drivers use PM_EVENT_PRETHAW */
+#define PM_EVENT_PRETHAW PM_EVENT_QUIESCE
+
+/*
  * One transition is triggered by resume(), after a suspend() call; the
  * message is implicit:
  *
@@ -166,35 +333,40 @@ typedef struct pm_message {
  * or from system low-power states such as standby or suspend-to-RAM.
  */
 
-#define PM_EVENT_ON 0
-#define PM_EVENT_FREEZE 1
-#define PM_EVENT_SUSPEND 2
-#define PM_EVENT_HIBERNATE 4
-#define PM_EVENT_PRETHAW 8
-
-#define PM_EVENT_SLEEP	(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
+/**
+ * Device power management states
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.
+ *
+ * DPM_ON		Device is regarded as operational.
+ *
+ * DPM_SUSPENDING	Device is currently being suspended.
+ *
+ * DPM_OFF		Device is regarded as suspended.
+ */
 
-#define PMSG_FREEZE	((struct pm_message){ .event = PM_EVENT_FREEZE, })
-#define PMSG_PRETHAW	((struct pm_message){ .event = PM_EVENT_PRETHAW, })
-#define PMSG_SUSPEND	((struct pm_message){ .event = PM_EVENT_SUSPEND, })
-#define PMSG_HIBERNATE	((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
-#define PMSG_ON		((struct pm_message){ .event = PM_EVENT_ON, })
+enum dpm_state {
+	DPM_ON,
+	DPM_SUSPENDING,
+	DPM_OFF,
+};
 
 struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned		can_wakeup:1;
-	bool			sleeping:1;	/* Owned by the PM core */
+	enum dpm_state		status:2;	/* Owned by the PM core */
 #ifdef	CONFIG_PM_SLEEP
 	unsigned		should_wakeup:1;
 	struct list_head	entry;
 #endif
 };
 
-extern int device_power_down(pm_message_t state);
-extern void device_power_up(void);
-extern void device_resume(void);
-
 #ifdef CONFIG_PM_SLEEP
+extern void device_power_up(pm_message_t state);
+extern void device_resume(pm_message_t state);
+
+extern int device_power_down(pm_message_t state);
 extern int device_suspend(pm_message_t state);
 extern int device_prepare_suspend(pm_message_t state);
 
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -71,22 +71,29 @@ int device_pm_add(struct device *dev)
 		 dev->bus ? dev->bus->name : "No Bus",
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
-	if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
-		if (dev->parent->power.sleeping)
-			dev_warn(dev,
-				"parent %s is sleeping, will not add\n",
+	if (all_sleeping) {
+		dev_warn(dev, "devices are sleeping, will not add\n");
+		goto Failure;
+	}
+	if (dev->parent) {
+		if (dev->parent->power.status == DPM_OFF) {
+			dev_warn(dev, "parent %s is sleeping, will not add\n",
 				dev->parent->bus_id);
-		else
-			dev_warn(dev, "devices are sleeping, will not add\n");
-		WARN_ON(true);
-		error = -EBUSY;
-	} else {
-		error = dpm_sysfs_add(dev);
-		if (!error)
-			list_add_tail(&dev->power.entry, &dpm_active);
+			goto Failure;
+		} else if (dev->parent->power.status == DPM_SUSPENDING) {
+			dev->parent->power.status == DPM_ON;
+		}
 	}
+	error = dpm_sysfs_add(dev);
+	if (!error)
+		list_add_tail(&dev->power.entry, &dpm_active);
+ End:
 	mutex_unlock(&dpm_list_mtx);
 	return error;
+ Failure:
+	WARN_ON(true);
+	error = -EBUSY;
+	goto End;
 }
 
 /**
@@ -124,26 +131,129 @@ void device_pm_schedule_removal(struct d
 }
 EXPORT_SYMBOL_GPL(device_pm_schedule_removal);
 
+/**
+ *	pm_op - execute the PM operation appropiate for given PM event
+ *	@dev:	Device.
+ *	@ops:	PM operations to choose from.
+ *	@state:	PM event message.
+ */
+static int pm_op(struct device *dev, struct pm_ops *ops, pm_message_t state)
+{
+	int error = 0;
+
+	switch (state.event) {
+#ifdef CONFIG_SUSPEND
+	case PM_EVENT_SUSPEND:
+		if (ops->suspend) {
+			error = ops->suspend(dev);
+			suspend_report_result(ops->suspend, error);
+		}
+		break;
+	case PM_EVENT_RESUME:
+		if (ops->resume) {
+			error = ops->resume(dev);
+			suspend_report_result(ops->resume, error);
+		}
+		break;
+#endif /* CONFIG_SUSPEND */
+#ifdef CONFIG_HIBERNATION
+	case PM_EVENT_FREEZE:
+		if (ops->freeze) {
+			error = ops->freeze(dev);
+			suspend_report_result(ops->freeze, error);
+		}
+		break;
+	case PM_EVENT_QUIESCE:
+		if (ops->quiesce) {
+			error = ops->quiesce(dev);
+			suspend_report_result(ops->quiesce, error);
+		}
+		break;
+	case PM_EVENT_HIBERNATE:
+		if (ops->poweroff) {
+			error = ops->poweroff(dev);
+			suspend_report_result(ops->poweroff, error);
+		}
+		break;
+	case PM_EVENT_THAW:
+		if (ops->thaw) {
+			error = ops->thaw(dev);
+			suspend_report_result(ops->thaw, error);
+		}
+		break;
+	case PM_EVENT_RESTORE:
+		if (ops->restore) {
+			error = ops->restore(dev);
+			suspend_report_result(ops->restore, error);
+		}
+		break;
+	case PM_EVENT_RECOVER:
+		if (ops->recover) {
+			error = ops->recover(dev);
+			suspend_report_result(ops->recover, error);
+		}
+		break;
+#endif /* CONFIG_HIBERNATION */
+	default:
+		error = -EINVAL;
+	}
+	return error;
+}
+
+static char *pm_verb(int event)
+{
+	switch (event) {
+	case PM_EVENT_SUSPEND:
+		return "suspend";
+	case PM_EVENT_RESUME:
+		return "resume";
+	case PM_EVENT_FREEZE:
+		return "freeze";
+	case PM_EVENT_QUIESCE:
+		return "quiesce";
+	case PM_EVENT_HIBERNATE:
+		return "hibernate";
+	case PM_EVENT_THAW:
+		return "thaw";
+	case PM_EVENT_RESTORE:
+		return "restore";
+	default:
+		return "(unknown PM event)";
+	}
+}
+
+static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info)
+{
+	dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event),
+		((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ?
+		", may wakeup" : "");
+}
+
 /*------------------------- Resume routines -------------------------*/
 
 /**
- *	resume_device_early - Power on one device (early resume).
+ *	resume_device_noirq - Power on one device (early resume).
  *	@dev:	Device.
+ *	@state: Operation to carry out.
  *
  *	Must be called with interrupts disabled.
  */
-static int resume_device_early(struct device *dev)
+static int resume_device_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
-	if (dev->bus && dev->bus->resume_early) {
-		dev_dbg(dev, "EARLY resume\n");
-		error = dev->bus->resume_early(dev);
-	}
+	if (!dev->bus)
+		goto End;
 
+	pm_dev_dbg(dev, state, "EARLY ");
+	if (dev->bus->pm_noirq)
+		error = pm_op(dev, dev->bus->pm_noirq, state);
+	else if (dev->bus->resume_early)
+		error = dev->bus->resume_early(dev);
+ End:
 	TRACE_RESUME(error);
 	return error;
 }
@@ -158,7 +268,7 @@ static int resume_device_early(struct de
  *
  *	Must be called with interrupts disabled and only one CPU running.
  */
-static void dpm_power_up(void)
+static void dpm_power_up(pm_message_t state)
 {
 
 	while (!list_empty(&dpm_off_irq)) {
@@ -166,7 +276,7 @@ static void dpm_power_up(void)
 		struct device *dev = to_device(entry);
 
 		list_move_tail(entry, &dpm_off);
-		resume_device_early(dev);
+		resume_device_noirq(dev, state);
 	}
 }
 
@@ -178,19 +288,49 @@ static void dpm_power_up(void)
  *
  *	Must be called with interrupts disabled.
  */
-void device_power_up(void)
+void device_power_up(pm_message_t state)
 {
 	sysdev_resume();
-	dpm_power_up();
+	dpm_power_up(state);
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
 /**
+ *	complete_device - Complete a PM transition for given device
+ *	@dev:	Device.
+ *	@state:	Power transition we are completing.
+ */
+static void complete_device(struct device *dev, pm_message_t state)
+{
+	down(&dev->sem);
+
+	if (dev->bus) {
+		pm_dev_dbg(dev, state, "completing ");
+		if (dev->bus->pm && dev->bus->pm->complete)
+			dev->bus->pm->complete(dev);
+	}
+
+	if (dev->type) {
+		pm_dev_dbg(dev, state, "completing type ");
+		if (dev->type->pm && dev->type->pm->complete)
+			dev->type->pm->complete(dev);
+	}
+
+	if (dev->class) {
+		pm_dev_dbg(dev, state, "completing class ");
+		if (dev->class->pm && dev->class->pm->complete)
+			dev->class->pm->complete(dev);
+	}
+
+	up(&dev->sem);
+}
+
+/**
  *	resume_device - Restore state for one device.
  *	@dev:	Device.
- *
+ *	@state:	Operation to carry out.
  */
-static int resume_device(struct device *dev)
+static int resume_device(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
@@ -199,21 +339,34 @@ static int resume_device(struct device *
 
 	down(&dev->sem);
 
-	if (dev->bus && dev->bus->resume) {
-		dev_dbg(dev,"resuming\n");
-		error = dev->bus->resume(dev);
+	if (dev->bus) {
+		pm_dev_dbg(dev, state, "");
+		if (dev->bus->pm)
+			error = pm_op(dev, dev->bus->pm, state);
+		else if (dev->bus->resume)
+			error = dev->bus->resume(dev);
 	}
+	if (error)
+		goto End;
 
-	if (!error && dev->type && dev->type->resume) {
-		dev_dbg(dev,"resuming\n");
-		error = dev->type->resume(dev);
+	if (dev->type) {
+		pm_dev_dbg(dev, state, "type ");
+		if (dev->type->pm)
+			error = pm_op(dev, dev->type->pm, state);
+		else if (dev->type->resume)
+			error = dev->type->resume(dev);
 	}
+	if (error)
+		goto End;
 
-	if (!error && dev->class && dev->class->resume) {
-		dev_dbg(dev,"class resume\n");
-		error = dev->class->resume(dev);
+	if (dev->class) {
+		pm_dev_dbg(dev, state, "class ");
+		if (dev->class->pm)
+			error = pm_op(dev, dev->class->pm, state);
+		else if (dev->class->resume)
+			error = dev->class->resume(dev);
 	}
-
+ End:
 	up(&dev->sem);
 
 	TRACE_RESUME(error);
@@ -228,9 +381,9 @@ static int resume_device(struct device *
  *	went through the early resume.
  *
  *	Take devices from the dpm_off_list, resume them,
- *	and put them on the dpm_locked list.
+ *	and put them on the dpm_active list.
  */
-static void dpm_resume(void)
+static void dpm_resume(pm_message_t state)
 {
 	mutex_lock(&dpm_list_mtx);
 	all_sleeping = false;
@@ -239,9 +392,10 @@ static void dpm_resume(void)
 		struct device *dev = to_device(entry);
 
 		list_move_tail(entry, &dpm_active);
-		dev->power.sleeping = false;
+		dev->power.status = DPM_ON;
 		mutex_unlock(&dpm_list_mtx);
-		resume_device(dev);
+		resume_device(dev, state);
+		complete_device(dev, state);
 		mutex_lock(&dpm_list_mtx);
 	}
 	mutex_unlock(&dpm_list_mtx);
@@ -269,14 +423,15 @@ static void unregister_dropped_devices(v
 
 /**
  *	device_resume - Restore state of each device in system.
+ *	@state: Operation to carry out.
  *
  *	Resume all the devices, unlock them all, and allow new
  *	devices to be registered once again.
  */
-void device_resume(void)
+void device_resume(pm_message_t state)
 {
 	might_sleep();
-	dpm_resume();
+	dpm_resume(state);
 	unregister_dropped_devices();
 }
 EXPORT_SYMBOL_GPL(device_resume);
@@ -284,37 +439,43 @@ EXPORT_SYMBOL_GPL(device_resume);
 
 /*------------------------- Suspend routines -------------------------*/
 
-static inline char *suspend_verb(u32 event)
+/**
+ *	resume_event - return a PM message representing the resume event
+ *	               corresponding to given sleep state.
+ *	@sleep_state - PM message representing a sleep state
+ */
+static pm_message_t resume_event(pm_message_t sleep_state)
 {
-	switch (event) {
-	case PM_EVENT_SUSPEND:	return "suspend";
-	case PM_EVENT_FREEZE:	return "freeze";
-	case PM_EVENT_PRETHAW:	return "prethaw";
-	default:		return "(unknown suspend event)";
+	switch (sleep_state.event) {
+	case PM_EVENT_SUSPEND:
+		return PMSG_RESUME;
+	case PM_EVENT_FREEZE:
+	case PM_EVENT_QUIESCE:
+		return PMSG_RECOVER;
+	case PM_EVENT_HIBERNATE:
+		return PMSG_RESTORE;
 	}
-}
-
-static void
-suspend_device_dbg(struct device *dev, pm_message_t state, char *info)
-{
-	dev_dbg(dev, "%s%s%s\n", info, suspend_verb(state.event),
-		((state.event == PM_EVENT_SUSPEND) && device_may_wakeup(dev)) ?
-		", may wakeup" : "");
+	return PMSG_ON;
 }
 
 /**
- *	suspend_device_late - Shut down one device (late suspend).
+ *	suspend_device_noirq - Shut down one device (late suspend).
  *	@dev:	Device.
  *	@state:	Power state device is entering.
  *
  *	This is called with interrupts off and only a single CPU running.
  */
-static int suspend_device_late(struct device *dev, pm_message_t state)
+static int suspend_device_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
-	if (dev->bus && dev->bus->suspend_late) {
-		suspend_device_dbg(dev, state, "LATE ");
+	if (!dev->bus)
+		return 0;
+
+	pm_dev_dbg(dev, state, "LATE ");
+	if (dev->bus->pm_noirq) {
+		error = pm_op(dev, dev->bus->pm_noirq, state);
+	} else if (dev->bus->suspend_late) {
 		error = dev->bus->suspend_late(dev, state);
 		suspend_report_result(dev->bus->suspend_late, error);
 	}
@@ -339,7 +500,7 @@ int device_power_down(pm_message_t state
 		struct list_head *entry = dpm_off.prev;
 		struct device *dev = to_device(entry);
 
-		error = suspend_device_late(dev, state);
+		error = suspend_device_noirq(dev, state);
 		if (error) {
 			printk(KERN_ERR "Could not power down device %s: "
 					"error %d\n",
@@ -353,45 +514,100 @@ int device_power_down(pm_message_t state
 	if (!error)
 		error = sysdev_suspend(state);
 	if (error)
-		dpm_power_up();
+		dpm_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
 
 /**
- *	suspend_device - Save state of one device.
+ *	prepare_device - Execute the ->prepare() callback(s) for given device.
  *	@dev:	Device.
- *	@state:	Power state device is entering.
+ *	@state: Operation we are preparing for.
  */
-static int suspend_device(struct device *dev, pm_message_t state)
+static int prepare_device(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	down(&dev->sem);
 
-	if (dev->power.power_state.event) {
-		dev_dbg(dev, "PM: suspend %d-->%d\n",
-			dev->power.power_state.event, state.event);
+	if (dev->bus) {
+		pm_dev_dbg(dev, state, "preparing ");
+		if (dev->bus->pm && dev->bus->pm->prepare) {
+			error = dev->bus->pm->prepare(dev);
+			suspend_report_result(dev->bus->pm->prepare, error);
+		}
 	}
+	if (error)
+		goto End;
 
-	if (dev->class && dev->class->suspend) {
-		suspend_device_dbg(dev, state, "class ");
-		error = dev->class->suspend(dev, state);
-		suspend_report_result(dev->class->suspend, error);
+	if (dev->type) {
+		pm_dev_dbg(dev, state, "preparing type ");
+		if (dev->type->pm && dev->type->pm->prepare) {
+			error = dev->type->pm->prepare(dev);
+			suspend_report_result(dev->type->pm->prepare, error);
+		}
 	}
+	if (error)
+		goto End;
 
-	if (!error && dev->type && dev->type->suspend) {
-		suspend_device_dbg(dev, state, "type ");
-		error = dev->type->suspend(dev, state);
-		suspend_report_result(dev->type->suspend, error);
+	if (dev->class) {
+		pm_dev_dbg(dev, state, "preparing class ");
+		if (dev->class->pm && dev->class->pm->prepare) {
+			error = dev->class->pm->prepare(dev);
+			suspend_report_result(dev->class->pm->prepare, error);
+		}
 	}
+ End:
+	up(&dev->sem);
+
+	return error;
+}
 
-	if (!error && dev->bus && dev->bus->suspend) {
-		suspend_device_dbg(dev, state, "");
-		error = dev->bus->suspend(dev, state);
-		suspend_report_result(dev->bus->suspend, error);
+/**
+ *	suspend_device - Save state of one device.
+ *	@dev:	Device.
+ *	@state:	Power state device is entering.
+ */
+static int suspend_device(struct device *dev, pm_message_t state)
+{
+	int error = 0;
+
+	down(&dev->sem);
+
+	if (dev->class) {
+		pm_dev_dbg(dev, state, "class ");
+		if (dev->class->pm) {
+			error = pm_op(dev, dev->class->pm, state);
+		} else if (dev->class->suspend) {
+			error = dev->class->suspend(dev, state);
+			suspend_report_result(dev->class->suspend, error);
+		}
+	}
+	if (error)
+		goto End;
+
+	if (dev->type) {
+		pm_dev_dbg(dev, state, "type ");
+		if (dev->type->pm) {
+			error = pm_op(dev, dev->type->pm, state);
+		} else if (dev->type->suspend) {
+			error = dev->type->suspend(dev, state);
+			suspend_report_result(dev->type->suspend, error);
+		}
 	}
+	if (error)
+		goto End;
 
+	if (dev->bus) {
+		pm_dev_dbg(dev, state, "");
+		if (dev->bus->pm) {
+			error = pm_op(dev, dev->bus->pm, state);
+		} else if (dev->bus->suspend) {
+			error = dev->bus->suspend(dev, state);
+			suspend_report_result(dev->bus->suspend, error);
+		}
+	}
+ End:
 	up(&dev->sem);
 
 	return error;
@@ -401,13 +617,8 @@ static int suspend_device(struct device 
  *	dpm_suspend - Suspend every device.
  *	@state:	Power state to put each device in.
  *
- *	Walk the dpm_locked list.  Suspend each device and move it
+ *	Walk the dpm_active list.  Suspend each device and move it
  *	to the dpm_off list.
- *
- *	(For historical reasons, if it returns -EAGAIN, that used to mean
- *	that the device would be called again with interrupts disabled.
- *	These days, we use the "suspend_late()" callback for that, so we
- *	print a warning and consider it an error).
  */
 static int dpm_suspend(pm_message_t state)
 {
@@ -418,22 +629,37 @@ static int dpm_suspend(pm_message_t stat
 		struct list_head *entry = dpm_active.prev;
 		struct device *dev = to_device(entry);
 
-		WARN_ON(dev->parent && dev->parent->power.sleeping);
+		WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF);
+		dev->power.status = DPM_SUSPENDING;
+		mutex_unlock(&dpm_list_mtx);
+
+		error = prepare_device(dev, state);
+
+		mutex_lock(&dpm_list_mtx);
+		if (error == -EAGAIN)
+			dev->power.status = DPM_ON;
+		else if(error)
+			break;
+		if (dev->power.status == DPM_ON) {
+			mutex_unlock(&dpm_list_mtx);
+
+			complete_device(dev, resume_event(state));
 
-		dev->power.sleeping = true;
+			mutex_lock(&dpm_list_mtx);
+			continue;
+		}
+		dev->power.status = DPM_OFF;
 		mutex_unlock(&dpm_list_mtx);
+
 		error = suspend_device(dev, state);
+
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
-			printk(KERN_ERR "Could not suspend device %s: "
-					"error %d%s\n",
-					kobject_name(&dev->kobj),
-					error,
-					(error == -EAGAIN ?
-					" (please convert to suspend_late)" :
-					""));
-			dev->power.sleeping = false;
-			break;
+			dev->power.status = DPM_ON;
+			mutex_unlock(&dpm_list_mtx);
+
+			complete_device(dev, resume_event(state));
+			return error;
 		}
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &dpm_off);
@@ -449,8 +675,7 @@ static int dpm_suspend(pm_message_t stat
  *	device_suspend - Save state and stop all devices in system.
  *	@state: new power management state
  *
- *	Prevent new devices from being registered, then lock all devices
- *	and suspend them.
+ *	Prepare and suspend all devices.
  */
 int device_suspend(pm_message_t state)
 {
@@ -459,7 +684,7 @@ int device_suspend(pm_message_t state)
 	might_sleep();
 	error = dpm_suspend(state);
 	if (error)
-		device_resume();
+		device_resume(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_suspend);
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -69,6 +69,9 @@ struct bus_type {
 	int (*resume_early)(struct device *dev);
 	int (*resume)(struct device *dev);
 
+	struct pm_ops *pm;
+	struct pm_ops *pm_noirq;
+
 	struct bus_type_private *p;
 };
 
@@ -202,6 +205,8 @@ struct class {
 
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
+
+	struct pm_ops *pm;
 };
 
 extern int __must_check class_register(struct class *class);
@@ -345,8 +350,11 @@ struct device_type {
 	struct attribute_group **groups;
 	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
 	void (*release)(struct device *dev);
+
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
+
+	struct pm_ops *pm;
 };
 
 /* interface for exporting device attributes */
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -224,7 +224,7 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
-	device_power_up();
+	device_power_up(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
  Enable_irqs:
 	local_irq_enable();
 	return error;
@@ -280,7 +280,7 @@ int hibernation_snapshot(int platform_mo
  Finish:
 	platform_finish(platform_mode);
  Resume_devices:
-	device_resume();
+	device_resume(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
  Resume_console:
 	resume_console();
  Close:
@@ -301,7 +301,7 @@ static int resume_target_kernel(void)
 	int error;
 
 	local_irq_disable();
-	error = device_power_down(PMSG_PRETHAW);
+	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
@@ -329,7 +329,7 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
-	device_power_up();
+	device_power_up(PMSG_THAW);
  Enable_irqs:
 	local_irq_enable();
 	return error;
@@ -350,7 +350,7 @@ int hibernation_restore(int platform_mod
 
 	pm_prepare_console();
 	suspend_console();
-	error = device_suspend(PMSG_PRETHAW);
+	error = device_suspend(PMSG_QUIESCE);
 	if (error)
 		goto Finish;
 
@@ -362,7 +362,7 @@ int hibernation_restore(int platform_mod
 		enable_nonboot_cpus();
 	}
 	platform_restore_cleanup(platform_mode);
-	device_resume();
+	device_resume(PMSG_RECOVER);
  Finish:
 	resume_console();
 	pm_restore_console();
@@ -419,7 +419,7 @@ int hibernation_platform_enter(void)
  Finish:
 	hibernation_ops->finish();
  Resume_devices:
-	device_resume();
+	device_resume(PMSG_RESTORE);
  Resume_console:
 	resume_console();
  Close:
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -239,7 +239,7 @@ static int suspend_enter(suspend_state_t
 	if (!suspend_test(TEST_CORE))
 		error = suspend_ops->enter(state);
 
-	device_power_up();
+	device_power_up(PMSG_RESUME);
  Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
@@ -291,7 +291,7 @@ int suspend_devices_and_enter(suspend_st
 	if (suspend_ops->finish)
 		suspend_ops->finish();
  Resume_devices:
-	device_resume();
+	device_resume(PMSG_RESUME);
  Resume_console:
 	resume_console();
  Close:
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1221,9 +1221,9 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
-	device_power_up();
+	device_power_up(PMSG_RESUME);
 	local_irq_enable();
-	device_resume();
+	device_resume(PMSG_RESUME);
 	pm_send_all(PM_RESUME, (void *)0);
 	queue_event(APM_NORMAL_RESUME, NULL);
  out:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
  2008-03-09 13:28   ` Rafael J. Wysocki
@ 2008-03-09 19:56     ` Alan Stern
  2008-03-09 20:50       ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2008-03-09 19:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, ACPI Devel Maling List, Alexey Starikovskiy, Len Brown,
	Pavel Machek, David Brownell

On Sun, 9 Mar 2008, Rafael J. Wysocki wrote:

> > Didn't we decide it would be better to call @prepare in a separate
> > pass?
> 
> Afterwards I realized that it would be racy.  Namely, once a new device is
> registered, it may require ->prepare() to be called before ->suspend(), so if
> that happens after we've called ->prepare() for all devices, we'll be in
> trouble.

There's nothing wrong with doing it this way.  However the race you're 
worried about will never happen if drivers are written correctly.  Once 
prepare has been called for all devices, no new devices should be 
registered.

> > In any case, drivers aren't in a position to detect concurrent 
> > registrations.  The PM core shouldn't require them to return -EAGAIN; 
> > it can detect these registrations by itself.
> 
> In the appended new version of the patch, the core does it.  However, in theory
> there may be situations in which the core can't detect a race condition and
> recover from it, so I think it's reasonable to leave the possibility to signal
> such a condition by the driver.

I don't mind taking special action when drivers return -EAGAIN.  The 
point is that drivers shouldn't be _required_ to do it.

> > Finally, there needs to be a new "prepared" field added to dev_pm_info,
> > right next to the "sleeping" field.  Mention that it will be set prior
> > to calling @prepare and cleared prior to calling @complete.  Even
> > though it is read-only to drivers, they will find it useful.
> 
> Yes, I noticed that after sending the patch.  I handled it in a slightly
> different way, by converting 'sleeping' into 'status' that may assum one of
> the three values: DPM_ON, DPM_SUSPENDING, DPM_OFF.

Okay, that should work just as well.

> + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> + *	its hardware state.  Prevent new children of the device from being
> + *	registered and prevent new calls to the probe method from being made
> + *	after @prepare() returns.  Also, if a concurrent child registration
> + *	or a call to the probe method already in progress is detected by
> + *	@prepare(), it may return -EAGAIN, so that the PM core can execute it
> + *	once again (e.g. after suspending the newly registered child) to recover
> + *	from the race condition. This method is executed for all kinds of
> + *	suspend transitions and is immediately followed by one of the suspend
> + *	callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
> + *	[NOTE: The PM core is able to detect a registration of a new child of a
> + *	device concurrent with this device's @prepare() callback, in which case
> + *	@complete() will be called for the device and the core will try again to
> + *	suspend it later.  However, the core is NOT able to handle a call to
> + *	the probe method concurrent with @prepare().]

I think you can leave out this NOTE.  Direct probe method calls by 
drivers are pretty rare; almost all probing is done by the driver core.

> + * @complete: Undo the changes made by @prepare().  This method is executed for
> + *	all kinds of resume transitions, immediately following one of the resume
> + *	callbacks: @resume(), @thaw(), @restore(), or @recover().  Also executed
> + *	if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
> + *	immediately following a successful @prepare() fails OR if @prepare()
> + *	returns -EAGAIN.

The last part is wrong.  If @prepare returns an error (such as -EAGAIN) 
then @complete shouldn't be called.  What you mean is that @complete 
will be invoked immediately following a successful @prepare if the PM 
core detects that a new child was registered while @prepare was 
running.

> +enum dpm_state {
> +	DPM_ON,
> +	DPM_SUSPENDING,
> +	DPM_OFF,
> +};

There should be kerneldoc for dpm_state.  Drivers may very well want to 
check dev->power.status to see whether a new child registration should 
be blocked.  The documentation should be very clear that 
dev->power.status doesn't describe the device's actual power state; 
rather it describes which method calls the PM core has made or is 
about to make.

> @@ -71,22 +71,29 @@ int device_pm_add(struct device *dev)
>  		 dev->bus ? dev->bus->name : "No Bus",
>  		 kobject_name(&dev->kobj));
>  	mutex_lock(&dpm_list_mtx);
> -	if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
> -		if (dev->parent->power.sleeping)
> -			dev_warn(dev,
> -				"parent %s is sleeping, will not add\n",
> +	if (all_sleeping) {
> +		dev_warn(dev, "devices are sleeping, will not add\n");
> +		goto Failure;
> +	}
> +	if (dev->parent) {
> +		if (dev->parent->power.status == DPM_OFF) {
> +			dev_warn(dev, "parent %s is sleeping, will not add\n",
>  				dev->parent->bus_id);
> -		else
> -			dev_warn(dev, "devices are sleeping, will not add\n");
> -		WARN_ON(true);
> -		error = -EBUSY;
> -	} else {
> -		error = dpm_sysfs_add(dev);
> -		if (!error)
> -			list_add_tail(&dev->power.entry, &dpm_active);
> +			goto Failure;
> +		} else if (dev->parent->power.status == DPM_SUSPENDING) {
> +			dev->parent->power.status == DPM_ON;
> +		}

This part puzzled me until I read the rest of the patch.  It's a good
way to look specifically for new _child_ registrations as opposed to
any new registrations.

> @@ -418,22 +629,37 @@ static int dpm_suspend(pm_message_t stat
>  		struct list_head *entry = dpm_active.prev;
>  		struct device *dev = to_device(entry);
>  
> -		WARN_ON(dev->parent && dev->parent->power.sleeping);
> +		WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF);
> +		dev->power.status = DPM_SUSPENDING;
> +		mutex_unlock(&dpm_list_mtx);
> +
> +		error = prepare_device(dev, state);
> +
> +		mutex_lock(&dpm_list_mtx);
> +		if (error == -EAGAIN)
> +			dev->power.status = DPM_ON;
> +		else if(error)
> +			break;

The last four lines should be:

		if (error) {
			dev->power.status = DPM_ON;
			if (error == -EAGAIN)
				continue;
			break;
		}

Insert:		/* Was a child added during prepare_device()? */
> +		if (dev->power.status == DPM_ON) {
> +			mutex_unlock(&dpm_list_mtx);
> +
> +			complete_device(dev, resume_event(state));
>  
> -		dev->power.sleeping = true;
> +			mutex_lock(&dpm_list_mtx);
> +			continue;
> +		}
> +		dev->power.status = DPM_OFF;
>  		mutex_unlock(&dpm_list_mtx);
> +
>  		error = suspend_device(dev, state);
> +
>  		mutex_lock(&dpm_list_mtx);
>  		if (error) {
> -			printk(KERN_ERR "Could not suspend device %s: "
> -					"error %d%s\n",
> -					kobject_name(&dev->kobj),
> -					error,
> -					(error == -EAGAIN ?
> -					" (please convert to suspend_late)" :
> -					""));
> -			dev->power.sleeping = false;
> -			break;
> +			dev->power.status = DPM_ON;
> +			mutex_unlock(&dpm_list_mtx);
> +
> +			complete_device(dev, resume_event(state));
> +			return error;

Early return?  I guess it's okay...

>  		}
>  		if (!list_empty(&dev->power.entry))
>  			list_move(&dev->power.entry, &dpm_off);

Looks good.

Alan Stern


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
  2008-03-09 19:56     ` Alan Stern
@ 2008-03-09 20:50       ` Rafael J. Wysocki
  2008-03-10  2:52         ` Alan Stern
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2008-03-09 20:50 UTC (permalink / raw)
  To: Alan Stern
  Cc: pm list, ACPI Devel Maling List, Alexey Starikovskiy, Len Brown,
	Pavel Machek, David Brownell

On Sunday, 9 of March 2008, Alan Stern wrote:
> On Sun, 9 Mar 2008, Rafael J. Wysocki wrote:
> 
> > > Didn't we decide it would be better to call @prepare in a separate
> > > pass?
> > 
> > Afterwards I realized that it would be racy.  Namely, once a new device is
> > registered, it may require ->prepare() to be called before ->suspend(), so if
> > that happens after we've called ->prepare() for all devices, we'll be in
> > trouble.
> 
> There's nothing wrong with doing it this way.  However the race you're 
> worried about will never happen if drivers are written correctly.  Once 
> prepare has been called for all devices, no new devices should be 
> registered.

That only applies to the devices with parents.  Apart from this, the device
can be registered concurrently with its parent's ->prepare() and in that case
we should call ->complete() for the parent etc.  IOW, almost the same
complications as in dpm_suspend() would have to be taken into account.

[--snip--]
> > + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> > + *	its hardware state.  Prevent new children of the device from being
> > + *	registered and prevent new calls to the probe method from being made
> > + *	after @prepare() returns.  Also, if a concurrent child registration
> > + *	or a call to the probe method already in progress is detected by
> > + *	@prepare(), it may return -EAGAIN, so that the PM core can execute it
> > + *	once again (e.g. after suspending the newly registered child) to recover
> > + *	from the race condition. This method is executed for all kinds of
> > + *	suspend transitions and is immediately followed by one of the suspend
> > + *	callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
> > + *	[NOTE: The PM core is able to detect a registration of a new child of a
> > + *	device concurrent with this device's @prepare() callback, in which case
> > + *	@complete() will be called for the device and the core will try again to
> > + *	suspend it later.  However, the core is NOT able to handle a call to
> > + *	the probe method concurrent with @prepare().]
> 
> I think you can leave out this NOTE.  Direct probe method calls by 
> drivers are pretty rare; almost all probing is done by the driver core.

Done.

> > + * @complete: Undo the changes made by @prepare().  This method is executed for
> > + *	all kinds of resume transitions, immediately following one of the resume
> > + *	callbacks: @resume(), @thaw(), @restore(), or @recover().  Also executed
> > + *	if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
> > + *	immediately following a successful @prepare() fails OR if @prepare()
> > + *	returns -EAGAIN.
> 
> The last part is wrong.

Yes, I noticed that too.

> If @prepare returns an error (such as -EAGAIN) then @complete shouldn't be called.

Hmm, I'd prefer to call it if ->prepare() fails too.  The -EAGAIN case is
special, because in fact there's no warranty that ->prepare() will be called
again (for example, the new child's ->suspend() may fail before that happens),
so we need to do the clean up after it.  In general, if ->complete() is assumed
to be called after a failing ->prepare() too, that simplifies the code a bit.

[--snip--]
> 
> > +enum dpm_state {
> > +	DPM_ON,
> > +	DPM_SUSPENDING,
> > +	DPM_OFF,
> > +};
> 
> There should be kerneldoc for dpm_state.

In fact there is one, right above the definition of 'enum dpm_state'.  It was
placed a bit confusingly under the big comment about the legacy stuff.
I changed the ordering of things in pm.h to avoid that confusion.

[--snip--]
> 
> > @@ -71,22 +71,29 @@ int device_pm_add(struct device *dev)
> >  		 dev->bus ? dev->bus->name : "No Bus",
> >  		 kobject_name(&dev->kobj));
> >  	mutex_lock(&dpm_list_mtx);
> > -	if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
> > -		if (dev->parent->power.sleeping)
> > -			dev_warn(dev,
> > -				"parent %s is sleeping, will not add\n",
> > +	if (all_sleeping) {
> > +		dev_warn(dev, "devices are sleeping, will not add\n");
> > +		goto Failure;
> > +	}
> > +	if (dev->parent) {
> > +		if (dev->parent->power.status == DPM_OFF) {
> > +			dev_warn(dev, "parent %s is sleeping, will not add\n",
> >  				dev->parent->bus_id);
> > -		else
> > -			dev_warn(dev, "devices are sleeping, will not add\n");
> > -		WARN_ON(true);
> > -		error = -EBUSY;
> > -	} else {
> > -		error = dpm_sysfs_add(dev);
> > -		if (!error)
> > -			list_add_tail(&dev->power.entry, &dpm_active);
> > +			goto Failure;
> > +		} else if (dev->parent->power.status == DPM_SUSPENDING) {
> > +			dev->parent->power.status == DPM_ON;
> > +		}
> 
> This part puzzled me until I read the rest of the patch.  It's a good
> way to look specifically for new _child_ registrations as opposed to
> any new registrations.

I reworked that a bit.

> > @@ -418,22 +629,37 @@ static int dpm_suspend(pm_message_t stat
> >  		struct list_head *entry = dpm_active.prev;
> >  		struct device *dev = to_device(entry);
> >  
> > -		WARN_ON(dev->parent && dev->parent->power.sleeping);
> > +		WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF);
> > +		dev->power.status = DPM_SUSPENDING;
> > +		mutex_unlock(&dpm_list_mtx);
> > +
> > +		error = prepare_device(dev, state);
> > +
> > +		mutex_lock(&dpm_list_mtx);
> > +		if (error == -EAGAIN)
> > +			dev->power.status = DPM_ON;
> > +		else if(error)
> > +			break;
> 
> The last four lines should be:
> 
> 		if (error) {
> 			dev->power.status = DPM_ON;
> 			if (error == -EAGAIN)
> 				continue;
> 			break;
> 		}

Well, I reworked this part too, but in a different way.

> Insert:		/* Was a child added during prepare_device()? */

I added a different comment here.

> > +		if (dev->power.status == DPM_ON) {
> > +			mutex_unlock(&dpm_list_mtx);
> > +
> > +			complete_device(dev, resume_event(state));
> >  
> > -		dev->power.sleeping = true;
> > +			mutex_lock(&dpm_list_mtx);
> > +			continue;
> > +		}
> > +		dev->power.status = DPM_OFF;
> >  		mutex_unlock(&dpm_list_mtx);
> > +
> >  		error = suspend_device(dev, state);
> > +
> >  		mutex_lock(&dpm_list_mtx);
> >  		if (error) {
> > -			printk(KERN_ERR "Could not suspend device %s: "
> > -					"error %d%s\n",
> > -					kobject_name(&dev->kobj),
> > -					error,
> > -					(error == -EAGAIN ?
> > -					" (please convert to suspend_late)" :
> > -					""));
> > -			dev->power.sleeping = false;
> > -			break;
> > +			dev->power.status = DPM_ON;
> > +			mutex_unlock(&dpm_list_mtx);
> > +
> > +			complete_device(dev, resume_event(state));
> > +			return error;
> 
> Early return?  I guess it's okay...
> 
> >  		}
> >  		if (!list_empty(&dev->power.entry))
> >  			list_move(&dev->power.entry, &dpm_off);
> 
> Looks good.

Well, I reworked it somewhat in the meantime.

First, I moved the pm_dev_dbg() stuff into the inner if () blocks.
Second, I made it possible to use ->prepare() and ->complete() in the
"noirq" case as well and added a comment about the "noirq" callbacks.
Finally, I cut some rough edges here and there (at least I hope so).

The new patch is appended.

Thanks,
Rafael


---

 arch/x86/kernel/apm_32.c  |    4 
 drivers/base/power/main.c |  463 +++++++++++++++++++++++++++++++++++-----------
 include/linux/device.h    |    8 
 include/linux/pm.h        |  238 ++++++++++++++++++++---
 kernel/power/disk.c       |   14 -
 kernel/power/main.c       |    4 
 6 files changed, 579 insertions(+), 152 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -110,11 +110,9 @@ extern void (*pm_power_off_prepare)(void
 
 struct device;
 
-typedef struct pm_message {
-	int event;
-} pm_message_t;
-
-/*
+/**
+ * struct pm_ops - device PM callbacks
+ *
  * Several driver power state transitions are externally visible, affecting
  * the state of pending I/O queues and (for drivers that touch hardware)
  * interrupts, wakeups, DMA, and other hardware state.  There may also be
@@ -122,6 +120,206 @@ typedef struct pm_message {
  * to the rest of the driver stack (such as a driver that's ON gating off
  * clocks which are not in active use).
  *
+ * The externally visible transitions are handled with the help of the following
+ * callbacks included in this structure:
+ *
+ * @prepare: Prepare the device for the upcoming transition, but do NOT change
+ *	its hardware state.  Prevent new children of the device from being
+ *	registered and prevent new calls to the probe method from being made
+ *	after @prepare() returns.  Also, if a concurrent child registration
+ *	or a call to the probe method already in progress is detected by
+ *	@prepare(), it may return -EAGAIN, so that the PM core can execute it
+ *	once again (e.g. after suspending the newly registered child) to recover
+ *	from the race condition. This method is executed for all kinds of
+ *	suspend transitions and is immediately followed by one of the suspend
+ *	callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
+ *
+ * @complete: Undo the changes made by @prepare().  This method is executed for
+ *	all kinds of resume transitions, immediately following one of the resume
+ *	callbacks: @resume(), @thaw(), @restore(), or @recover().  Also executed
+ *	if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
+ *	immediately following a successful @prepare() fails OR if @prepare()
+ *	itself fails.  @complete() cannot fail.
+ *
+ * @suspend: Executed before putting the system into a sleep state in which the
+ *	contents of main memory are preserved.  Quiesce the device, put it into
+ *	a low power state appropriate for the upcoming system state (such as
+ *	PCI_D3hot), and enable wakeup events as appropriate.
+ *
+ * @resume: Executed after waking the system up from a sleep state in which the
+ *	contents of main memory were preserved.  Put the device into the
+ *	appropriate state, according to the information saved in memory by the
+ *	preceding @suspend().  The driver starts working again, responding to
+ *	hardware events and software requests.  The hardware may have gone
+ *	through a power-off reset, or it may have maintained state from the
+ *	previous suspend() which the driver may rely on while resuming.  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @resume().
+ *
+ * @freeze: Hibernation-specific, executed before creating a hibernation image.
+ *	Quiesce operations so that a consistent image can be created, but do NOT
+ *	otherwise put the device into a low power device state and do NOT emit
+ *	system wakeup events.  Save in main memory the device settings to be
+ *	used by @restore() during the subsequent resume from hibernation.
+ *
+ * @thaw: Hibernation-specific, executed after creating a hibernation image.
+ *	Undo the changes made by the preceding @freeze(), so the device can be
+ *	operated in the same was as immediately before the call to @freeze().
+ *
+ * @poweroff: Hibernation-specific, executed after saving a hibernation image.
+ *	Quiesce the device, put it into a low power state appropriate for the
+ *	upcoming system state (such as PCI_D3hot), and enable wakeup events as
+ *	appropriate.
+ *
+ * @quiesce: Hibernation-specific, executed after loading a hibernation image
+ *	and before restoring the contents of main memory from it.  Quiesce
+ *	operations so that the contents of main memory can be restored from the
+ *	image in a consistent way, but do NOT otherwise put the device into a
+ *	low power state and do NOT emit system wakeup events.  Do NOT save any
+ *	device settings in main memory.
+ *
+ * @restore: Hibernation-specific, executed after restoring the contents of main
+ *	memory from a hibernation image.  Driver starts working again,
+ *	responding to hardware events and software requests.  Do NOT make ANY
+ *	assumptions about the hardware state right prior to @restore().  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @restore().
+ *
+ * @recover: Hibernation-specific, executed after a failing creation of a
+ *	hibernation image OR after a failing attempt to restore the contents of
+ *	main memory from such an image.  Undo the changes made by the preceding
+ *	@freeze() or @quiesce(), so the device can be operated in the same way
+ *	as immediately before the failing transition.
+ *
+ * struct pm_ops is also used for defining driver PM operations to be carried
+ * out with interrupts disabled.  In this case, however, there is only one
+ * active CPU while the operations are being performed, so they are executed by
+ * the PM core in a more straightforward way.  In particular, the failure of
+ * @prepare() carried out with interrupts disabled causes the entire PM
+ * transition to fail, regardless of the error code returned by it.
+ */
+
+struct pm_ops {
+#ifdef CONFIG_PM_SLEEP
+	int (*prepare)(struct device *dev);
+	void (*complete)(struct device *dev);
+#endif
+#ifdef CONFIG_SUSPEND
+	int (*suspend)(struct device *dev);
+	int (*resume)(struct device *dev);
+#endif
+#ifdef CONFIG_HIBERNATION
+	int (*freeze)(struct device *dev);
+	int (*thaw)(struct device *dev);
+	int (*poweroff)(struct device *dev);
+	int (*quiesce)(struct device *dev);
+	int (*restore)(struct device *dev);
+	int (*recover)(struct device *dev);
+#endif
+};
+
+/**
+ * PM_EVENT_ messages
+ *
+ * The following PM_EVENT_ messages are defined for the internal use of the PM
+ * core, in order to provide a mechanism allowing the high level suspend and
+ * hibernation code to convey the necessary information to the device PM core
+ * code:
+ *
+ * ON		No transition.
+ *
+ * FREEZE 	System is going to hibernate, call ->prepare() and ->freeze()
+ *		for all devices.
+ *
+ * SUSPEND	System is going to suspend, call ->prepare() and ->suspend()
+ *		for all devices.
+ *
+ * HIBERNATE	Hibernation image has been saved, call ->prepare() and
+ *		->poweroff() for all devices.
+ *
+ * QUIESCE	Contents of main memory are going to be restored from a (loaded)
+ *		hibernation image, call ->prepare() and ->quiesce() for all
+ *		devices.
+ *
+ * RESUME	System is resuming, call ->resume() and ->complete() for all
+ *		devices.
+ *
+ * THAW		Hibernation image has been created, call ->thaw() and
+ *		->complete() for all devices.
+ *
+ * RESTORE	Contents of main memory have been restored from a hibernation
+ *		image, call ->restore() and ->complete() for all devices.
+ *
+ * RECOVER	Creation of a hibernation image or restoration of the main
+ *		memory contents from a hibernation image has failed, call
+ *		->recover() and ->complete() for all devices.
+ */
+
+typedef struct pm_message {
+	int event;
+} pm_message_t;
+
+#define PM_EVENT_ON		0x0000
+#define PM_EVENT_FREEZE 	0x0001
+#define PM_EVENT_SUSPEND	0x0002
+#define PM_EVENT_HIBERNATE	0x0004
+#define PM_EVENT_QUIESCE	0x0008
+#define PM_EVENT_RESUME		0x0010
+#define PM_EVENT_THAW		0x0020
+#define PM_EVENT_RESTORE	0x0040
+#define PM_EVENT_RECOVER	0x0080
+
+#define PM_EVENT_SLEEP	(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
+
+#define PMSG_FREEZE	((struct pm_message){ .event = PM_EVENT_FREEZE, })
+#define PMSG_QUIESCE	((struct pm_message){ .event = PM_EVENT_QUIESCE, })
+#define PMSG_SUSPEND	((struct pm_message){ .event = PM_EVENT_SUSPEND, })
+#define PMSG_HIBERNATE	((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
+#define PMSG_RESUME	((struct pm_message){ .event = PM_EVENT_RESUME, })
+#define PMSG_THAW	((struct pm_message){ .event = PM_EVENT_THAW, })
+#define PMSG_RESTORE	((struct pm_message){ .event = PM_EVENT_RESTORE, })
+#define PMSG_RECOVER	((struct pm_message){ .event = PM_EVENT_RECOVER, })
+#define PMSG_ON		((struct pm_message){ .event = PM_EVENT_ON, })
+
+/**
+ * Device power management states
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.
+ *
+ * DPM_ON		Device is regarded as operational.
+ *
+ * DPM_SUSPENDING	Device is currently being suspended.
+ *
+ * DPM_OFF		Device is regarded as suspended.
+ */
+
+enum dpm_state {
+	DPM_ON,
+	DPM_SUSPENDING,
+	DPM_OFF,
+};
+
+struct dev_pm_info {
+	pm_message_t		power_state;
+	unsigned		can_wakeup:1;
+	enum dpm_state		status:2;	/* Owned by the PM core */
+#ifdef	CONFIG_PM_SLEEP
+	unsigned		should_wakeup:1;
+	struct list_head	entry;
+#endif
+};
+
+/*
+ * The PM_EVENT_ messages are also used by drivers implementing the legacy
+ * suspend framework, based on the ->suspend() and ->resume() callbacks common
+ * for suspend and hibernation transitions, according to the rules below.
+ */
+
+/* Necessary, because several drivers use PM_EVENT_PRETHAW */
+#define PM_EVENT_PRETHAW PM_EVENT_QUIESCE
+
+/*
  * One transition is triggered by resume(), after a suspend() call; the
  * message is implicit:
  *
@@ -166,35 +364,11 @@ typedef struct pm_message {
  * or from system low-power states such as standby or suspend-to-RAM.
  */
 
-#define PM_EVENT_ON 0
-#define PM_EVENT_FREEZE 1
-#define PM_EVENT_SUSPEND 2
-#define PM_EVENT_HIBERNATE 4
-#define PM_EVENT_PRETHAW 8
-
-#define PM_EVENT_SLEEP	(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
-
-#define PMSG_FREEZE	((struct pm_message){ .event = PM_EVENT_FREEZE, })
-#define PMSG_PRETHAW	((struct pm_message){ .event = PM_EVENT_PRETHAW, })
-#define PMSG_SUSPEND	((struct pm_message){ .event = PM_EVENT_SUSPEND, })
-#define PMSG_HIBERNATE	((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
-#define PMSG_ON		((struct pm_message){ .event = PM_EVENT_ON, })
-
-struct dev_pm_info {
-	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	bool			sleeping:1;	/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
-	unsigned		should_wakeup:1;
-	struct list_head	entry;
-#endif
-};
+#ifdef CONFIG_PM_SLEEP
+extern void device_power_up(pm_message_t state);
+extern void device_resume(pm_message_t state);
 
 extern int device_power_down(pm_message_t state);
-extern void device_power_up(void);
-extern void device_resume(void);
-
-#ifdef CONFIG_PM_SLEEP
 extern int device_suspend(pm_message_t state);
 extern int device_prepare_suspend(pm_message_t state);
 
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -71,22 +71,30 @@ int device_pm_add(struct device *dev)
 		 dev->bus ? dev->bus->name : "No Bus",
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
-	if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
-		if (dev->parent->power.sleeping)
-			dev_warn(dev,
-				"parent %s is sleeping, will not add\n",
+	if (all_sleeping) {
+		dev_warn(dev, "all devices are sleeping, will not add\n");
+		goto Refuse;
+	}
+	if (dev->parent)
+		switch (dev->parent->power.status) {
+		case DPM_OFF:
+			dev_warn(dev, "parent %s is sleeping, will not add\n",
 				dev->parent->bus_id);
-		else
-			dev_warn(dev, "devices are sleeping, will not add\n");
-		WARN_ON(true);
-		error = -EBUSY;
-	} else {
-		error = dpm_sysfs_add(dev);
-		if (!error)
-			list_add_tail(&dev->power.entry, &dpm_active);
-	}
+			goto Refuse;
+		case DPM_SUSPENDING:
+			dev->parent->power.status = DPM_ON;
+			break;
+		}
+	error = dpm_sysfs_add(dev);
+	if (!error)
+		list_add_tail(&dev->power.entry, &dpm_active);
+ End:
 	mutex_unlock(&dpm_list_mtx);
 	return error;
+ Refuse:
+	WARN_ON(true);
+	error = -EBUSY;
+	goto End;
 }
 
 /**
@@ -124,26 +132,135 @@ void device_pm_schedule_removal(struct d
 }
 EXPORT_SYMBOL_GPL(device_pm_schedule_removal);
 
+/**
+ *	pm_op - execute the PM operation appropiate for given PM event
+ *	@dev:	Device.
+ *	@ops:	PM operations to choose from.
+ *	@state:	PM event message.
+ */
+static int pm_op(struct device *dev, struct pm_ops *ops, pm_message_t state)
+{
+	int error = 0;
+
+	switch (state.event) {
+#ifdef CONFIG_SUSPEND
+	case PM_EVENT_SUSPEND:
+		if (ops->suspend) {
+			error = ops->suspend(dev);
+			suspend_report_result(ops->suspend, error);
+		}
+		break;
+	case PM_EVENT_RESUME:
+		if (ops->resume) {
+			error = ops->resume(dev);
+			suspend_report_result(ops->resume, error);
+		}
+		break;
+#endif /* CONFIG_SUSPEND */
+#ifdef CONFIG_HIBERNATION
+	case PM_EVENT_FREEZE:
+		if (ops->freeze) {
+			error = ops->freeze(dev);
+			suspend_report_result(ops->freeze, error);
+		}
+		break;
+	case PM_EVENT_QUIESCE:
+		if (ops->quiesce) {
+			error = ops->quiesce(dev);
+			suspend_report_result(ops->quiesce, error);
+		}
+		break;
+	case PM_EVENT_HIBERNATE:
+		if (ops->poweroff) {
+			error = ops->poweroff(dev);
+			suspend_report_result(ops->poweroff, error);
+		}
+		break;
+	case PM_EVENT_THAW:
+		if (ops->thaw) {
+			error = ops->thaw(dev);
+			suspend_report_result(ops->thaw, error);
+		}
+		break;
+	case PM_EVENT_RESTORE:
+		if (ops->restore) {
+			error = ops->restore(dev);
+			suspend_report_result(ops->restore, error);
+		}
+		break;
+	case PM_EVENT_RECOVER:
+		if (ops->recover) {
+			error = ops->recover(dev);
+			suspend_report_result(ops->recover, error);
+		}
+		break;
+#endif /* CONFIG_HIBERNATION */
+	default:
+		error = -EINVAL;
+	}
+	return error;
+}
+
+static char *pm_verb(int event)
+{
+	switch (event) {
+	case PM_EVENT_SUSPEND:
+		return "suspend";
+	case PM_EVENT_RESUME:
+		return "resume";
+	case PM_EVENT_FREEZE:
+		return "freeze";
+	case PM_EVENT_QUIESCE:
+		return "quiesce";
+	case PM_EVENT_HIBERNATE:
+		return "hibernate";
+	case PM_EVENT_THAW:
+		return "thaw";
+	case PM_EVENT_RESTORE:
+		return "restore";
+	default:
+		return "(unknown PM event)";
+	}
+}
+
+static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info)
+{
+	dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event),
+		((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ?
+		", may wakeup" : "");
+}
+
 /*------------------------- Resume routines -------------------------*/
 
 /**
- *	resume_device_early - Power on one device (early resume).
+ *	resume_device_noirq - Power on one device (early resume).
  *	@dev:	Device.
+ *	@state: Operation to carry out.
  *
  *	Must be called with interrupts disabled.
  */
-static int resume_device_early(struct device *dev)
+static int resume_device_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
-	if (dev->bus && dev->bus->resume_early) {
-		dev_dbg(dev, "EARLY resume\n");
+	if (!dev->bus)
+		goto End;
+
+	if (dev->bus->pm_noirq) {
+		struct pm_ops *ops = dev->bus->pm_noirq;
+
+		pm_dev_dbg(dev, state, "EARLY ");
+		error = pm_op(dev, ops, state);
+		if (ops->complete)
+			ops->complete(dev);
+	} else if (dev->bus->resume_early) {
+		pm_dev_dbg(dev, state, "legacy EARLY ");
 		error = dev->bus->resume_early(dev);
 	}
-
+ End:
 	TRACE_RESUME(error);
 	return error;
 }
@@ -158,7 +275,7 @@ static int resume_device_early(struct de
  *
  *	Must be called with interrupts disabled and only one CPU running.
  */
-static void dpm_power_up(void)
+static void dpm_power_up(pm_message_t state)
 {
 
 	while (!list_empty(&dpm_off_irq)) {
@@ -166,7 +283,7 @@ static void dpm_power_up(void)
 		struct device *dev = to_device(entry);
 
 		list_move_tail(entry, &dpm_off);
-		resume_device_early(dev);
+		resume_device_noirq(dev, state);
 	}
 }
 
@@ -178,19 +295,46 @@ static void dpm_power_up(void)
  *
  *	Must be called with interrupts disabled.
  */
-void device_power_up(void)
+void device_power_up(pm_message_t state)
 {
 	sysdev_resume();
-	dpm_power_up();
+	dpm_power_up(state);
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
 /**
+ *	complete_device - Complete a PM transition for given device
+ *	@dev:	Device.
+ *	@state:	Power transition we are completing.
+ */
+static void complete_device(struct device *dev, pm_message_t state)
+{
+	down(&dev->sem);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->complete) {
+		pm_dev_dbg(dev, state, "completing ");
+		dev->bus->pm->complete(dev);
+	}
+
+	if (dev->type && dev->type->pm && dev->type->pm->complete) {
+		pm_dev_dbg(dev, state, "completing type ");
+		dev->type->pm->complete(dev);
+	}
+
+	if (dev->class && dev->class->pm && dev->class->pm->complete) {
+		pm_dev_dbg(dev, state, "completing class ");
+		dev->class->pm->complete(dev);
+	}
+
+	up(&dev->sem);
+}
+
+/**
  *	resume_device - Restore state for one device.
  *	@dev:	Device.
- *
+ *	@state:	Operation to carry out.
  */
-static int resume_device(struct device *dev)
+static int resume_device(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
@@ -199,21 +343,40 @@ static int resume_device(struct device *
 
 	down(&dev->sem);
 
-	if (dev->bus && dev->bus->resume) {
-		dev_dbg(dev,"resuming\n");
-		error = dev->bus->resume(dev);
+	if (dev->bus) {
+		if (dev->bus->pm) {
+			pm_dev_dbg(dev, state, "");
+			error = pm_op(dev, dev->bus->pm, state);
+		} else if (dev->bus->resume) {
+			pm_dev_dbg(dev, state, "legacy ");
+			error = dev->bus->resume(dev);
+		}
+		if (error)
+			goto End;
 	}
 
-	if (!error && dev->type && dev->type->resume) {
-		dev_dbg(dev,"resuming\n");
-		error = dev->type->resume(dev);
+	if (dev->type) {
+		if (dev->type->pm) {
+			pm_dev_dbg(dev, state, "type ");
+			error = pm_op(dev, dev->type->pm, state);
+		} else if (dev->type->resume) {
+			pm_dev_dbg(dev, state, "legacy type ");
+			error = dev->type->resume(dev);
+		}
+		if (error)
+			goto End;
 	}
 
-	if (!error && dev->class && dev->class->resume) {
-		dev_dbg(dev,"class resume\n");
-		error = dev->class->resume(dev);
+	if (dev->class) {
+		if (dev->class->pm) {
+			pm_dev_dbg(dev, state, "class ");
+			error = pm_op(dev, dev->class->pm, state);
+		} else if (dev->class->resume) {
+			pm_dev_dbg(dev, state, "legacy class ");
+			error = dev->class->resume(dev);
+		}
 	}
-
+ End:
 	up(&dev->sem);
 
 	TRACE_RESUME(error);
@@ -228,9 +391,9 @@ static int resume_device(struct device *
  *	went through the early resume.
  *
  *	Take devices from the dpm_off_list, resume them,
- *	and put them on the dpm_locked list.
+ *	and put them on the dpm_active list.
  */
-static void dpm_resume(void)
+static void dpm_resume(pm_message_t state)
 {
 	mutex_lock(&dpm_list_mtx);
 	all_sleeping = false;
@@ -239,9 +402,10 @@ static void dpm_resume(void)
 		struct device *dev = to_device(entry);
 
 		list_move_tail(entry, &dpm_active);
-		dev->power.sleeping = false;
+		dev->power.status = DPM_ON;
 		mutex_unlock(&dpm_list_mtx);
-		resume_device(dev);
+		resume_device(dev, state);
+		complete_device(dev, state);
 		mutex_lock(&dpm_list_mtx);
 	}
 	mutex_unlock(&dpm_list_mtx);
@@ -269,14 +433,15 @@ static void unregister_dropped_devices(v
 
 /**
  *	device_resume - Restore state of each device in system.
+ *	@state: Operation to carry out.
  *
  *	Resume all the devices, unlock them all, and allow new
  *	devices to be registered once again.
  */
-void device_resume(void)
+void device_resume(pm_message_t state)
 {
 	might_sleep();
-	dpm_resume();
+	dpm_resume(state);
 	unregister_dropped_devices();
 }
 EXPORT_SYMBOL_GPL(device_resume);
@@ -284,37 +449,51 @@ EXPORT_SYMBOL_GPL(device_resume);
 
 /*------------------------- Suspend routines -------------------------*/
 
-static inline char *suspend_verb(u32 event)
+/**
+ *	resume_event - return a PM message representing the resume event
+ *	               corresponding to given sleep state.
+ *	@sleep_state - PM message representing a sleep state
+ */
+static pm_message_t resume_event(pm_message_t sleep_state)
 {
-	switch (event) {
-	case PM_EVENT_SUSPEND:	return "suspend";
-	case PM_EVENT_FREEZE:	return "freeze";
-	case PM_EVENT_PRETHAW:	return "prethaw";
-	default:		return "(unknown suspend event)";
+	switch (sleep_state.event) {
+	case PM_EVENT_SUSPEND:
+		return PMSG_RESUME;
+	case PM_EVENT_FREEZE:
+	case PM_EVENT_QUIESCE:
+		return PMSG_RECOVER;
+	case PM_EVENT_HIBERNATE:
+		return PMSG_RESTORE;
 	}
-}
-
-static void
-suspend_device_dbg(struct device *dev, pm_message_t state, char *info)
-{
-	dev_dbg(dev, "%s%s%s\n", info, suspend_verb(state.event),
-		((state.event == PM_EVENT_SUSPEND) && device_may_wakeup(dev)) ?
-		", may wakeup" : "");
+	return PMSG_ON;
 }
 
 /**
- *	suspend_device_late - Shut down one device (late suspend).
+ *	suspend_device_noirq - Shut down one device (late suspend).
  *	@dev:	Device.
- *	@state:	Power state device is entering.
+ *	@state:	PM message representing the operation to perform.
  *
  *	This is called with interrupts off and only a single CPU running.
  */
-static int suspend_device_late(struct device *dev, pm_message_t state)
+static int suspend_device_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
-	if (dev->bus && dev->bus->suspend_late) {
-		suspend_device_dbg(dev, state, "LATE ");
+	if (!dev->bus)
+		return 0;
+
+	if (dev->bus->pm_noirq) {
+		struct pm_ops *ops = dev->bus->pm_noirq;
+
+		pm_dev_dbg(dev, state, "LATE ");
+		if (ops->prepare)
+			error = ops->prepare(dev);
+		if (!error)
+			error = pm_op(dev, ops, state);
+		if (error && ops->complete)
+			ops->complete(dev);
+	} else if (dev->bus->suspend_late) {
+		pm_dev_dbg(dev, state, "legacy LATE ");
 		error = dev->bus->suspend_late(dev, state);
 		suspend_report_result(dev->bus->suspend_late, error);
 	}
@@ -323,7 +502,7 @@ static int suspend_device_late(struct de
 
 /**
  *	device_power_down - Shut down special devices.
- *	@state:		Power state to enter.
+ *	@state:	PM message representing the operation to perform.
  *
  *	Power down devices that require interrupts to be disabled
  *	and move them from the dpm_off list to the dpm_off_irq list.
@@ -339,7 +518,7 @@ int device_power_down(pm_message_t state
 		struct list_head *entry = dpm_off.prev;
 		struct device *dev = to_device(entry);
 
-		error = suspend_device_late(dev, state);
+		error = suspend_device_noirq(dev, state);
 		if (error) {
 			printk(KERN_ERR "Could not power down device %s: "
 					"error %d\n",
@@ -353,12 +532,50 @@ int device_power_down(pm_message_t state
 	if (!error)
 		error = sysdev_suspend(state);
 	if (error)
-		dpm_power_up();
+		dpm_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
 
 /**
+ *	prepare_device - Execute the ->prepare() callback(s) for given device.
+ *	@dev:	Device.
+ *	@state: PM operation we are preparing for.
+ */
+static int prepare_device(struct device *dev, pm_message_t state)
+{
+	int error = 0;
+
+	down(&dev->sem);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->prepare) {
+		pm_dev_dbg(dev, state, "preparing ");
+		error = dev->bus->pm->prepare(dev);
+		suspend_report_result(dev->bus->pm->prepare, error);
+		if (error)
+			goto End;
+	}
+
+	if (dev->type && dev->type->pm && dev->type->pm->prepare) {
+		pm_dev_dbg(dev, state, "preparing type ");
+		error = dev->type->pm->prepare(dev);
+		suspend_report_result(dev->type->pm->prepare, error);
+		if (error)
+			goto End;
+	}
+
+	if (dev->class && dev->class->pm && dev->class->pm->prepare) {
+		pm_dev_dbg(dev, state, "preparing class ");
+		error = dev->class->pm->prepare(dev);
+		suspend_report_result(dev->class->pm->prepare, error);
+	}
+ End:
+	up(&dev->sem);
+
+	return error;
+}
+
+/**
  *	suspend_device - Save state of one device.
  *	@dev:	Device.
  *	@state:	Power state device is entering.
@@ -369,29 +586,43 @@ static int suspend_device(struct device 
 
 	down(&dev->sem);
 
-	if (dev->power.power_state.event) {
-		dev_dbg(dev, "PM: suspend %d-->%d\n",
-			dev->power.power_state.event, state.event);
-	}
-
-	if (dev->class && dev->class->suspend) {
-		suspend_device_dbg(dev, state, "class ");
-		error = dev->class->suspend(dev, state);
-		suspend_report_result(dev->class->suspend, error);
+	if (dev->class) {
+		if (dev->class->pm) {
+			pm_dev_dbg(dev, state, "class ");
+			error = pm_op(dev, dev->class->pm, state);
+		} else if (dev->class->suspend) {
+			pm_dev_dbg(dev, state, "legacy class ");
+			error = dev->class->suspend(dev, state);
+			suspend_report_result(dev->class->suspend, error);
+		}
+		if (error)
+			goto End;
 	}
 
-	if (!error && dev->type && dev->type->suspend) {
-		suspend_device_dbg(dev, state, "type ");
-		error = dev->type->suspend(dev, state);
-		suspend_report_result(dev->type->suspend, error);
+	if (dev->type) {
+		if (dev->type->pm) {
+			pm_dev_dbg(dev, state, "type ");
+			error = pm_op(dev, dev->type->pm, state);
+		} else if (dev->type->suspend) {
+			pm_dev_dbg(dev, state, "legacy type ");
+			error = dev->type->suspend(dev, state);
+			suspend_report_result(dev->type->suspend, error);
+		}
+		if (error)
+			goto End;
 	}
 
-	if (!error && dev->bus && dev->bus->suspend) {
-		suspend_device_dbg(dev, state, "");
-		error = dev->bus->suspend(dev, state);
-		suspend_report_result(dev->bus->suspend, error);
+	if (dev->bus) {
+		if (dev->bus->pm) {
+			pm_dev_dbg(dev, state, "");
+			error = pm_op(dev, dev->bus->pm, state);
+		} else if (dev->bus->suspend) {
+			pm_dev_dbg(dev, state, "legacy ");
+			error = dev->bus->suspend(dev, state);
+			suspend_report_result(dev->bus->suspend, error);
+		}
 	}
-
+ End:
 	up(&dev->sem);
 
 	return error;
@@ -401,56 +632,70 @@ static int suspend_device(struct device 
  *	dpm_suspend - Suspend every device.
  *	@state:	Power state to put each device in.
  *
- *	Walk the dpm_locked list.  Suspend each device and move it
+ *	Walk the dpm_active list.  Suspend each device and move it
  *	to the dpm_off list.
- *
- *	(For historical reasons, if it returns -EAGAIN, that used to mean
- *	that the device would be called again with interrupts disabled.
- *	These days, we use the "suspend_late()" callback for that, so we
- *	print a warning and consider it an error).
  */
 static int dpm_suspend(pm_message_t state)
 {
-	int error = 0;
-
 	mutex_lock(&dpm_list_mtx);
 	while (!list_empty(&dpm_active)) {
 		struct list_head *entry = dpm_active.prev;
 		struct device *dev = to_device(entry);
+		int error;
+
+		WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF);
+		dev->power.status = DPM_SUSPENDING;
+		mutex_unlock(&dpm_list_mtx);
+
+		error = prepare_device(dev, state);
 
-		WARN_ON(dev->parent && dev->parent->power.sleeping);
+		mutex_lock(&dpm_list_mtx);
+		if (error == -EAGAIN)
+			dev->power.status = DPM_ON;
+		else if(error)
+			goto Error;
+		if (dev->power.status == DPM_ON) {
+			/*
+			 * prepare_device() returned -EAGAIN or a child was
+			 * added during it
+			 */
+			mutex_unlock(&dpm_list_mtx);
+
+			complete_device(dev, resume_event(state));
 
-		dev->power.sleeping = true;
+			mutex_lock(&dpm_list_mtx);
+			continue;
+		}
+		dev->power.status = DPM_OFF;
 		mutex_unlock(&dpm_list_mtx);
+
 		error = suspend_device(dev, state);
+
 		mutex_lock(&dpm_list_mtx);
-		if (error) {
-			printk(KERN_ERR "Could not suspend device %s: "
-					"error %d%s\n",
-					kobject_name(&dev->kobj),
-					error,
-					(error == -EAGAIN ?
-					" (please convert to suspend_late)" :
-					""));
-			dev->power.sleeping = false;
-			break;
+		if (!error) {
+			if (!list_empty(&dev->power.entry))
+				list_move(&dev->power.entry, &dpm_off);
+			continue;
 		}
-		if (!list_empty(&dev->power.entry))
-			list_move(&dev->power.entry, &dpm_off);
+ Error:
+		printk(KERN_ERR "Could not suspend device %s: error %d\n",
+			kobject_name(&dev->kobj), error);
+		dev->power.status = DPM_ON;
+		mutex_unlock(&dpm_list_mtx);
+
+		complete_device(dev, resume_event(state));
+		return error;
 	}
-	if (!error)
-		all_sleeping = true;
+	all_sleeping = true;
 	mutex_unlock(&dpm_list_mtx);
-
-	return error;
+	return 0;
 }
 
 /**
  *	device_suspend - Save state and stop all devices in system.
  *	@state: new power management state
  *
- *	Prevent new devices from being registered, then lock all devices
- *	and suspend them.
+ *	Prepare and suspend all devices.
  */
 int device_suspend(pm_message_t state)
 {
@@ -459,7 +704,7 @@ int device_suspend(pm_message_t state)
 	might_sleep();
 	error = dpm_suspend(state);
 	if (error)
-		device_resume();
+		device_resume(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_suspend);
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -69,6 +69,9 @@ struct bus_type {
 	int (*resume_early)(struct device *dev);
 	int (*resume)(struct device *dev);
 
+	struct pm_ops *pm;
+	struct pm_ops *pm_noirq;
+
 	struct bus_type_private *p;
 };
 
@@ -202,6 +205,8 @@ struct class {
 
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
+
+	struct pm_ops *pm;
 };
 
 extern int __must_check class_register(struct class *class);
@@ -345,8 +350,11 @@ struct device_type {
 	struct attribute_group **groups;
 	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
 	void (*release)(struct device *dev);
+
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
+
+	struct pm_ops *pm;
 };
 
 /* interface for exporting device attributes */
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -224,7 +224,7 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
-	device_power_up();
+	device_power_up(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
  Enable_irqs:
 	local_irq_enable();
 	return error;
@@ -280,7 +280,7 @@ int hibernation_snapshot(int platform_mo
  Finish:
 	platform_finish(platform_mode);
  Resume_devices:
-	device_resume();
+	device_resume(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
  Resume_console:
 	resume_console();
  Close:
@@ -301,7 +301,7 @@ static int resume_target_kernel(void)
 	int error;
 
 	local_irq_disable();
-	error = device_power_down(PMSG_PRETHAW);
+	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
@@ -329,7 +329,7 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
-	device_power_up();
+	device_power_up(PMSG_THAW);
  Enable_irqs:
 	local_irq_enable();
 	return error;
@@ -350,7 +350,7 @@ int hibernation_restore(int platform_mod
 
 	pm_prepare_console();
 	suspend_console();
-	error = device_suspend(PMSG_PRETHAW);
+	error = device_suspend(PMSG_QUIESCE);
 	if (error)
 		goto Finish;
 
@@ -362,7 +362,7 @@ int hibernation_restore(int platform_mod
 		enable_nonboot_cpus();
 	}
 	platform_restore_cleanup(platform_mode);
-	device_resume();
+	device_resume(PMSG_RECOVER);
  Finish:
 	resume_console();
 	pm_restore_console();
@@ -419,7 +419,7 @@ int hibernation_platform_enter(void)
  Finish:
 	hibernation_ops->finish();
  Resume_devices:
-	device_resume();
+	device_resume(PMSG_RESTORE);
  Resume_console:
 	resume_console();
  Close:
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -239,7 +239,7 @@ static int suspend_enter(suspend_state_t
 	if (!suspend_test(TEST_CORE))
 		error = suspend_ops->enter(state);
 
-	device_power_up();
+	device_power_up(PMSG_RESUME);
  Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
@@ -291,7 +291,7 @@ int suspend_devices_and_enter(suspend_st
 	if (suspend_ops->finish)
 		suspend_ops->finish();
  Resume_devices:
-	device_resume();
+	device_resume(PMSG_RESUME);
  Resume_console:
 	resume_console();
  Close:
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1221,9 +1221,9 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
-	device_power_up();
+	device_power_up(PMSG_RESUME);
 	local_irq_enable();
-	device_resume();
+	device_resume(PMSG_RESUME);
 	pm_send_all(PM_RESUME, (void *)0);
 	queue_event(APM_NORMAL_RESUME, NULL);
  out:


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
  2008-03-09 20:50       ` Rafael J. Wysocki
@ 2008-03-10  2:52         ` Alan Stern
  2008-03-10 16:58           ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2008-03-10  2:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, ACPI Devel Maling List, Alexey Starikovskiy, Len Brown,
	Pavel Machek, David Brownell

On Sun, 9 Mar 2008, Rafael J. Wysocki wrote:

> > If @prepare returns an error (such as -EAGAIN) then @complete shouldn't be called.
> 
> Hmm, I'd prefer to call it if ->prepare() fails too.  The -EAGAIN case is
> special, because in fact there's no warranty that ->prepare() will be called
> again (for example, the new child's ->suspend() may fail before that happens),
> so we need to do the clean up after it.  In general, if ->complete() is assumed
> to be called after a failing ->prepare() too, that simplifies the code a bit.

I'm quite certain this is a bad idea.

The kerneldoc says it all: @complete undoes the changes made by
@prepare.  If @prepare failed then it didn't make any changes (or else
it already undid any changes that were made), so @complete shouldn't be
called.  This is the way all programming interfaces should work.

Don't worry about complicating the core code a little bit.  It's just
in one spot; much more important is that the calling convention should
be clear.  My way is very simple: Every successful call to @prepare is
balanced by exactly one call to @complete.

> First, I moved the pm_dev_dbg() stuff into the inner if () blocks.
> Second, I made it possible to use ->prepare() and ->complete() in the
> "noirq" case as well and added a comment about the "noirq" callbacks.
> Finally, I cut some rough edges here and there (at least I hope so).

There's almost nothing left to comment on in the patch itself.  I'll 
have to try it out before giving any further feedback...

> + * @complete: Undo the changes made by @prepare().  This method is executed for
> + *	all kinds of resume transitions, immediately following one of the resume
> + *	callbacks: @resume(), @thaw(), @restore(), or @recover().  Also executed
> + *	if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
> + *	immediately following a successful @prepare() fails OR if @prepare()
> + *	itself fails.  @complete() cannot fail.

Of course this comment should be changed to match the actual behavior.  
There's no need to mention that @complete cannot fail; this is obvious
because it returns void.

> + * @thaw: Hibernation-specific, executed after creating a hibernation image.
> + *	Undo the changes made by the preceding @freeze(), so the device can be
> + *	operated in the same was as immediately before the call to @freeze().

s/was/way/   I spotted this typo in @recover before but missed it here.

> +struct pm_ops {
> +#ifdef CONFIG_PM_SLEEP
> +	int (*prepare)(struct device *dev);
> +	void (*complete)(struct device *dev);
> +#endif
> +#ifdef CONFIG_SUSPEND
> +	int (*suspend)(struct device *dev);
> +	int (*resume)(struct device *dev);
> +#endif
> +#ifdef CONFIG_HIBERNATION
> +	int (*freeze)(struct device *dev);
> +	int (*thaw)(struct device *dev);
> +	int (*poweroff)(struct device *dev);
> +	int (*quiesce)(struct device *dev);
> +	int (*restore)(struct device *dev);
> +	int (*recover)(struct device *dev);
> +#endif

Mention perhaps that although the "resuming" operations (i.e.,
resume, thaw, restore, and recover) can return errors, the PM core
doesn't do anything about those errors other than print them in the
system log.  There's nothing it _can_ do...

> +/**
> + * Device power management states
> + *
> + * These state labels are used internally by the PM core to indicate the current
> + * status of a device with respect to the PM core operations.
> + *
> + * DPM_ON		Device is regarded as operational.

Add: Set this way initially and when ->resume(), ->thaw(), ->restore(), 
->recover(), or ->complete() is about to be called.  Also set when 
->prepare() fails.

> + *
> + * DPM_SUSPENDING	Device is currently being suspended.

Add: Set when ->prepare() is about to be called.

> + *
> + * DPM_OFF		Device is regarded as suspended.

Add: Set when ->suspend(), ->freeze(), ->poweroff(), or ->quiesce() is 
about to be called.

Alan Stern


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
  2008-03-10  2:52         ` Alan Stern
@ 2008-03-10 16:58           ` Rafael J. Wysocki
  2008-03-10 18:54             ` Alan Stern
  2008-03-12  9:37             ` David Brownell
  0 siblings, 2 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2008-03-10 16:58 UTC (permalink / raw)
  To: Alan Stern
  Cc: pm list, ACPI Devel Maling List, Alexey Starikovskiy, Len Brown,
	Pavel Machek, David Brownell

On Monday, 10 of March 2008, Alan Stern wrote:
> On Sun, 9 Mar 2008, Rafael J. Wysocki wrote:
> 
> > > If @prepare returns an error (such as -EAGAIN) then @complete shouldn't be called.
> > 
> > Hmm, I'd prefer to call it if ->prepare() fails too.  The -EAGAIN case is
> > special, because in fact there's no warranty that ->prepare() will be called
> > again (for example, the new child's ->suspend() may fail before that happens),
> > so we need to do the clean up after it.  In general, if ->complete() is assumed
> > to be called after a failing ->prepare() too, that simplifies the code a bit.
> 
> I'm quite certain this is a bad idea.
> 
> The kerneldoc says it all: @complete undoes the changes made by
> @prepare.  If @prepare failed then it didn't make any changes (or else
> it already undid any changes that were made), so @complete shouldn't be
> called.  This is the way all programming interfaces should work.
> 
> Don't worry about complicating the core code a little bit.  It's just
> in one spot; much more important is that the calling convention should
> be clear.  My way is very simple: Every successful call to @prepare is
> balanced by exactly one call to @complete.

Okay, I reworked the patch so that ->complete() is not called after a failing
->prepare() (even if that returns -EAGAIN).  For this purpose I had to rework
dpm_suspend() quite heavily (once again).  Hopefully, I didn't break it.

> > First, I moved the pm_dev_dbg() stuff into the inner if () blocks.
> > Second, I made it possible to use ->prepare() and ->complete() in the
> > "noirq" case as well and added a comment about the "noirq" callbacks.
> > Finally, I cut some rough edges here and there (at least I hope so).
> 
> There's almost nothing left to comment on in the patch itself.  I'll 
> have to try it out before giving any further feedback...

Well, I'm going to develop some test code for checking all of the error paths
etc.
 
> > + * @complete: Undo the changes made by @prepare().  This method is executed for
> > + *	all kinds of resume transitions, immediately following one of the resume
> > + *	callbacks: @resume(), @thaw(), @restore(), or @recover().  Also executed
> > + *	if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
> > + *	immediately following a successful @prepare() fails OR if @prepare()
> > + *	itself fails.  @complete() cannot fail.
> 
> Of course this comment should be changed to match the actual behavior.  
> There's no need to mention that @complete cannot fail; this is obvious
> because it returns void.

Corrected.

> > + * @thaw: Hibernation-specific, executed after creating a hibernation image.
> > + *	Undo the changes made by the preceding @freeze(), so the device can be
> > + *	operated in the same was as immediately before the call to @freeze().
> 
> s/was/way/   I spotted this typo in @recover before but missed it here.
> 
> > +struct pm_ops {
> > +#ifdef CONFIG_PM_SLEEP
> > +	int (*prepare)(struct device *dev);
> > +	void (*complete)(struct device *dev);
> > +#endif
> > +#ifdef CONFIG_SUSPEND
> > +	int (*suspend)(struct device *dev);
> > +	int (*resume)(struct device *dev);
> > +#endif
> > +#ifdef CONFIG_HIBERNATION
> > +	int (*freeze)(struct device *dev);
> > +	int (*thaw)(struct device *dev);
> > +	int (*poweroff)(struct device *dev);
> > +	int (*quiesce)(struct device *dev);
> > +	int (*restore)(struct device *dev);
> > +	int (*recover)(struct device *dev);
> > +#endif
> 
> Mention perhaps that although the "resuming" operations (i.e.,
> resume, thaw, restore, and recover) can return errors, the PM core
> doesn't do anything about those errors other than print them in the
> system log.  There's nothing it _can_ do...

Added such paragraph to the comment describing the callbacks.

> > +/**
> > + * Device power management states
> > + *
> > + * These state labels are used internally by the PM core to indicate the current
> > + * status of a device with respect to the PM core operations.
> > + *
> > + * DPM_ON		Device is regarded as operational.
> 
> Add: Set this way initially and when ->resume(), ->thaw(), ->restore(), 
> ->recover(), or ->complete() is about to be called.  Also set when 
> ->prepare() fails.
> 
> > + *
> > + * DPM_SUSPENDING	Device is currently being suspended.
> 
> Add: Set when ->prepare() is about to be called.
> 
> > + *
> > + * DPM_OFF		Device is regarded as suspended.
> 
> Add: Set when ->suspend(), ->freeze(), ->poweroff(), or ->quiesce() is 
> about to be called.

All added.

The updated patch is appended.

Thanks,
Rafael


---

 arch/x86/kernel/apm_32.c  |    4 
 drivers/base/power/main.c |  453 +++++++++++++++++++++++++++++++++++-----------
 include/linux/device.h    |    8 
 include/linux/pm.h        |  249 ++++++++++++++++++++++---
 kernel/power/disk.c       |   14 -
 kernel/power/main.c       |    4 
 6 files changed, 587 insertions(+), 145 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -110,11 +110,9 @@ extern void (*pm_power_off_prepare)(void
 
 struct device;
 
-typedef struct pm_message {
-	int event;
-} pm_message_t;
-
-/*
+/**
+ * struct pm_ops - device PM callbacks
+ *
  * Several driver power state transitions are externally visible, affecting
  * the state of pending I/O queues and (for drivers that touch hardware)
  * interrupts, wakeups, DMA, and other hardware state.  There may also be
@@ -122,6 +120,217 @@ typedef struct pm_message {
  * to the rest of the driver stack (such as a driver that's ON gating off
  * clocks which are not in active use).
  *
+ * The externally visible transitions are handled with the help of the following
+ * callbacks included in this structure:
+ *
+ * @prepare: Prepare the device for the upcoming transition, but do NOT change
+ *	its hardware state.  Prevent new children of the device from being
+ *	registered and prevent new calls to the probe method from being made
+ *	after @prepare() returns.  Also, if a concurrent child registration
+ *	or a call to the probe method already in progress is detected by
+ *	@prepare(), it may return -EAGAIN, so that the PM core can execute it
+ *	once again (e.g. after suspending the newly registered child) to recover
+ *	from the race condition. This method is executed for all kinds of
+ *	suspend transitions and is immediately followed by one of the suspend
+ *	callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
+ *
+ * @complete: Undo the changes made by @prepare().  This method is executed for
+ *	all kinds of resume transitions, immediately following one of the resume
+ *	callbacks: @resume(), @thaw(), @restore(), or @recover().  Also executed
+ *	if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
+ *	immediately following a successful @prepare() fails OR if a new child
+ *	of the device has been registered during @prepare().
+ *
+ * @suspend: Executed before putting the system into a sleep state in which the
+ *	contents of main memory are preserved.  Quiesce the device, put it into
+ *	a low power state appropriate for the upcoming system state (such as
+ *	PCI_D3hot), and enable wakeup events as appropriate.
+ *
+ * @resume: Executed after waking the system up from a sleep state in which the
+ *	contents of main memory were preserved.  Put the device into the
+ *	appropriate state, according to the information saved in memory by the
+ *	preceding @suspend().  The driver starts working again, responding to
+ *	hardware events and software requests.  The hardware may have gone
+ *	through a power-off reset, or it may have maintained state from the
+ *	previous suspend() which the driver may rely on while resuming.  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @resume().
+ *
+ * @freeze: Hibernation-specific, executed before creating a hibernation image.
+ *	Quiesce operations so that a consistent image can be created, but do NOT
+ *	otherwise put the device into a low power device state and do NOT emit
+ *	system wakeup events.  Save in main memory the device settings to be
+ *	used by @restore() during the subsequent resume from hibernation.
+ *
+ * @thaw: Hibernation-specific, executed after creating a hibernation image.
+ *	Undo the changes made by the preceding @freeze(), so the device can be
+ *	operated in the same way as immediately before the call to @freeze().
+ *
+ * @poweroff: Hibernation-specific, executed after saving a hibernation image.
+ *	Quiesce the device, put it into a low power state appropriate for the
+ *	upcoming system state (such as PCI_D3hot), and enable wakeup events as
+ *	appropriate.
+ *
+ * @quiesce: Hibernation-specific, executed after loading a hibernation image
+ *	and before restoring the contents of main memory from it.  Quiesce
+ *	operations so that the contents of main memory can be restored from the
+ *	image in a consistent way, but do NOT otherwise put the device into a
+ *	low power state and do NOT emit system wakeup events.  Do NOT save any
+ *	device settings in main memory.
+ *
+ * @restore: Hibernation-specific, executed after restoring the contents of main
+ *	memory from a hibernation image.  Driver starts working again,
+ *	responding to hardware events and software requests.  Do NOT make ANY
+ *	assumptions about the hardware state right prior to @restore().  On most
+ *	platforms, there are no restrictions on availability of resources like
+ *	clocks during @restore().
+ *
+ * @recover: Hibernation-specific, executed after a failing creation of a
+ *	hibernation image OR after a failing attempt to restore the contents of
+ *	main memory from such an image.  Undo the changes made by the preceding
+ *	@freeze() or @quiesce(), so the device can be operated in the same way
+ *	as immediately before the failing transition.
+ *
+ * struct pm_ops is also used for defining driver PM operations to be carried
+ * out with interrupts disabled.  In this case, however, there is only one
+ * active CPU while the operations are being performed, so they are executed by
+ * the PM core in a more straightforward way.  In particular, the failure of
+ * @prepare() carried out with interrupts disabled causes the entire PM
+ * transition to fail, regardless of the error code returned by it.
+ *
+ * All of the above callbacks, except for @complete(), return error codes.
+ * However, the error codes returned by resume operations, @resume(), @thaw(),
+ * @restore(), and @recover(), are only printed in the system logs, since the
+ * PM core cannot do anything else about them.
+ */
+
+struct pm_ops {
+#ifdef CONFIG_PM_SLEEP
+	int (*prepare)(struct device *dev);
+	void (*complete)(struct device *dev);
+#endif
+#ifdef CONFIG_SUSPEND
+	int (*suspend)(struct device *dev);
+	int (*resume)(struct device *dev);
+#endif
+#ifdef CONFIG_HIBERNATION
+	int (*freeze)(struct device *dev);
+	int (*thaw)(struct device *dev);
+	int (*poweroff)(struct device *dev);
+	int (*quiesce)(struct device *dev);
+	int (*restore)(struct device *dev);
+	int (*recover)(struct device *dev);
+#endif
+};
+
+/**
+ * PM_EVENT_ messages
+ *
+ * The following PM_EVENT_ messages are defined for the internal use of the PM
+ * core, in order to provide a mechanism allowing the high level suspend and
+ * hibernation code to convey the necessary information to the device PM core
+ * code:
+ *
+ * ON		No transition.
+ *
+ * FREEZE 	System is going to hibernate, call ->prepare() and ->freeze()
+ *		for all devices.
+ *
+ * SUSPEND	System is going to suspend, call ->prepare() and ->suspend()
+ *		for all devices.
+ *
+ * HIBERNATE	Hibernation image has been saved, call ->prepare() and
+ *		->poweroff() for all devices.
+ *
+ * QUIESCE	Contents of main memory are going to be restored from a (loaded)
+ *		hibernation image, call ->prepare() and ->quiesce() for all
+ *		devices.
+ *
+ * RESUME	System is resuming, call ->resume() and ->complete() for all
+ *		devices.
+ *
+ * THAW		Hibernation image has been created, call ->thaw() and
+ *		->complete() for all devices.
+ *
+ * RESTORE	Contents of main memory have been restored from a hibernation
+ *		image, call ->restore() and ->complete() for all devices.
+ *
+ * RECOVER	Creation of a hibernation image or restoration of the main
+ *		memory contents from a hibernation image has failed, call
+ *		->recover() and ->complete() for all devices.
+ */
+
+typedef struct pm_message {
+	int event;
+} pm_message_t;
+
+#define PM_EVENT_ON		0x0000
+#define PM_EVENT_FREEZE 	0x0001
+#define PM_EVENT_SUSPEND	0x0002
+#define PM_EVENT_HIBERNATE	0x0004
+#define PM_EVENT_QUIESCE	0x0008
+#define PM_EVENT_RESUME		0x0010
+#define PM_EVENT_THAW		0x0020
+#define PM_EVENT_RESTORE	0x0040
+#define PM_EVENT_RECOVER	0x0080
+
+#define PM_EVENT_SLEEP	(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
+
+#define PMSG_FREEZE	((struct pm_message){ .event = PM_EVENT_FREEZE, })
+#define PMSG_QUIESCE	((struct pm_message){ .event = PM_EVENT_QUIESCE, })
+#define PMSG_SUSPEND	((struct pm_message){ .event = PM_EVENT_SUSPEND, })
+#define PMSG_HIBERNATE	((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
+#define PMSG_RESUME	((struct pm_message){ .event = PM_EVENT_RESUME, })
+#define PMSG_THAW	((struct pm_message){ .event = PM_EVENT_THAW, })
+#define PMSG_RESTORE	((struct pm_message){ .event = PM_EVENT_RESTORE, })
+#define PMSG_RECOVER	((struct pm_message){ .event = PM_EVENT_RECOVER, })
+#define PMSG_ON		((struct pm_message){ .event = PM_EVENT_ON, })
+
+/**
+ * Device power management states
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.
+ *
+ * DPM_ON		Device is regarded as operational.  Set this way
+ *			initially and when ->resume(), ->thaw(), ->restore(),
+ *			->recover(), or ->complete() is about to be called.
+ *			Also set when ->prepare() fails.
+ *
+ * DPM_SUSPENDING	Device is currently being suspended.  Set when
+ *			->prepare() is about to be called.
+ *
+ * DPM_OFF		Device is regarded as suspended.  Set when ->suspend(),
+ *			->freeze(), ->poweroff(), or ->quiesce() is about to be
+ *			called.
+ */
+
+enum dpm_state {
+	DPM_ON,
+	DPM_SUSPENDING,
+	DPM_OFF,
+};
+
+struct dev_pm_info {
+	pm_message_t		power_state;
+	unsigned		can_wakeup:1;
+	enum dpm_state		status:2;	/* Owned by the PM core */
+#ifdef	CONFIG_PM_SLEEP
+	unsigned		should_wakeup:1;
+	struct list_head	entry;
+#endif
+};
+
+/*
+ * The PM_EVENT_ messages are also used by drivers implementing the legacy
+ * suspend framework, based on the ->suspend() and ->resume() callbacks common
+ * for suspend and hibernation transitions, according to the rules below.
+ */
+
+/* Necessary, because several drivers use PM_EVENT_PRETHAW */
+#define PM_EVENT_PRETHAW PM_EVENT_QUIESCE
+
+/*
  * One transition is triggered by resume(), after a suspend() call; the
  * message is implicit:
  *
@@ -166,35 +375,11 @@ typedef struct pm_message {
  * or from system low-power states such as standby or suspend-to-RAM.
  */
 
-#define PM_EVENT_ON 0
-#define PM_EVENT_FREEZE 1
-#define PM_EVENT_SUSPEND 2
-#define PM_EVENT_HIBERNATE 4
-#define PM_EVENT_PRETHAW 8
-
-#define PM_EVENT_SLEEP	(PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
-
-#define PMSG_FREEZE	((struct pm_message){ .event = PM_EVENT_FREEZE, })
-#define PMSG_PRETHAW	((struct pm_message){ .event = PM_EVENT_PRETHAW, })
-#define PMSG_SUSPEND	((struct pm_message){ .event = PM_EVENT_SUSPEND, })
-#define PMSG_HIBERNATE	((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
-#define PMSG_ON		((struct pm_message){ .event = PM_EVENT_ON, })
-
-struct dev_pm_info {
-	pm_message_t		power_state;
-	unsigned		can_wakeup:1;
-	bool			sleeping:1;	/* Owned by the PM core */
-#ifdef	CONFIG_PM_SLEEP
-	unsigned		should_wakeup:1;
-	struct list_head	entry;
-#endif
-};
+#ifdef CONFIG_PM_SLEEP
+extern void device_power_up(pm_message_t state);
+extern void device_resume(pm_message_t state);
 
 extern int device_power_down(pm_message_t state);
-extern void device_power_up(void);
-extern void device_resume(void);
-
-#ifdef CONFIG_PM_SLEEP
 extern int device_suspend(pm_message_t state);
 extern int device_prepare_suspend(pm_message_t state);
 
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -71,22 +71,30 @@ int device_pm_add(struct device *dev)
 		 dev->bus ? dev->bus->name : "No Bus",
 		 kobject_name(&dev->kobj));
 	mutex_lock(&dpm_list_mtx);
-	if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
-		if (dev->parent->power.sleeping)
-			dev_warn(dev,
-				"parent %s is sleeping, will not add\n",
+	if (all_sleeping) {
+		dev_warn(dev, "all devices are sleeping, will not add\n");
+		goto Refuse;
+	}
+	if (dev->parent)
+		switch (dev->parent->power.status) {
+		case DPM_OFF:
+			dev_warn(dev, "parent %s is sleeping, will not add\n",
 				dev->parent->bus_id);
-		else
-			dev_warn(dev, "devices are sleeping, will not add\n");
-		WARN_ON(true);
-		error = -EBUSY;
-	} else {
-		error = dpm_sysfs_add(dev);
-		if (!error)
-			list_add_tail(&dev->power.entry, &dpm_active);
-	}
+			goto Refuse;
+		case DPM_SUSPENDING:
+			dev->parent->power.status = DPM_ON;
+			break;
+		}
+	error = dpm_sysfs_add(dev);
+	if (!error)
+		list_add_tail(&dev->power.entry, &dpm_active);
+ End:
 	mutex_unlock(&dpm_list_mtx);
 	return error;
+ Refuse:
+	WARN_ON(true);
+	error = -EBUSY;
+	goto End;
 }
 
 /**
@@ -124,26 +132,135 @@ void device_pm_schedule_removal(struct d
 }
 EXPORT_SYMBOL_GPL(device_pm_schedule_removal);
 
+/**
+ *	pm_op - execute the PM operation appropiate for given PM event
+ *	@dev:	Device.
+ *	@ops:	PM operations to choose from.
+ *	@state:	PM event message.
+ */
+static int pm_op(struct device *dev, struct pm_ops *ops, pm_message_t state)
+{
+	int error = 0;
+
+	switch (state.event) {
+#ifdef CONFIG_SUSPEND
+	case PM_EVENT_SUSPEND:
+		if (ops->suspend) {
+			error = ops->suspend(dev);
+			suspend_report_result(ops->suspend, error);
+		}
+		break;
+	case PM_EVENT_RESUME:
+		if (ops->resume) {
+			error = ops->resume(dev);
+			suspend_report_result(ops->resume, error);
+		}
+		break;
+#endif /* CONFIG_SUSPEND */
+#ifdef CONFIG_HIBERNATION
+	case PM_EVENT_FREEZE:
+		if (ops->freeze) {
+			error = ops->freeze(dev);
+			suspend_report_result(ops->freeze, error);
+		}
+		break;
+	case PM_EVENT_QUIESCE:
+		if (ops->quiesce) {
+			error = ops->quiesce(dev);
+			suspend_report_result(ops->quiesce, error);
+		}
+		break;
+	case PM_EVENT_HIBERNATE:
+		if (ops->poweroff) {
+			error = ops->poweroff(dev);
+			suspend_report_result(ops->poweroff, error);
+		}
+		break;
+	case PM_EVENT_THAW:
+		if (ops->thaw) {
+			error = ops->thaw(dev);
+			suspend_report_result(ops->thaw, error);
+		}
+		break;
+	case PM_EVENT_RESTORE:
+		if (ops->restore) {
+			error = ops->restore(dev);
+			suspend_report_result(ops->restore, error);
+		}
+		break;
+	case PM_EVENT_RECOVER:
+		if (ops->recover) {
+			error = ops->recover(dev);
+			suspend_report_result(ops->recover, error);
+		}
+		break;
+#endif /* CONFIG_HIBERNATION */
+	default:
+		error = -EINVAL;
+	}
+	return error;
+}
+
+static char *pm_verb(int event)
+{
+	switch (event) {
+	case PM_EVENT_SUSPEND:
+		return "suspend";
+	case PM_EVENT_RESUME:
+		return "resume";
+	case PM_EVENT_FREEZE:
+		return "freeze";
+	case PM_EVENT_QUIESCE:
+		return "quiesce";
+	case PM_EVENT_HIBERNATE:
+		return "hibernate";
+	case PM_EVENT_THAW:
+		return "thaw";
+	case PM_EVENT_RESTORE:
+		return "restore";
+	default:
+		return "(unknown PM event)";
+	}
+}
+
+static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info)
+{
+	dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event),
+		((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ?
+		", may wakeup" : "");
+}
+
 /*------------------------- Resume routines -------------------------*/
 
 /**
- *	resume_device_early - Power on one device (early resume).
+ *	resume_device_noirq - Power on one device (early resume).
  *	@dev:	Device.
+ *	@state: Operation to carry out.
  *
  *	Must be called with interrupts disabled.
  */
-static int resume_device_early(struct device *dev)
+static int resume_device_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
-	if (dev->bus && dev->bus->resume_early) {
-		dev_dbg(dev, "EARLY resume\n");
+	if (!dev->bus)
+		goto End;
+
+	if (dev->bus->pm_noirq) {
+		struct pm_ops *ops = dev->bus->pm_noirq;
+
+		pm_dev_dbg(dev, state, "EARLY ");
+		error = pm_op(dev, ops, state);
+		if (ops->complete)
+			ops->complete(dev);
+	} else if (dev->bus->resume_early) {
+		pm_dev_dbg(dev, state, "legacy EARLY ");
 		error = dev->bus->resume_early(dev);
 	}
-
+ End:
 	TRACE_RESUME(error);
 	return error;
 }
@@ -158,7 +275,7 @@ static int resume_device_early(struct de
  *
  *	Must be called with interrupts disabled and only one CPU running.
  */
-static void dpm_power_up(void)
+static void dpm_power_up(pm_message_t state)
 {
 
 	while (!list_empty(&dpm_off_irq)) {
@@ -166,7 +283,7 @@ static void dpm_power_up(void)
 		struct device *dev = to_device(entry);
 
 		list_move_tail(entry, &dpm_off);
-		resume_device_early(dev);
+		resume_device_noirq(dev, state);
 	}
 }
 
@@ -178,19 +295,46 @@ static void dpm_power_up(void)
  *
  *	Must be called with interrupts disabled.
  */
-void device_power_up(void)
+void device_power_up(pm_message_t state)
 {
 	sysdev_resume();
-	dpm_power_up();
+	dpm_power_up(state);
 }
 EXPORT_SYMBOL_GPL(device_power_up);
 
 /**
+ *	complete_device - Complete a PM transition for given device
+ *	@dev:	Device.
+ *	@state:	Power transition we are completing.
+ */
+static void complete_device(struct device *dev, pm_message_t state)
+{
+	down(&dev->sem);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->complete) {
+		pm_dev_dbg(dev, state, "completing ");
+		dev->bus->pm->complete(dev);
+	}
+
+	if (dev->type && dev->type->pm && dev->type->pm->complete) {
+		pm_dev_dbg(dev, state, "completing type ");
+		dev->type->pm->complete(dev);
+	}
+
+	if (dev->class && dev->class->pm && dev->class->pm->complete) {
+		pm_dev_dbg(dev, state, "completing class ");
+		dev->class->pm->complete(dev);
+	}
+
+	up(&dev->sem);
+}
+
+/**
  *	resume_device - Restore state for one device.
  *	@dev:	Device.
- *
+ *	@state:	Operation to carry out.
  */
-static int resume_device(struct device *dev)
+static int resume_device(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
@@ -199,21 +343,40 @@ static int resume_device(struct device *
 
 	down(&dev->sem);
 
-	if (dev->bus && dev->bus->resume) {
-		dev_dbg(dev,"resuming\n");
-		error = dev->bus->resume(dev);
+	if (dev->bus) {
+		if (dev->bus->pm) {
+			pm_dev_dbg(dev, state, "");
+			error = pm_op(dev, dev->bus->pm, state);
+		} else if (dev->bus->resume) {
+			pm_dev_dbg(dev, state, "legacy ");
+			error = dev->bus->resume(dev);
+		}
+		if (error)
+			goto End;
 	}
 
-	if (!error && dev->type && dev->type->resume) {
-		dev_dbg(dev,"resuming\n");
-		error = dev->type->resume(dev);
+	if (dev->type) {
+		if (dev->type->pm) {
+			pm_dev_dbg(dev, state, "type ");
+			error = pm_op(dev, dev->type->pm, state);
+		} else if (dev->type->resume) {
+			pm_dev_dbg(dev, state, "legacy type ");
+			error = dev->type->resume(dev);
+		}
+		if (error)
+			goto End;
 	}
 
-	if (!error && dev->class && dev->class->resume) {
-		dev_dbg(dev,"class resume\n");
-		error = dev->class->resume(dev);
+	if (dev->class) {
+		if (dev->class->pm) {
+			pm_dev_dbg(dev, state, "class ");
+			error = pm_op(dev, dev->class->pm, state);
+		} else if (dev->class->resume) {
+			pm_dev_dbg(dev, state, "legacy class ");
+			error = dev->class->resume(dev);
+		}
 	}
-
+ End:
 	up(&dev->sem);
 
 	TRACE_RESUME(error);
@@ -228,9 +391,9 @@ static int resume_device(struct device *
  *	went through the early resume.
  *
  *	Take devices from the dpm_off_list, resume them,
- *	and put them on the dpm_locked list.
+ *	and put them on the dpm_active list.
  */
-static void dpm_resume(void)
+static void dpm_resume(pm_message_t state)
 {
 	mutex_lock(&dpm_list_mtx);
 	all_sleeping = false;
@@ -239,9 +402,10 @@ static void dpm_resume(void)
 		struct device *dev = to_device(entry);
 
 		list_move_tail(entry, &dpm_active);
-		dev->power.sleeping = false;
+		dev->power.status = DPM_ON;
 		mutex_unlock(&dpm_list_mtx);
-		resume_device(dev);
+		resume_device(dev, state);
+		complete_device(dev, state);
 		mutex_lock(&dpm_list_mtx);
 	}
 	mutex_unlock(&dpm_list_mtx);
@@ -269,14 +433,15 @@ static void unregister_dropped_devices(v
 
 /**
  *	device_resume - Restore state of each device in system.
+ *	@state: Operation to carry out.
  *
  *	Resume all the devices, unlock them all, and allow new
  *	devices to be registered once again.
  */
-void device_resume(void)
+void device_resume(pm_message_t state)
 {
 	might_sleep();
-	dpm_resume();
+	dpm_resume(state);
 	unregister_dropped_devices();
 }
 EXPORT_SYMBOL_GPL(device_resume);
@@ -284,37 +449,52 @@ EXPORT_SYMBOL_GPL(device_resume);
 
 /*------------------------- Suspend routines -------------------------*/
 
-static inline char *suspend_verb(u32 event)
+/**
+ *	resume_event - return a PM message representing the resume event
+ *	               corresponding to given sleep state.
+ *	@sleep_state - PM message representing a sleep state
+ */
+static pm_message_t resume_event(pm_message_t sleep_state)
 {
-	switch (event) {
-	case PM_EVENT_SUSPEND:	return "suspend";
-	case PM_EVENT_FREEZE:	return "freeze";
-	case PM_EVENT_PRETHAW:	return "prethaw";
-	default:		return "(unknown suspend event)";
+	switch (sleep_state.event) {
+	case PM_EVENT_SUSPEND:
+		return PMSG_RESUME;
+	case PM_EVENT_FREEZE:
+	case PM_EVENT_QUIESCE:
+		return PMSG_RECOVER;
+	case PM_EVENT_HIBERNATE:
+		return PMSG_RESTORE;
 	}
-}
-
-static void
-suspend_device_dbg(struct device *dev, pm_message_t state, char *info)
-{
-	dev_dbg(dev, "%s%s%s\n", info, suspend_verb(state.event),
-		((state.event == PM_EVENT_SUSPEND) && device_may_wakeup(dev)) ?
-		", may wakeup" : "");
+	return PMSG_ON;
 }
 
 /**
- *	suspend_device_late - Shut down one device (late suspend).
+ *	suspend_device_noirq - Shut down one device (late suspend).
  *	@dev:	Device.
- *	@state:	Power state device is entering.
+ *	@state:	PM message representing the operation to perform.
  *
  *	This is called with interrupts off and only a single CPU running.
  */
-static int suspend_device_late(struct device *dev, pm_message_t state)
+static int suspend_device_noirq(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
-	if (dev->bus && dev->bus->suspend_late) {
-		suspend_device_dbg(dev, state, "LATE ");
+	if (!dev->bus)
+		return 0;
+
+	if (dev->bus->pm_noirq) {
+		struct pm_ops *ops = dev->bus->pm_noirq;
+
+		pm_dev_dbg(dev, state, "LATE ");
+		if (ops->prepare)
+			error = ops->prepare(dev);
+		if (!error) {
+			error = pm_op(dev, ops, state);
+			if (error && ops->complete)
+				ops->complete(dev);
+		}
+	} else if (dev->bus->suspend_late) {
+		pm_dev_dbg(dev, state, "legacy LATE ");
 		error = dev->bus->suspend_late(dev, state);
 		suspend_report_result(dev->bus->suspend_late, error);
 	}
@@ -323,7 +503,7 @@ static int suspend_device_late(struct de
 
 /**
  *	device_power_down - Shut down special devices.
- *	@state:		Power state to enter.
+ *	@state:	PM message representing the operation to perform.
  *
  *	Power down devices that require interrupts to be disabled
  *	and move them from the dpm_off list to the dpm_off_irq list.
@@ -339,7 +519,7 @@ int device_power_down(pm_message_t state
 		struct list_head *entry = dpm_off.prev;
 		struct device *dev = to_device(entry);
 
-		error = suspend_device_late(dev, state);
+		error = suspend_device_noirq(dev, state);
 		if (error) {
 			printk(KERN_ERR "Could not power down device %s: "
 					"error %d\n",
@@ -353,12 +533,50 @@ int device_power_down(pm_message_t state
 	if (!error)
 		error = sysdev_suspend(state);
 	if (error)
-		dpm_power_up();
+		dpm_power_up(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_power_down);
 
 /**
+ *	prepare_device - Execute the ->prepare() callback(s) for given device.
+ *	@dev:	Device.
+ *	@state: PM operation we are preparing for.
+ */
+static int prepare_device(struct device *dev, pm_message_t state)
+{
+	int error = 0;
+
+	down(&dev->sem);
+
+	if (dev->bus && dev->bus->pm && dev->bus->pm->prepare) {
+		pm_dev_dbg(dev, state, "preparing ");
+		error = dev->bus->pm->prepare(dev);
+		suspend_report_result(dev->bus->pm->prepare, error);
+		if (error)
+			goto End;
+	}
+
+	if (dev->type && dev->type->pm && dev->type->pm->prepare) {
+		pm_dev_dbg(dev, state, "preparing type ");
+		error = dev->type->pm->prepare(dev);
+		suspend_report_result(dev->type->pm->prepare, error);
+		if (error)
+			goto End;
+	}
+
+	if (dev->class && dev->class->pm && dev->class->pm->prepare) {
+		pm_dev_dbg(dev, state, "preparing class ");
+		error = dev->class->pm->prepare(dev);
+		suspend_report_result(dev->class->pm->prepare, error);
+	}
+ End:
+	up(&dev->sem);
+
+	return error;
+}
+
+/**
  *	suspend_device - Save state of one device.
  *	@dev:	Device.
  *	@state:	Power state device is entering.
@@ -369,29 +587,43 @@ static int suspend_device(struct device 
 
 	down(&dev->sem);
 
-	if (dev->power.power_state.event) {
-		dev_dbg(dev, "PM: suspend %d-->%d\n",
-			dev->power.power_state.event, state.event);
-	}
-
-	if (dev->class && dev->class->suspend) {
-		suspend_device_dbg(dev, state, "class ");
-		error = dev->class->suspend(dev, state);
-		suspend_report_result(dev->class->suspend, error);
+	if (dev->class) {
+		if (dev->class->pm) {
+			pm_dev_dbg(dev, state, "class ");
+			error = pm_op(dev, dev->class->pm, state);
+		} else if (dev->class->suspend) {
+			pm_dev_dbg(dev, state, "legacy class ");
+			error = dev->class->suspend(dev, state);
+			suspend_report_result(dev->class->suspend, error);
+		}
+		if (error)
+			goto End;
 	}
 
-	if (!error && dev->type && dev->type->suspend) {
-		suspend_device_dbg(dev, state, "type ");
-		error = dev->type->suspend(dev, state);
-		suspend_report_result(dev->type->suspend, error);
+	if (dev->type) {
+		if (dev->type->pm) {
+			pm_dev_dbg(dev, state, "type ");
+			error = pm_op(dev, dev->type->pm, state);
+		} else if (dev->type->suspend) {
+			pm_dev_dbg(dev, state, "legacy type ");
+			error = dev->type->suspend(dev, state);
+			suspend_report_result(dev->type->suspend, error);
+		}
+		if (error)
+			goto End;
 	}
 
-	if (!error && dev->bus && dev->bus->suspend) {
-		suspend_device_dbg(dev, state, "");
-		error = dev->bus->suspend(dev, state);
-		suspend_report_result(dev->bus->suspend, error);
+	if (dev->bus) {
+		if (dev->bus->pm) {
+			pm_dev_dbg(dev, state, "");
+			error = pm_op(dev, dev->bus->pm, state);
+		} else if (dev->bus->suspend) {
+			pm_dev_dbg(dev, state, "legacy ");
+			error = dev->bus->suspend(dev, state);
+			suspend_report_result(dev->bus->suspend, error);
+		}
 	}
-
+ End:
 	up(&dev->sem);
 
 	return error;
@@ -401,13 +633,8 @@ static int suspend_device(struct device 
  *	dpm_suspend - Suspend every device.
  *	@state:	Power state to put each device in.
  *
- *	Walk the dpm_locked list.  Suspend each device and move it
+ *	Walk the dpm_active list.  Suspend each device and move it
  *	to the dpm_off list.
- *
- *	(For historical reasons, if it returns -EAGAIN, that used to mean
- *	that the device would be called again with interrupts disabled.
- *	These days, we use the "suspend_late()" callback for that, so we
- *	print a warning and consider it an error).
  */
 static int dpm_suspend(pm_message_t state)
 {
@@ -418,30 +645,53 @@ static int dpm_suspend(pm_message_t stat
 		struct list_head *entry = dpm_active.prev;
 		struct device *dev = to_device(entry);
 
-		WARN_ON(dev->parent && dev->parent->power.sleeping);
+		WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF);
+		dev->power.status = DPM_SUSPENDING;
+		mutex_unlock(&dpm_list_mtx);
+
+		error = prepare_device(dev, state);
 
-		dev->power.sleeping = true;
+		mutex_lock(&dpm_list_mtx);
+		if (error) {
+			dev->power.status = DPM_ON;
+			if (error == -EAGAIN)
+				continue;
+			printk(KERN_ERR "Could not prepare device %s "
+				"for suspend: error %d\n",
+				kobject_name(&dev->kobj), error);
+			goto Unlock;
+		}
+		if (dev->power.status == DPM_ON) {
+			/* Child added during prepare_device() */
+			mutex_unlock(&dpm_list_mtx);
+
+			complete_device(dev, resume_event(state));
+
+			mutex_lock(&dpm_list_mtx);
+			continue;
+		}
+		dev->power.status = DPM_OFF;
 		mutex_unlock(&dpm_list_mtx);
+
 		error = suspend_device(dev, state);
+
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
+			dev->power.status = DPM_ON;
 			printk(KERN_ERR "Could not suspend device %s: "
-					"error %d%s\n",
-					kobject_name(&dev->kobj),
-					error,
-					(error == -EAGAIN ?
-					" (please convert to suspend_late)" :
-					""));
-			dev->power.sleeping = false;
-			break;
+				"error %d\n", kobject_name(&dev->kobj), error);
+			mutex_unlock(&dpm_list_mtx);
+
+			complete_device(dev, resume_event(state));
+			goto End;
 		}
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &dpm_off);
 	}
-	if (!error)
-		all_sleeping = true;
+	all_sleeping = true;
+ Unlock:
 	mutex_unlock(&dpm_list_mtx);
-
+ End:
 	return error;
 }
 
@@ -449,8 +699,7 @@ static int dpm_suspend(pm_message_t stat
  *	device_suspend - Save state and stop all devices in system.
  *	@state: new power management state
  *
- *	Prevent new devices from being registered, then lock all devices
- *	and suspend them.
+ *	Prepare and suspend all devices.
  */
 int device_suspend(pm_message_t state)
 {
@@ -459,7 +708,7 @@ int device_suspend(pm_message_t state)
 	might_sleep();
 	error = dpm_suspend(state);
 	if (error)
-		device_resume();
+		device_resume(resume_event(state));
 	return error;
 }
 EXPORT_SYMBOL_GPL(device_suspend);
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -69,6 +69,9 @@ struct bus_type {
 	int (*resume_early)(struct device *dev);
 	int (*resume)(struct device *dev);
 
+	struct pm_ops *pm;
+	struct pm_ops *pm_noirq;
+
 	struct bus_type_private *p;
 };
 
@@ -202,6 +205,8 @@ struct class {
 
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
+
+	struct pm_ops *pm;
 };
 
 extern int __must_check class_register(struct class *class);
@@ -345,8 +350,11 @@ struct device_type {
 	struct attribute_group **groups;
 	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
 	void (*release)(struct device *dev);
+
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
+
+	struct pm_ops *pm;
 };
 
 /* interface for exporting device attributes */
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -224,7 +224,7 @@ static int create_image(int platform_mod
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
-	device_power_up();
+	device_power_up(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
  Enable_irqs:
 	local_irq_enable();
 	return error;
@@ -280,7 +280,7 @@ int hibernation_snapshot(int platform_mo
  Finish:
 	platform_finish(platform_mode);
  Resume_devices:
-	device_resume();
+	device_resume(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
  Resume_console:
 	resume_console();
  Close:
@@ -301,7 +301,7 @@ static int resume_target_kernel(void)
 	int error;
 
 	local_irq_disable();
-	error = device_power_down(PMSG_PRETHAW);
+	error = device_power_down(PMSG_QUIESCE);
 	if (error) {
 		printk(KERN_ERR "PM: Some devices failed to power down, "
 			"aborting resume\n");
@@ -329,7 +329,7 @@ static int resume_target_kernel(void)
 	swsusp_free();
 	restore_processor_state();
 	touch_softlockup_watchdog();
-	device_power_up();
+	device_power_up(PMSG_THAW);
  Enable_irqs:
 	local_irq_enable();
 	return error;
@@ -350,7 +350,7 @@ int hibernation_restore(int platform_mod
 
 	pm_prepare_console();
 	suspend_console();
-	error = device_suspend(PMSG_PRETHAW);
+	error = device_suspend(PMSG_QUIESCE);
 	if (error)
 		goto Finish;
 
@@ -362,7 +362,7 @@ int hibernation_restore(int platform_mod
 		enable_nonboot_cpus();
 	}
 	platform_restore_cleanup(platform_mode);
-	device_resume();
+	device_resume(PMSG_RECOVER);
  Finish:
 	resume_console();
 	pm_restore_console();
@@ -419,7 +419,7 @@ int hibernation_platform_enter(void)
  Finish:
 	hibernation_ops->finish();
  Resume_devices:
-	device_resume();
+	device_resume(PMSG_RESTORE);
  Resume_console:
 	resume_console();
  Close:
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -239,7 +239,7 @@ static int suspend_enter(suspend_state_t
 	if (!suspend_test(TEST_CORE))
 		error = suspend_ops->enter(state);
 
-	device_power_up();
+	device_power_up(PMSG_RESUME);
  Done:
 	arch_suspend_enable_irqs();
 	BUG_ON(irqs_disabled());
@@ -291,7 +291,7 @@ int suspend_devices_and_enter(suspend_st
 	if (suspend_ops->finish)
 		suspend_ops->finish();
  Resume_devices:
-	device_resume();
+	device_resume(PMSG_RESUME);
  Resume_console:
 	resume_console();
  Close:
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1221,9 +1221,9 @@ static int suspend(int vetoable)
 	if (err != APM_SUCCESS)
 		apm_error("suspend", err);
 	err = (err == APM_SUCCESS) ? 0 : -EIO;
-	device_power_up();
+	device_power_up(PMSG_RESUME);
 	local_irq_enable();
-	device_resume();
+	device_resume(PMSG_RESUME);
 	pm_send_all(PM_RESUME, (void *)0);
 	queue_event(APM_NORMAL_RESUME, NULL);
  out:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
  2008-03-10 16:58           ` Rafael J. Wysocki
@ 2008-03-10 18:54             ` Alan Stern
  2008-03-12  9:37             ` David Brownell
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Stern @ 2008-03-10 18:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: pm list, ACPI Devel Maling List, Alexey Starikovskiy, Len Brown,
	Pavel Machek, David Brownell

On Mon, 10 Mar 2008, Rafael J. Wysocki wrote:

> Okay, I reworked the patch so that ->complete() is not called after a failing
> ->prepare() (even if that returns -EAGAIN).  For this purpose I had to rework
> dpm_suspend() quite heavily (once again).  Hopefully, I didn't break it.

> The updated patch is appended.

It all looks good.  The next step is testing...

Alan Stern


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
  2008-03-10 16:58           ` Rafael J. Wysocki
  2008-03-10 18:54             ` Alan Stern
@ 2008-03-12  9:37             ` David Brownell
  2008-03-12 21:22               ` Rafael J. Wysocki
  1 sibling, 1 reply; 10+ messages in thread
From: David Brownell @ 2008-03-12  9:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Alan Stern, pm list, ACPI Devel Maling List, Alexey Starikovskiy,
	Len Brown, Pavel Machek

On Monday 10 March 2008, Rafael J. Wysocki wrote:
> + * @poweroff: Hibernation-specific, executed after saving a hibernation image.
> + *     Quiesce the device, put it into a low power state appropriate for the
> + *     upcoming system state (such as PCI_D3hot), and enable wakeup events as
> + *     appropriate.

This seems uncomfortably similar to device_driver.shutdown().
The only obvious difference is wakeup event handling, and even
that is already a function of the target system state.

Are both methods needed?

Shouldn't this be more generic, not "hibernation-specific"?

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated
  2008-03-12  9:37             ` David Brownell
@ 2008-03-12 21:22               ` Rafael J. Wysocki
  0 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2008-03-12 21:22 UTC (permalink / raw)
  To: David Brownell
  Cc: Alan Stern, pm list, ACPI Devel Maling List, Alexey Starikovskiy,
	Len Brown, Pavel Machek

On Wednesday, 12 of March 2008, David Brownell wrote:
> On Monday 10 March 2008, Rafael J. Wysocki wrote:
> > + * @poweroff: Hibernation-specific, executed after saving a hibernation image.
> > + *     Quiesce the device, put it into a low power state appropriate for the
> > + *     upcoming system state (such as PCI_D3hot), and enable wakeup events as
> > + *     appropriate.
> 
> This seems uncomfortably similar to device_driver.shutdown().
> The only obvious difference is wakeup event handling, and even
> that is already a function of the target system state.
> 
> Are both methods needed?
> 
> Shouldn't this be more generic, not "hibernation-specific"?

Well, let's not make restrictions at this point.  There's nothing wrong with
pointing both at the same function and if everyone turns out to do that, we'll
remove one callback.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-03-12 21:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-09  1:20 [RFC][PATCH] PM: Separate suspend and hibernation callbacks (highest level) - updated Rafael J. Wysocki
2008-03-09  4:13 ` Alan Stern
2008-03-09 13:28   ` Rafael J. Wysocki
2008-03-09 19:56     ` Alan Stern
2008-03-09 20:50       ` Rafael J. Wysocki
2008-03-10  2:52         ` Alan Stern
2008-03-10 16:58           ` Rafael J. Wysocki
2008-03-10 18:54             ` Alan Stern
2008-03-12  9:37             ` David Brownell
2008-03-12 21:22               ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox