public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Solving suspend-level confusion
@ 2004-07-30 16:44 Pavel Machek
  2004-07-30 22:39 ` Benjamin Herrenschmidt
  2004-07-31  4:02 ` David Brownell
  0 siblings, 2 replies; 61+ messages in thread
From: Pavel Machek @ 2004-07-30 16:44 UTC (permalink / raw)
  To: benh, kernel list, Patrick Mochel, david-b, dbrownell

Hi!

There's quite big confusion what 'u32 state' in suspend callbacks
mean. Half code interprets it as a system-wide suspend level, and
second half thinks it is PCI Dx state. Bad.

What about this solution:

* system-wide suspend level is always passed down (it is more
detailed, and for example IDE driver cares)

* if you want to derive Dx state, just use provided inline function to
convert.

That should be acceptable to everyone. I plan on explicitly using
enums to prevent further confusion. Patch is likely to be pretty big:
ideally all prototypes of suspend function would be fixed so noone can
be confused.

What do you think?
								Pavel
PS: I'll be on holidays for a week, feel free to implement this or
something similar.. it is going to be lot of search/replace in drivers
:-(.

--- tmp/linux/include/linux/pci.h	2004-06-22 12:36:45.000000000 +0200
+++ linux/include/linux/pci.h	2004-07-30 18:24:02.000000000 +0200
@@ -593,7 +593,7 @@
 	const struct pci_device_id *id_table;	/* must be non-NULL for probe to be called */
 	int  (*probe)  (struct pci_dev *dev, const struct pci_device_id *id);	/* New device inserted */
 	void (*remove) (struct pci_dev *dev);	/* Device removed (NULL if not a hot-plug capable driver) */
-	int  (*suspend) (struct pci_dev *dev, u32 state);	/* Device suspended */
+	int  (*suspend) (struct pci_dev *dev, enum suspend_state reason);	/* Device suspended */
 	int  (*resume) (struct pci_dev *dev);	                /* Device woken up */
 	int  (*enable_wake) (struct pci_dev *dev, u32 state, int enable);   /* Enable wake event */
 
--- tmp/linux/include/linux/pm.h	2004-06-22 12:36:45.000000000 +0200
+++ linux/include/linux/pm.h	2004-07-30 18:23:11.000000000 +0200
@@ -193,11 +193,11 @@
 extern void (*pm_idle)(void);
 extern void (*pm_power_off)(void);
 
-enum {
-	PM_SUSPEND_ON,
-	PM_SUSPEND_STANDBY,
-	PM_SUSPEND_MEM,
-	PM_SUSPEND_DISK,
+enum system_state {
+	PM_SUSPEND_ON = 0,
+	PM_SUSPEND_STANDBY = 1,
+	PM_SUSPEND_MEM = 2,
+	PM_SUSPEND_DISK = 3,
 	PM_SUSPEND_MAX,
 };
 
@@ -241,11 +240,47 @@
 
 extern void device_pm_set_parent(struct device * dev, struct device * parent);
 
-extern int device_suspend(u32 state);
-extern int device_power_down(u32 state);
+enum suspend_state {
+	SNAPSHOT = 10,		/* For debugging, symbolic constants should be everywhere */
+	POWERDOWN,
+	RESTART,
+	RUNTIME_D1,
+	RUNTIME_D2,
+	RUNTIME_D3hot,
+	RUNTIME_D3cold
+};
+
+extern int device_suspend(enum suspend_state reason);
+extern int device_power_down(enum suspend_state reason);
 extern void device_power_up(void);
 extern void device_resume(void);
 
+enum pci_state {
+	D0 = 20,		/* For debugging, symbolic constants should be everywhere */
+	D1,
+	D2,
+	D3hot,
+	D3cold
+};
+
+static inline enum pci_state to_pci_state(enum suspend_state state)
+{
+	switch(state) {
+	case RUNTIME_D1:
+		return D1;
+	case RUNTIME_D2:
+		return D2;
+	case RUNTIME_D3hot:
+		return D3hot;
+	case SNAPSHOT:
+	case POWERDOWN:
+	case RESTART:
+	case RUNTIME_D3cold:
+		return D3cold;
+	default:
+		BUG();
+	}
+}
 
 #endif /* __KERNEL__ */
 
--- tmp/linux/kernel/power/disk.c	2004-05-20 23:08:36.000000000 +0200
+++ linux/kernel/power/disk.c	2004-07-30 18:18:04.000000000 +0200
@@ -46,20 +46,25 @@
 	int error = 0;
 
 	local_irq_save(flags);
-	device_power_down(PM_SUSPEND_DISK);
 	switch(mode) {
 	case PM_DISK_PLATFORM:
+		device_power_down(POWERDOWN);
 		error = pm_ops->enter(PM_SUSPEND_DISK);
 		break;
 	case PM_DISK_SHUTDOWN:
+		device_power_down(POWERDOWN);
 		printk("Powering off system\n");
 		machine_power_off();
 		break;
 	case PM_DISK_REBOOT:
+		device_power_down(RESTART);
 		machine_restart(NULL);
 		break;
 	}
 	machine_halt();
+	/* Valid image is on the disk, if we continue we risk serious data corruption
+	   after resume. */
+	while(1);
 	device_power_up();
 	local_irq_restore(flags);
 	return 0;

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-30 16:44 Solving suspend-level confusion Pavel Machek
@ 2004-07-30 22:39 ` Benjamin Herrenschmidt
  2004-07-30 23:06   ` Pavel Machek
  2004-07-31  4:02 ` David Brownell
  1 sibling, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-30 22:39 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Linux Kernel list, Patrick Mochel, David Brownell, dbrownell

On Sat, 2004-07-31 at 02:44, Pavel Machek wrote:
> Hi!
> 
> There's quite big confusion what 'u32 state' in suspend callbacks
> mean. Half code interprets it as a system-wide suspend level, and
> second half thinks it is PCI Dx state. Bad.
> 
> What about this solution:
> 
> * system-wide suspend level is always passed down (it is more
> detailed, and for example IDE driver cares)
> 
> * if you want to derive Dx state, just use provided inline function to
> convert.
> 
> That should be acceptable to everyone. I plan on explicitly using
> enums to prevent further confusion. Patch is likely to be pretty big:
> ideally all prototypes of suspend function would be fixed so noone can
> be confused.
> 
> What do you think?

Your constants are ugly ;) But the whole idea makes sense, I've started
implementing something on my side, though I didn't change the u32 to an
enum to avoid having to "fix" bazillions of drivers. Proper
documentation may just be enough...
								Pavel
> PS: I'll be on holidays for a week, feel free to implement this or
> something similar.. it is going to be lot of search/replace in drivers
> :-(.

I'll have some patches soon along with the PPC stuff.


> --- tmp/linux/include/linux/pci.h	2004-06-22 12:36:45.000000000 +0200
> +++ linux/include/linux/pci.h	2004-07-30 18:24:02.000000000 +0200
> @@ -593,7 +593,7 @@
>  	const struct pci_device_id *id_table;	/* must be non-NULL for probe to be called */
>  	int  (*probe)  (struct pci_dev *dev, const struct pci_device_id *id);	/* New device inserted */
>  	void (*remove) (struct pci_dev *dev);	/* Device removed (NULL if not a hot-plug capable driver) */
> -	int  (*suspend) (struct pci_dev *dev, u32 state);	/* Device suspended */
> +	int  (*suspend) (struct pci_dev *dev, enum suspend_state reason);	/* Device suspended */
>  	int  (*resume) (struct pci_dev *dev);	                /* Device woken up */
>  	int  (*enable_wake) (struct pci_dev *dev, u32 state, int enable);   /* Enable wake event */
>  
> --- tmp/linux/include/linux/pm.h	2004-06-22 12:36:45.000000000 +0200
> +++ linux/include/linux/pm.h	2004-07-30 18:23:11.000000000 +0200
> @@ -193,11 +193,11 @@
>  extern void (*pm_idle)(void);
>  extern void (*pm_power_off)(void);
>  
> -enum {
> -	PM_SUSPEND_ON,
> -	PM_SUSPEND_STANDBY,
> -	PM_SUSPEND_MEM,
> -	PM_SUSPEND_DISK,
> +enum system_state {
> +	PM_SUSPEND_ON = 0,
> +	PM_SUSPEND_STANDBY = 1,
> +	PM_SUSPEND_MEM = 2,
> +	PM_SUSPEND_DISK = 3,
>  	PM_SUSPEND_MAX,
>  };
>  
> @@ -241,11 +240,47 @@
>  
>  extern void device_pm_set_parent(struct device * dev, struct device * parent);
>  
> -extern int device_suspend(u32 state);
> -extern int device_power_down(u32 state);
> +enum suspend_state {
> +	SNAPSHOT = 10,		/* For debugging, symbolic constants should be everywhere */
> +	POWERDOWN,
> +	RESTART,
> +	RUNTIME_D1,
> +	RUNTIME_D2,
> +	RUNTIME_D3hot,
> +	RUNTIME_D3cold
> +};
> +
> +extern int device_suspend(enum suspend_state reason);
> +extern int device_power_down(enum suspend_state reason);
>  extern void device_power_up(void);
>  extern void device_resume(void);
>  
> +enum pci_state {
> +	D0 = 20,		/* For debugging, symbolic constants should be everywhere */
> +	D1,
> +	D2,
> +	D3hot,
> +	D3cold
> +};
> +
> +static inline enum pci_state to_pci_state(enum suspend_state state)
> +{
> +	switch(state) {
> +	case RUNTIME_D1:
> +		return D1;
> +	case RUNTIME_D2:
> +		return D2;
> +	case RUNTIME_D3hot:
> +		return D3hot;
> +	case SNAPSHOT:
> +	case POWERDOWN:
> +	case RESTART:
> +	case RUNTIME_D3cold:
> +		return D3cold;
> +	default:
> +		BUG();
> +	}
> +}
>  
>  #endif /* __KERNEL__ */
>  
> --- tmp/linux/kernel/power/disk.c	2004-05-20 23:08:36.000000000 +0200
> +++ linux/kernel/power/disk.c	2004-07-30 18:18:04.000000000 +0200
> @@ -46,20 +46,25 @@
>  	int error = 0;
>  
>  	local_irq_save(flags);
> -	device_power_down(PM_SUSPEND_DISK);
>  	switch(mode) {
>  	case PM_DISK_PLATFORM:
> +		device_power_down(POWERDOWN);
>  		error = pm_ops->enter(PM_SUSPEND_DISK);
>  		break;
>  	case PM_DISK_SHUTDOWN:
> +		device_power_down(POWERDOWN);
>  		printk("Powering off system\n");
>  		machine_power_off();
>  		break;
>  	case PM_DISK_REBOOT:
> +		device_power_down(RESTART);
>  		machine_restart(NULL);
>  		break;
>  	}
>  	machine_halt();
> +	/* Valid image is on the disk, if we continue we risk serious data corruption
> +	   after resume. */
> +	while(1);
>  	device_power_up();
>  	local_irq_restore(flags);
>  	return 0;
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-30 22:39 ` Benjamin Herrenschmidt
@ 2004-07-30 23:06   ` Pavel Machek
  0 siblings, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2004-07-30 23:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linux Kernel list, Patrick Mochel, David Brownell, dbrownell

Hi!

> > What do you think?
> 
> Your constants are ugly ;) But the whole idea makes sense, I've started
> implementing something on my side, though I didn't change the u32 to an
> enum to avoid having to "fix" bazillions of drivers. Proper
> documentation may just be enough...

Good. (Well, I'd still like to migrate to enums, eventually, so it
is not easy to confuse, but...)

> > PS: I'll be on holidays for a week, feel free to implement this or
> > something similar.. it is going to be lot of search/replace in drivers
> > :-(.
> 
> I'll have some patches soon along with the PPC stuff.

I'm looking forward...
								Pavel

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-30 16:44 Solving suspend-level confusion Pavel Machek
  2004-07-30 22:39 ` Benjamin Herrenschmidt
@ 2004-07-31  4:02 ` David Brownell
  2004-07-31  4:36   ` Pavel Machek
  2004-07-31  5:49   ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 61+ messages in thread
From: David Brownell @ 2004-07-31  4:02 UTC (permalink / raw)
  To: Pavel Machek; +Cc: benh, kernel list, Patrick Mochel

On Friday 30 July 2004 09:44, Pavel Machek wrote:

> * system-wide suspend level is always passed down (it is more
> detailed, and for example IDE driver cares)

This bothers me -- why should a "system" suspend level matter
to a device-level suspend call?  Seems like if IDE cares, it's
probably being given the wrong device-level suspend state,
or it needs more information than the target device state.

The problem I'm clear on is with PCI suspend, which some
earlier driver model PM changes goofed up.  It's trying to
pass a system state to driver suspend() methods that are
expecting a value appropriate for pci_set_power_state().
You're proposing to fix that by changing the call semantics,
while I'd rather just see the call site fixed.

I trust nobody is now disagreeing that's the root cause of
several suspend problems ... and I suspect that API changes,
to pass enums, should be part of the "real" fix.  (As Len has
commented, C enums aren't type-safe ... but at least they
provide documentation which "u32 state" can't!)


> * if you want to derive Dx state, just use provided inline function to
> convert.

If the model is that there's some PM "layer" (for lack of better term)
that's in charge of "system suspend" (e.g. to ACPI S3), then I have
no qualms saying that layer is responsible for mapping the those
states into PCI D-states before calling PCI suspend routines.  But
we don't really seem to have such a layer.  MontaVista has some
"DPM" code, distinct from drivers/base/power calls with that TLA,
that are one example of such a layer ... allowing multiple power
configurations.  Not the only way to do it -- but also not quite the
limited "laptop into S3 now" kind of model either.

Point being that it should certainly be possible to selectively
suspend devices without trying to put the whole system into a
different state (just certain devices) ... and also without lying to
any device about the system state.  (In fact, devices should as
a rule not care about system power state, only their own state.)
It should be easy for one driver to suspend or resume another
one, without any changes to "system" state.


Some specific comments on the patch:

> +enum pci_state {
> +	D0 = 20,		/* For debugging, symbolic constants should be everywhere */
> +	D1,
> +	D2,
> +	D3hot,
> +	D3cold
> +};

Those would be better as PCI_D0, PCI_D2 etc ... so they're not confused
with ACPI_D0, ACPI_D2 etc.

> +
> +static inline enum pci_state to_pci_state(enum suspend_state state)
> +{
> +	switch(state) {
> +	case RUNTIME_D1:
> +		return D1;
> +	case RUNTIME_D2:
> +		return D2;
> +	case RUNTIME_D3hot:
> +		return D3hot;
> +	case SNAPSHOT:
> +	case POWERDOWN:
> +	case RESTART:
> +	case RUNTIME_D3cold:
> +		return D3cold;
> +	default:
> +		BUG();
> +	}
> +}
>  
>  #endif /* __KERNEL__ */

This stuff, if it's used, belongs in <linux/pci.h> not <linux/pm.h> ... it's
not generic to all busses.  And pci_set_power_state() should probably
be modified to take the enum ... though I don't much like that notion,
it'd require changing every driver that actually tries to use the PCI
PM calls (since they currently "know" D3hot==3 and D3cold==4).

- Dave

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-31  4:02 ` David Brownell
@ 2004-07-31  4:36   ` Pavel Machek
  2004-07-31  5:49   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2004-07-31  4:36 UTC (permalink / raw)
  To: David Brownell; +Cc: benh, kernel list, Patrick Mochel

Hi!

> > * system-wide suspend level is always passed down (it is more
> > detailed, and for example IDE driver cares)
> 
> This bothers me -- why should a "system" suspend level matter
> to a device-level suspend call?  Seems like if IDE cares, it's
> probably being given the wrong device-level suspend state,
> or it needs more information than the target device state.

During swsusp, you need to make devices quiet; you do not need them to
enter any particular D0..D3, you just want them not to do DMA and
interrupts.

During reboot, you do not need devices in any of D0..D3; you do not
care, as long as BIOS can work with the devices after the reboot.

I do not think I'm giving them wrong device-level state.

> The problem I'm clear on is with PCI suspend, which some
> earlier driver model PM changes goofed up.  It's trying to
> pass a system state to driver suspend() methods that are
> expecting a value appropriate for pci_set_power_state().
> You're proposing to fix that by changing the call semantics,
> while I'd rather just see the call site fixed.

I do not think enough info can be fit into D0..D3.

> > * if you want to derive Dx state, just use provided inline function to
> > convert.
> 
> If the model is that there's some PM "layer" (for lack of better term)
> that's in charge of "system suspend" (e.g. to ACPI S3), then I have
> no qualms saying that layer is responsible for mapping the those
> states into PCI D-states before calling PCI suspend routines.  But
> we don't really seem to have such a layer.  MontaVista has some
> "DPM" code, distinct from drivers/base/power calls with that TLA,
> that are one example of such a layer ... allowing multiple power
> configurations.  Not the only way to do it -- but also not quite the
> limited "laptop into S3 now" kind of model either.

PCI suspend routines need more detailed info than D0..D3. We do not
want global "int real_system_state" that devices look at...

> Point being that it should certainly be possible to selectively
> suspend devices without trying to put the whole system into a
> different state (just certain devices) ... and also without lying to
> any device about the system state.  (In fact, devices should as
> a rule not care about system power state, only their own state.)

See above, devices care what system is doing. For example you do not
want to spin down disks during "make it quiet for swsusp". And you
need to turn off apic before reboot, not because you want to save
power, but because bios crashes if you don't do that.

> > +enum pci_state {
> > +	D0 = 20,		/* For debugging, symbolic constants should be everywhere */
> > +	D1,
> > +	D2,
> > +	D3hot,
> > +	D3cold
> > +};
> 
> Those would be better as PCI_D0, PCI_D2 etc ... so they're not confused
> with ACPI_D0, ACPI_D2 etc.

No problem with that.

> > +
> > +static inline enum pci_state to_pci_state(enum suspend_state state)
> > +{
> > +	switch(state) {
> > +	case RUNTIME_D1:
> > +		return D1;
> > +	case RUNTIME_D2:
> > +		return D2;
> > +	case RUNTIME_D3hot:
> > +		return D3hot;
> > +	case SNAPSHOT:
> > +	case POWERDOWN:
> > +	case RESTART:
> > +	case RUNTIME_D3cold:
> > +		return D3cold;
> > +	default:
> > +		BUG();
> > +	}
> > +}
> >  
> >  #endif /* __KERNEL__ */
> 
> This stuff, if it's used, belongs in <linux/pci.h> not <linux/pm.h> ... it's
> not generic to all busses.  And pci_set_power_state() should probably
> be modified to take the enum ... though I don't much like that notion,
> it'd require changing every driver that actually tries to use the PCI
> PM calls (since they currently "know" D3hot==3 and D3cold==4).

Moving include should not be a problem... Changing all PCI
drivers... Ben H. claimed there's so little of them he can change them
all ;-).
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-31  4:02 ` David Brownell
  2004-07-31  4:36   ` Pavel Machek
@ 2004-07-31  5:49   ` Benjamin Herrenschmidt
  2004-07-31 14:23     ` David Brownell
  1 sibling, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-31  5:49 UTC (permalink / raw)
  To: David Brownell; +Cc: Pavel Machek, Linux Kernel list, Patrick Mochel

On Sat, 2004-07-31 at 14:02, David Brownell wrote:
> On Friday 30 July 2004 09:44, Pavel Machek wrote:
> 
> > * system-wide suspend level is always passed down (it is more
> > detailed, and for example IDE driver cares)
> 
> This bothers me -- why should a "system" suspend level matter
> to a device-level suspend call?  Seems like if IDE cares, it's
> probably being given the wrong device-level suspend state,
> or it needs more information than the target device state.

Well, as I explained you during OLS ... Drivers will act differently
for suspend-to-ram, suspend-to-disk, and eventually other states we
may introduce, like a system "idle" state.

Disks in general are an example (IDE beeing the one that is currently
implemented, but we'll probably have to do the same for SATA and SCSI
at one point), you want to spin them off (with proper cache flush
etc...) when suspending to RAM, while you don't when suspending to
disk, as you really don't want them to be spun up again right away to
write the suspend image.

USB is another example. Typically, suspend-to-RAM wants to do a bus
suspend, eventually enabling remote wakeup on some devices, and expects
to recover the bus on wakeup, while suspend-to-disk is roughtly
equivalent to a full shutdown & reconnect on wakeup.

I have other platform drivers that will behave differently. Most of the
time the difference is that suspend-to-disk will not bother actually
shutting down the HW, just freeze the state of the device, while suspend
to RAM will actually want to enter whatever D state is appropriate.

> The problem I'm clear on is with PCI suspend, which some
> earlier driver model PM changes goofed up.  It's trying to
> pass a system state to driver suspend() methods that are
> expecting a value appropriate for pci_set_power_state().
> You're proposing to fix that by changing the call semantics,
> while I'd rather just see the call site fixed.

No, I don't agree. It's a driver policy to decide what PCI state
to use based on the system suspend level.

> I trust nobody is now disagreeing that's the root cause of
> several suspend problems ... and I suspect that API changes,
> to pass enums, should be part of the "real" fix.  (As Len has
> commented, C enums aren't type-safe ... but at least they
> provide documentation which "u32 state" can't!)
> 
> 
> > * if you want to derive Dx state, just use provided inline function to
> > convert.
> 
> If the model is that there's some PM "layer" (for lack of better term)
> that's in charge of "system suspend" (e.g. to ACPI S3), then I have
> no qualms saying that layer is responsible for mapping the those
> states into PCI D-states before calling PCI suspend routines.  But
> we don't really seem to have such a layer.  MontaVista has some
> "DPM" code, distinct from drivers/base/power calls with that TLA,
> that are one example of such a layer ... allowing multiple power
> configurations.  Not the only way to do it -- but also not quite the
> limited "laptop into S3 now" kind of model either.
> 
> Point being that it should certainly be possible to selectively
> suspend devices without trying to put the whole system into a
> different state (just certain devices) ... and also without lying to
> any device about the system state.  (In fact, devices should as
> a rule not care about system power state, only their own state.)
> It should be easy for one driver to suspend or resume another
> one, without any changes to "system" state.
> 
> 
> Some specific comments on the patch:
> 
> > +enum pci_state {
> > +	D0 = 20,		/* For debugging, symbolic constants should be everywhere */
> > +	D1,
> > +	D2,
> > +	D3hot,
> > +	D3cold
> > +};
> 
> Those would be better as PCI_D0, PCI_D2 etc ... so they're not confused
> with ACPI_D0, ACPI_D2 etc.
> 
> > +
> > +static inline enum pci_state to_pci_state(enum suspend_state state)
> > +{
> > +	switch(state) {
> > +	case RUNTIME_D1:
> > +		return D1;
> > +	case RUNTIME_D2:
> > +		return D2;
> > +	case RUNTIME_D3hot:
> > +		return D3hot;
> > +	case SNAPSHOT:
> > +	case POWERDOWN:
> > +	case RESTART:
> > +	case RUNTIME_D3cold:
> > +		return D3cold;
> > +	default:
> > +		BUG();
> > +	}
> > +}
> >  
> >  #endif /* __KERNEL__ */
> 
> This stuff, if it's used, belongs in <linux/pci.h> not <linux/pm.h> ... it's
> not generic to all busses.  And pci_set_power_state() should probably
> be modified to take the enum ... though I don't much like that notion,
> it'd require changing every driver that actually tries to use the PCI
> PM calls (since they currently "know" D3hot==3 and D3cold==4).
> 
> - Dave
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-31  5:49   ` Benjamin Herrenschmidt
@ 2004-07-31 14:23     ` David Brownell
  2004-07-31 17:01       ` Oliver Neukum
                         ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: David Brownell @ 2004-07-31 14:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Pavel Machek, Linux Kernel list, Patrick Mochel

On Friday 30 July 2004 22:49, Benjamin Herrenschmidt wrote:
> On Sat, 2004-07-31 at 14:02, David Brownell wrote:
> > On Friday 30 July 2004 09:44, Pavel Machek wrote:
> > 
> > > * system-wide suspend level is always passed down (it is more
> > > detailed, and for example IDE driver cares)
> > 
> > This bothers me -- why should a "system" suspend level matter
> > to a device-level suspend call?  Seems like if IDE cares, it's
> > probably being given the wrong device-level suspend state,
> > or it needs more information than the target device state.
> 
> Well, as I explained you during OLS ... Drivers will act differently
> for suspend-to-ram, suspend-to-disk, and eventually other states we
> may introduce, like a system "idle" state.

We didn't quite get into details enough to have it be "explain"
though ... it was more of "assert"!


> Disks in general are an example (IDE beeing the one that is currently
> implemented, but we'll probably have to do the same for SATA and SCSI
> at one point), you want to spin them off (with proper cache flush
> etc...) when suspending to RAM, while you don't when suspending to
> disk, as you really don't want them to be spun up again right away to
> write the suspend image.

So suspend-to-RAM more or less matches PCI D3hot, and
suspend-to-DISK matches PCI D3cold.  If those power states
were passed to the device suspend(), the disk driver could act
appropriately.  In my observation, D3cold was never passed
down, it was always D3hot.

These look to me like "wrong device-level suspend state" cases.


> USB is another example. Typically, suspend-to-RAM wants to do a bus
> suspend, eventually enabling remote wakeup on some devices, and expects
> to recover the bus on wakeup, while suspend-to-disk is roughtly
> equivalent to a full shutdown & reconnect on wakeup.

Same thing:  an HCD could do the right thing if it was told to go
into D1, D2, or D3hot (supports USB suspend) vs D3cold (doesn't).

Though the PM core doesn't cooperate at all there.  Neither the
suspend nor the resume codepaths cope well with disconnect
(and hence device removal), the PM core self-deadlocks since
suspend/resume outcalls are done while holding the semaphore
that device_pm_remove() needs, ugh.

And FWIW Greg's now merged the CONFIG_USB_SUSPEND code
into his tree, as experimental ... so now all the relevant integration
issues can start to get sorted out.  Eventually, that PM core deadlock
can get fixed.  And remote wakeup needs work ... x86 machines need
a way to tell ACPI to pay attention to PME# wakeups (which Len says
is in some recent ACPI patch), and maybe the PPC/Mac platforms
will need something similar.

 
> > The problem I'm clear on is with PCI suspend, which some
> > earlier driver model PM changes goofed up.  It's trying to
> > pass a system state to driver suspend() methods that are
> > expecting a value appropriate for pci_set_power_state().
> > You're proposing to fix that by changing the call semantics,
> > while I'd rather just see the call site fixed.
> 
> No, I don't agree. It's a driver policy to decide what PCI state
> to use based on the system suspend level.

You've not persuaded me on that point at all ...

Consider:  device PM calls  aren't only made as part of changing
a "system suspend level".  Minimally, there's the ability to suspend
a single device through sysfs.  And in general, we need to be able
to do that directly ... one device suspending **must** be able to
trigger another one suspending, without assuming that every
device on the system is suspending to the same state.

As for example, suspending a USB flash storage device must
be able to suspend its subsidiary storage and block devices,
without necessarily suspending the network link on an adjacent
hub port.  Or a USB port on a camera (or cell phone) that must
act as either host or peripheral (USB OTG) ... at most one of those
controllers should be active at a time.

Consider a system PM policy including "suspend all idle devices".
With N devices supporting only "on" and "suspend" states, that's
something like 2^N system suspend levels.  Or more; most
devices support more than two suspend states (add at least
"off", plus often light weight suspend states).  And N is system-specific,
so there's no way all those system states can be given small
integer values ... much less fit into a "u32 state".

 - Dave

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-31 14:23     ` David Brownell
@ 2004-07-31 17:01       ` Oliver Neukum
  2004-07-31 17:51         ` David Brownell
  2004-08-01  0:41         ` David Brownell
  2004-07-31 21:09       ` Benjamin Herrenschmidt
  2004-08-06 20:04       ` Pavel Machek
  2 siblings, 2 replies; 61+ messages in thread
From: Oliver Neukum @ 2004-07-31 17:01 UTC (permalink / raw)
  To: David Brownell
  Cc: Benjamin Herrenschmidt, Pavel Machek, Linux Kernel list,
	Patrick Mochel


> > Disks in general are an example (IDE beeing the one that is currently
> > implemented, but we'll probably have to do the same for SATA and SCSI
> > at one point), you want to spin them off (with proper cache flush
> > etc...) when suspending to RAM, while you don't when suspending to
> > disk, as you really don't want them to be spun up again right away to
> > write the suspend image.
> 
> So suspend-to-RAM more or less matches PCI D3hot, and
> suspend-to-DISK matches PCI D3cold.  If those power states
> were passed to the device suspend(), the disk driver could act
> appropriately.  In my observation, D3cold was never passed
> down, it was always D3hot.

Maybe a better approach would be to describe the required features to
the drivers rather than encoding them in a single integer. Rather
like passing a request that states "lowest power level with device state
retained, must not do DMA, enable remote wake up"
 
[..]
> Though the PM core doesn't cooperate at all there.  Neither the
> suspend nor the resume codepaths cope well with disconnect
> (and hence device removal), the PM core self-deadlocks since
> suspend/resume outcalls are done while holding the semaphore
> that device_pm_remove() needs, ugh.

Shouldn't we deal with this like a failed resume?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-31 17:01       ` Oliver Neukum
@ 2004-07-31 17:51         ` David Brownell
  2004-08-01  0:41         ` David Brownell
  1 sibling, 0 replies; 61+ messages in thread
From: David Brownell @ 2004-07-31 17:51 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Benjamin Herrenschmidt, Pavel Machek, Linux Kernel list,
	Patrick Mochel

On Saturday 31 July 2004 10:01, Oliver Neukum wrote:
> 
> > So suspend-to-RAM more or less matches PCI D3hot, and
> > suspend-to-DISK matches PCI D3cold.  If those power states
> > were passed to the device suspend(), the disk driver could act
> > appropriately.  In my observation, D3cold was never passed
> > down, it was always D3hot.
> 
> Maybe a better approach would be to describe the required features to
> the drivers rather than encoding them in a single integer. Rather
> like passing a request that states "lowest power level with device state
> retained, must not do DMA, enable remote wake up"

For PCI devices this is mostly defined by the parameter that says
what PCI power state to enter.  Even still-common "legacy" PCI
devices can basically fit into those definitions (though they may
not handle "low power" states other than "off").

Enabling remote wakeup is somewhat orthogonal.  Network
drivers have separate Wake-On-LAN calls, but that stuff should
arguably be part of the driver model PM framework.  Much of
the Linux PM work seems to be limited to "suspend when laptop
lid shuts, resume when it opens" ... which is a necessary start (one
that doesn't work universally yet either!) but not sufficient.


> [..]
> > Though the PM core doesn't cooperate at all there.  Neither the
> > suspend nor the resume codepaths cope well with disconnect
> > (and hence device removal), the PM core self-deadlocks since
> > suspend/resume outcalls are done while holding the semaphore
> > that device_pm_remove() needs, ugh.
> 
> Shouldn't we deal with this like a failed resume?

That was the first place I kept running into the deadlocks,
and I had to rewrite chunks of the OHCI driver to avoid
them.  The problem is that PM core uses a semaphore to
protect list membership, rather than a spinlock, and
that the semaphore is also trying to protect against more
than one thing at a a time.

- Dave


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-31 14:23     ` David Brownell
  2004-07-31 17:01       ` Oliver Neukum
@ 2004-07-31 21:09       ` Benjamin Herrenschmidt
  2004-08-02 16:40         ` David Brownell
  2004-08-06 20:04       ` Pavel Machek
  2 siblings, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-31 21:09 UTC (permalink / raw)
  To: David Brownell; +Cc: Pavel Machek, Linux Kernel list, Patrick Mochel

On Sun, 2004-08-01 at 00:23, David Brownell wrote:

> So suspend-to-RAM more or less matches PCI D3hot, and
> suspend-to-DISK matches PCI D3cold.  If those power states
> were passed to the device suspend(), the disk driver could act
> appropriately.  In my observation, D3cold was never passed
> down, it was always D3hot.

Not really, they really have nothing to do with the PCI state
at all. Besides, the difference between PCI D3 hot and D3 cold
is dependent on what happens on the mobo after the chip gets
set to D3...

> These look to me like "wrong device-level suspend state" cases.

No. The driver interface provide a semantic that indicates in what
state the system is going to, it's driver policy to translate that into
appropriate actions, eventually involving some bus power state (or not,
for some archs, we'll haev no choice but have some driver level arch
hacks in there anyway, like some macs wanting video chips to be in D2
state etc...)

> > USB is another example. Typically, suspend-to-RAM wants to do a bus
> > suspend, eventually enabling remote wakeup on some devices, and expects
> > to recover the bus on wakeup, while suspend-to-disk is roughtly
> > equivalent to a full shutdown & reconnect on wakeup.
> 
> Same thing:  an HCD could do the right thing if it was told to go
> into D1, D2, or D3hot (supports USB suspend) vs D3cold (doesn't).

And your whole PM code would suddently become PCI specific ...

> Though the PM core doesn't cooperate at all there.  Neither the
> suspend nor the resume codepaths cope well with disconnect
> (and hence device removal), the PM core self-deadlocks since
> suspend/resume outcalls are done while holding the semaphore
> that device_pm_remove() needs, ugh.

That's an old problem, I don't know how to fix this one best in USB,
I've always had problem with the whole PM semaphore for that reasons
among others but I'll let Greg & Patrick find a solution there.

> And FWIW Greg's now merged the CONFIG_USB_SUSPEND code
> into his tree, as experimental ... so now all the relevant integration
> issues can start to get sorted out.  Eventually, that PM core deadlock
> can get fixed.  And remote wakeup needs work ... x86 machines need
> a way to tell ACPI to pay attention to PME# wakeups (which Len says
> is in some recent ACPI patch), and maybe the PPC/Mac platforms
> will need something similar.
> 
>  
> > > The problem I'm clear on is with PCI suspend, which some
> > > earlier driver model PM changes goofed up.  It's trying to
> > > pass a system state to driver suspend() methods that are
> > > expecting a value appropriate for pci_set_power_state().
> > > You're proposing to fix that by changing the call semantics,
> > > while I'd rather just see the call site fixed.
> > 
> > No, I don't agree. It's a driver policy to decide what PCI state
> > to use based on the system suspend level.
> 
> You've not persuaded me on that point at all ...
> 
> Consider:  device PM calls  aren't only made as part of changing
> a "system suspend level".  Minimally, there's the ability to suspend
> a single device through sysfs.  And in general, we need to be able
> to do that directly ... one device suspending **must** be able to
> trigger another one suspending, without assuming that every
> device on the system is suspending to the same state.
> 
> As for example, suspending a USB flash storage device must
> be able to suspend its subsidiary storage and block devices,
> without necessarily suspending the network link on an adjacent
> hub port.  Or a USB port on a camera (or cell phone) that must
> act as either host or peripheral (USB OTG) ... at most one of those
> controllers should be active at a time.

And who resumes it ? Or you just lockup as soon as your VM try to
swap in from the flash ? If it self-wakes up then it's driver internal
policy. Again, the PM interface exposed to the kernel and userland has,
imho, nothing to do with the actualy low level bus states.

> Consider a system PM policy including "suspend all idle devices".
> With N devices supporting only "on" and "suspend" states, that's
> something like 2^N system suspend levels.  Or more; most
> devices support more than two suspend states (add at least
> "off", plus often light weight suspend states).  And N is system-specific,
> so there's no way all those system states can be given small
> integer values ... much less fit into a "u32 state".

And ? that's not the point. "standby" is defined as a kernel PM state
that could be used to implement that. It's per-driver policy to decide
how to react to "standby" of the HW can only do on/off (which usually
turns into go to "off" if the transition back to "on" is fast, or do
nothing if not).

There's also one thing we haven't dealt with is since queues are blocked
etc..., there need to be some explicit action waking things up. We don't
yet have provisions for devices themselves to trigger wakeup, either of
the whole tree, or themselves individually, based on activity. things like
USB remote wakeup usually hook through the firmware/motherboard to trigger
the system wakeup, but all of this is useless for such a standby state
where we actually want a pending request to the disk, for example, to
wakeup the whole subsystem.

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-31 17:01       ` Oliver Neukum
  2004-07-31 17:51         ` David Brownell
@ 2004-08-01  0:41         ` David Brownell
  2004-08-01  1:34           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 61+ messages in thread
From: David Brownell @ 2004-08-01  0:41 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Benjamin Herrenschmidt, Pavel Machek, Linux Kernel list,
	Patrick Mochel

On Saturday 31 July 2004 10:01, Oliver Neukum wrote:
> 
> Maybe a better approach would be to describe the required features to
> the drivers rather than encoding them in a single integer. Rather
> like passing a request that states "lowest power level with device state
> retained, must not do DMA, enable remote wake up"

A pointer to some sort of struct could be generic and typesafe;
better than an integer or enum.

	// <linux/device.h>
	struct device {
		... unchanged; but fix or remove this:
		struct dev_pm_state *power_state;
		...
	}; 

	// <linux/pm.>
	struct dev_pm_state {
		char *name;
		unsigned number;
		// maybe more
	};
	struct dev_pm_info {
		struct dev_pm_state	*power_state;
		struct dev_pm_state	**supported;
		unsigned could_wakeup:1;
		unsigned should_wakeup:1;
		... plus the rest
	};

There would be bus-specific rules about how those get used,
for example what device power states exist. 

	// <linux/pci.h>
	extern const struct dev_pm_state PCI_D0, PCI_D1, PCI_D2,
				PCI_D3cold, PCI_D3hot;

	... eventually signatures change:
	struct pci_driver {
		int  (*suspend) (struct pci_dev *dev, struct dev_pm_state *state);
		int  (*enable_wake) (struct pci_dev *dev, struct dev_pm *state,
					int enable);
	};
	int pci_set_power_state(struct pci_dev *dev, struct dev_pm *state);
	int pci_enable_wake(struct pci_dev *dev, struct dev_pm *state, int enable);

	// <asm/arch/bus.h>
	... support for whatever non-PCI device states work here
	... maybe even special states for the platform's busses:
	struct mybus_dev_pm_state {
		struct dev_pm_state public;
		// private state:  clock hooks, power switches,
		// linkage to devices sharing clock or power, etc
	};

So for example maybe dev->power.power_state == PCI_D0, and
dev->power->supported == { PCI_D0, PCI_D3hot, PCI_D3cold, NULL }
on some controller, and sysfs would say "D3hot" not "3" (or "2" or
whatever).  Other busses would report different states..

- Dave

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-01  0:41         ` David Brownell
@ 2004-08-01  1:34           ` Benjamin Herrenschmidt
  2004-08-02 16:38             ` David Brownell
  0 siblings, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-01  1:34 UTC (permalink / raw)
  To: David Brownell
  Cc: Oliver Neukum, Pavel Machek, Linux Kernel list, Patrick Mochel

On Sun, 2004-08-01 at 10:41, David Brownell wrote:
> On Saturday 31 July 2004 10:01, Oliver Neukum wrote:
> > 
> > Maybe a better approach would be to describe the required features to
> > the drivers rather than encoding them in a single integer. Rather
> > like passing a request that states "lowest power level with device state
> > retained, must not do DMA, enable remote wake up"
> 
> A pointer to some sort of struct could be generic and typesafe;
> better than an integer or enum.

Well... if it gets that complicated, drivers will get it wrong...

I'm pretty sure that in real life, drivers only care about 2 states
at this point, which is the ones needed for suspend to disk and suspend
to RAM, the later doing a D3. An assitional "standby" state could be
added later if we want.

If we really want to separate the system "target" state from the device
state, then we could indeed define a structure, but I doubt this is necessary,
I think we can get everything working with the current scheme by just slightly
adjusting the constants to properly differenciate the 2 above defined system
states.

The only thing we need to make sure of at this point is that drivers ignore
states they don't understand so we keep some flexibility to extending the list
of states.

Part of the confusion at this point is that we are playing with 2 different
concepts, which are the driver operating state and the device power state.

System suspend wants all drivers to suspend (freeze activity so that a
consistent state of the driver is kept in memory). It may or may not
be followed by a device power down.

Dynamic per-device power management would alter the device power state,
but without putting the driver into a suspended state (actually, the driver
itself would be responsible to turning the device back on as soon as some kind
of request comes in).

I think we don't need at this point to provide hooks or abstractions to
represent these. Doing so would break everything, we don't need to do it
at this point, we can at least get something reasonably working with our
current scheme. The device power state policy can remain under driver
control imho and don't need to be abstracted. At least not now. Let's get
things working with what we have, for system suspend, and drivers who want
to do dynamic PM can implement it locally.

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-01  1:34           ` Benjamin Herrenschmidt
@ 2004-08-02 16:38             ` David Brownell
  2004-08-03  0:38               ` Benjamin Herrenschmidt
  2004-08-06 21:21               ` Solving suspend-level confusion Pavel Machek
  0 siblings, 2 replies; 61+ messages in thread
From: David Brownell @ 2004-08-02 16:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Oliver Neukum, Pavel Machek, Linux Kernel list, Patrick Mochel

On Saturday 31 July 2004 18:34, Benjamin Herrenschmidt wrote:
> On Sun, 2004-08-01 at 10:41, David Brownell wrote:
> > On Saturday 31 July 2004 10:01, Oliver Neukum wrote:
> > > 
> > > Maybe a better approach would be to describe the required features to
> > > the drivers rather than encoding them in a single integer. Rather
> > > like passing a request that states "lowest power level with device state
> > > retained, must not do DMA, enable remote wake up"
> > 
> > A pointer to some sort of struct could be generic and typesafe;
> > better than an integer or enum.
> 
> Well... if it gets that complicated, drivers will get it wrong...

It's already broken though!   Type-safe calls might at least
trigger compiler warning when folk do things like, for example,
pass a system power policy where a device power policy is
needed.  So long as the API uses integers (or enums), it falls
in the category of "oversimplified, impossible to get right".

But such a massive API change sounds to me like a 2.7
thing.


> I'm pretty sure that in real life, drivers only care about 2 states
> at this point, which is the ones needed for suspend to disk and suspend
> to RAM, the later doing a D3. An assitional "standby" state could be
> added later if we want.

The current problems relate to the fact that some drivers are
bothering to actually look at the parameters they're getting, and
notice that they're not meaningful device power states.

It's not an issue of how many states exist.  For example, the
drivers that care about both D3hot and D3cold are actually
caring about three states (including D0), so they can't rely on
that "all states other than 0 are D3hot" assumption.  And of
course, when D2 is passed for a device that doesn't do D2,
that's a problem in the caller.


> The only thing we need to make sure of at this point is that drivers ignore
> states they don't understand so we keep some flexibility to extending
> the list of states.

That just hides the bug:  upper level code passing nonsense
states down to the driver.


> Part of the confusion at this point is that we are playing with 2 different
> concepts, which are the driver operating state and the device power state.

Make that "3 different concepts":  system suspend state too.
Or maybe "2" is right, since I think you've already said you
define the driver operating state as something that should
be invisible from outside.  (So it's not what I described.)


> System suspend wants all drivers to suspend (freeze activity so that a
> consistent state of the driver is kept in memory). It may or may not
> be followed by a device power down.

I'd say "suspend wants drivers to quiesce" (using a word whose meaning
isn't so overloaded).  But why shouldn't some coprocessors -- bus controller,
DSP, CPU, etc -- continue as active?  At least on some kinds of system.  They
might need to be active in order to serve as a wakeup source.  Not all
Linux systems revolve so exclusively around one CPU that suspending it
CPU means the system should stop operating.  (That does seem to be
an integral part of the ACPI model though; must be part of why I keep
thinking it's the wrong "core" model for Linux PM.)


> Dynamic per-device power management would alter the device power state,
> but without putting the driver into a suspended state (actually, the driver
> itself would be responsible to turning the device back on as soon as some
> kind of request comes in).

There are two models happening here:

 - What you're describing, where this wouldn't be visible from outside
   a driver.  It wouldn't even need to be packaged through PM operations.

 - The scenario I mentioned, where in order for one device to suspend
   (example was usb-storage flash memory) it's got to first suspend a
   few subsidiary devices (SCSI host, SCSI disk, and a block device).

I don't see a good way to apply that first model to the scenario I was
describing.


> I think we don't need at this point to provide hooks or abstractions to
> represent these. Doing so would break everything, we don't need to do it
> at this point, we can at least get something reasonably working with our
> current scheme. The device power state policy can remain under driver
> control imho and don't need to be abstracted.

I don't think we need to add any abstractions, and don't see what your
concern is about "breaking" something (that's already broken).  We
do however need to make the existing abstractions (and APIs) work.

- Dave


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-31 21:09       ` Benjamin Herrenschmidt
@ 2004-08-02 16:40         ` David Brownell
  2004-08-03  0:50           ` Benjamin Herrenschmidt
  2004-08-06 21:10           ` Pavel Machek
  0 siblings, 2 replies; 61+ messages in thread
From: David Brownell @ 2004-08-02 16:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Pavel Machek, Linux Kernel list, Patrick Mochel

On Saturday 31 July 2004 14:09, Benjamin Herrenschmidt wrote:
> On Sun, 2004-08-01 at 00:23, David Brownell wrote:
> 
> > So suspend-to-RAM more or less matches PCI D3hot, and
> > suspend-to-DISK matches PCI D3cold.  If those power states
> > were passed to the device suspend(), the disk driver could act
> > appropriately.  In my observation, D3cold was never passed
> > down, it was always D3hot.
> 
> Not really, they really have nothing to do with the PCI state
> at all.

The PCI power state is rather fundamental to what PCI suspend()
does!  Even if you want to redefine the "u32 state" parameter
to be something that's first got to be mapped _into_ a PCI
power state before calling pci_set_power_state().


> Besides, the difference between PCI D3 hot and D3 cold 
> is dependent on what happens on the mobo after the chip gets
> set to D3...

Except then you've deprived the driver up-front of the knowledge
that this device will be fully powered off (D3cold) ... which is what
you were saying (wrongly IMO) could only be provided by passing
a _system_ power state into a _device_ suspend call.  The API is there,
but there's some confusion about what it means ... affecting the
drivers that can suspend() to states other than D3hot.


> > These look to me like "wrong device-level suspend state" cases.
> 
> No. The driver interface provide a semantic that indicates in what
> state the system is going to, it's driver policy to translate that into
> appropriate actions,

How does your answer change when you go down the path I was
describing:  the drivers are passed a device-specific state?  Which for
PCI means no API changes; the drivers already expect to see
the PCI state number.


> > > USB is another example. Typically, suspend-to-RAM wants to do a bus
> > > suspend, eventually enabling remote wakeup on some devices, and expects
> > > to recover the bus on wakeup, while suspend-to-disk is roughtly
> > > equivalent to a full shutdown & reconnect on wakeup.
> > 
> > Same thing:  an HCD could do the right thing if it was told to go
> > into D1, D2, or D3hot (supports USB suspend) vs D3cold (doesn't).
> 
> And your whole PM code would suddently become PCI specific ...

Nope -- only the PCI drivers.  After all, they already have PCI-specific
PM rules they need to follow.  


> > Though the PM core doesn't cooperate at all there.  Neither the
> > suspend nor the resume codepaths cope well with disconnect
> > (and hence device removal), the PM core self-deadlocks since
> > suspend/resume outcalls are done while holding the semaphore
> > that device_pm_remove() needs, ugh.
> 
> That's an old problem, I don't know how to fix this one best in USB,
> I've always had problem with the whole PM semaphore for that reasons
> among others but I'll let Greg & Patrick find a solution there.

Patrick once posted a patch that morphed the semaphore into a
spinlock (in at least some cases), which worked for me, but that
never got merged.


> > > No, I don't agree. It's a driver policy to decide what PCI state
> > > to use based on the system suspend level.
> > 
> > You've not persuaded me on that point at all ...
> > 
> > Consider:  device PM calls  aren't only made as part of changing
> > a "system suspend level".  ...
> 
> And who resumes it ? Or you just lockup as soon as your VM try to
> swap in from the flash ? If it self-wakes up then it's driver internal
> policy. Again, the PM interface exposed to the kernel and userland has,
> imho, nothing to do with the actualy low level bus states.

If you're still using the device for swap, isn't a user error to suspend
the device?  Drivers could certainly have self-resume modes ,and
should use them if they  self-suspend.  But I'd expect a default policy
more like "suspend all idle devices", and anything mounted (for
swap, root fs, etc) wouldn't be idle unless there was a self-suspend
mechanism in some driver layer.  (IDE? SCSI? Block? PCI? USB?)


> > Consider a system PM policy including "suspend all idle devices".
> > With N devices supporting only "on" and "suspend" states, that's
> > something like 2^N system suspend levels.  Or more; most
> > devices support more than two suspend states (add at least
> > "off", plus often light weight suspend states).  And N is system-specific,
> > so there's no way all those system states can be given small
> > integer values ... much less fit into a "u32 state".
> 
> And ? that's not the point. "standby" is defined as a kernel PM state
> that could be used to implement that. It's per-driver policy to decide
> how to react to "standby" of the HW can only do on/off (which usually
> turns into go to "off" if the transition back to "on" is fast, or do
> nothing if not).

Actually no; I was describing what would be an ACPI G0/S0 state
where most devices/drivers are powered down, not the G1/S1 state
described as "suspend".   CPU would be running, for example.  It's
a power management policy, not a suspend policy ... it kicks in
during normal operation.


> There's also one thing we haven't dealt with is since queues are blocked
> etc..., there need to be some explicit action waking things up. We don't
> yet have provisions for devices themselves to trigger wakeup,

The PCI API certainly has hooks, and some network drivers try to
use the Wake-on-LAN capabilities.  So presumably they work in
at least some configurations.


> either of 
> the whole tree, or themselves individually, based on activity. things like
> USB remote wakeup usually hook through the firmware/motherboard to trigger
> the system wakeup, but all of this is useless for such a standby state
> where we actually want a pending request to the disk, for example, to
> wakeup the whole subsystem.

That's the difference between wakeup (== external event trigger,
flipper meets keyboard) and resume (== internal event trigger,
request meets queue).  There's a lot of common code in the
resume() path ... the wakeup "IRQ" just needs to arrange for
something to call resume().

- Dave

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-02 16:38             ` David Brownell
@ 2004-08-03  0:38               ` Benjamin Herrenschmidt
  2004-08-04  2:28                 ` David Brownell
  2004-08-06 21:21               ` Solving suspend-level confusion Pavel Machek
  1 sibling, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-03  0:38 UTC (permalink / raw)
  To: David Brownell
  Cc: Oliver Neukum, Pavel Machek, Linux Kernel list, Patrick Mochel


> The current problems relate to the fact that some drivers are
> bothering to actually look at the parameters they're getting, and
> notice that they're not meaningful device power states.
> 
> It's not an issue of how many states exist.  For example, the
> drivers that care about both D3hot and D3cold are actually
> caring about three states (including D0), so they can't rely on
> that "all states other than 0 are D3hot" assumption.  And of
> course, when D2 is passed for a device that doesn't do D2,
> that's a problem in the caller.

Nope. I don't think it makes sense to have the caller understand
anything about D states. And their actual impact on hardware tend
to depends from one chip to another too...

> That just hides the bug:  upper level code passing nonsense
> states down to the driver.

Nobody can get a D state right anyway, so it makes no sense to pass
that to drivers. Only the driver itself may know it's hardware well
enough (eventually with platform specific tweaks) to figure out what
D state to apply for a global policy.

What we need to pass to drivers are policies. Right now, we defined
suspend-to-ram and suspend-to-disk system wide policies, we may want
to define some additional ones especially those that can be triggered
by userspace to individual drivers.

> Make that "3 different concepts":  system suspend state too.
> Or maybe "2" is right, since I think you've already said you
> define the driver operating state as something that should
> be invisible from outside.  (So it's not what I described.)

Well, system suspend state too, right, it defines the driver
state we are asking for.

> I'd say "suspend wants drivers to quiesce" (using a word whose meaning
> isn't so overloaded).  But why shouldn't some coprocessors -- bus controller,
> DSP, CPU, etc -- continue as active?  At least on some kinds of system.  They
> might need to be active in order to serve as a wakeup source.  Not all
> Linux systems revolve so exclusively around one CPU that suspending it
> CPU means the system should stop operating.  (That does seem to be
> an integral part of the ACPI model though; must be part of why I keep
> thinking it's the wrong "core" model for Linux PM.)

Sure, and that already works, but it's a per driver policy at this
point, I don't think we can do much better at the driver model level.

> > Dynamic per-device power management would alter the device power state,
> > but without putting the driver into a suspended state (actually, the driver
> > itself would be responsible to turning the device back on as soon as some
> > kind of request comes in).
> 
> There are two models happening here:
> 
>  - What you're describing, where this wouldn't be visible from outside
>    a driver.  It wouldn't even need to be packaged through PM operations.
> 
>  - The scenario I mentioned, where in order for one device to suspend
>    (example was usb-storage flash memory) it's got to first suspend a
>    few subsidiary devices (SCSI host, SCSI disk, and a block device).
> 
> I don't see a good way to apply that first model to the scenario I was
> describing.

I don't understand what you are talking about... in the case of
usb-storage, it has scsi disk as it's child in the device tree,
the child should get the suspend call first of course, and yes, I know
the current sysfs semantics for userland-originated suspend are broken.

Right now, we are trying to at least get the full system suspend
working, we can work on fixing partial-tree suspend (which is what you
want there) later.

> I don't think we need to add any abstractions, and don't see what your
> concern is about "breaking" something (that's already broken).  We
> do however need to make the existing abstractions (and APIs) work.

Yup, but I still consider that it's a non-sense to pass D states to
driver suspend() callback ;)

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-02 16:40         ` David Brownell
@ 2004-08-03  0:50           ` Benjamin Herrenschmidt
  2004-08-07 23:30             ` David Brownell
  2004-08-06 21:10           ` Pavel Machek
  1 sibling, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-03  0:50 UTC (permalink / raw)
  To: David Brownell; +Cc: Pavel Machek, Linux Kernel list, Patrick Mochel

On Tue, 2004-08-03 at 02:40, David Brownell wrote:

> The PCI power state is rather fundamental to what PCI suspend()
> does!  Even if you want to redefine the "u32 state" parameter
> to be something that's first got to be mapped _into_ a PCI
> power state before calling pci_set_power_state().

no no no

you ask a driver to quiesce, it's the driver's policy to decide what is
the best suitable HW state based on what system state we are going into
(or what is asked by userland, that is quiesce, shutdown, or whatever
semantics we'll provide once we clanup the sysfs side of things).

only the driver knows what the chip can do and if it can wake it up
from d3 cold at all... though there's always the problem of knowing what
the motherboard will do on system suspend, but that I didn't find a
proper way to represent yet.

> > Besides, the difference between PCI D3 hot and D3 cold 
> > is dependent on what happens on the mobo after the chip gets
> > set to D3...

> Except then you've deprived the driver up-front of the knowledge
> that this device will be fully powered off (D3cold) ... which is what
> you were saying (wrongly IMO) could only be provided by passing
> a _system_ power state into a _device_ suspend call.  The API is there,
> but there's some confusion about what it means ... affecting the
> drivers that can suspend() to states other than D3hot.

But we don't know that at upper level neither. If you read old mailing
list threads, I was actually advocating passing 2 parameters to drivers,
one is what will "happen" to the device (be powered down for example, or
just be unclocked, or nothing) and one be the system state we are
entering. That was rejected back then. At this point, only the platform
code may have a clue about what will be happening.

> > > These look to me like "wrong device-level suspend state" cases.
> > 
> > No. The driver interface provide a semantic that indicates in what
> > state the system is going to, it's driver policy to translate that into
> > appropriate actions,
> 
> How does your answer change when you go down the path I was
> describing:  the drivers are passed a device-specific state?  Which for
> PCI means no API changes; the drivers already expect to see
> the PCI state number.

Change ? How so ? I always said the same thing... and no, that is NOT a
PCI state number, since that does NOT map to system states, (and I also
think it's not a good idea to have the parameters to suspend() be
different for different bus types anyway).

> > > > USB is another example. Typically, suspend-to-RAM wants to do a bus
> > > > suspend, eventually enabling remote wakeup on some devices, and expects
> > > > to recover the bus on wakeup, while suspend-to-disk is roughtly
> > > > equivalent to a full shutdown & reconnect on wakeup.
> > > 
> > > Same thing:  an HCD could do the right thing if it was told to go
> > > into D1, D2, or D3hot (supports USB suspend) vs D3cold (doesn't).
> > 
> > And your whole PM code would suddently become PCI specific ...
> 
> Nope -- only the PCI drivers.  After all, they already have PCI-specific
> PM rules they need to follow.  

At the driver level, not at the upper level.
> 
> > > Though the PM core doesn't cooperate at all there.  Neither the
> > > suspend nor the resume codepaths cope well with disconnect
> > > (and hence device removal), the PM core self-deadlocks since
> > > suspend/resume outcalls are done while holding the semaphore
> > > that device_pm_remove() needs, ugh.
> > 
> > That's an old problem, I don't know how to fix this one best in USB,
> > I've always had problem with the whole PM semaphore for that reasons
> > among others but I'll let Greg & Patrick find a solution there.
> 
> Patrick once posted a patch that morphed the semaphore into a
> spinlock (in at least some cases), which worked for me, but that
> never got merged.

Patrick ? Do you still have that around ?

> If you're still using the device for swap, isn't a user error to suspend
> the device? 

But you don't _know_ unless you are sure you don't have any file mapped
on it whatsoever, no executable running from it, and if you do, your
userland error suddenly stuck the kernel with unkillable processes stuck
in page fault...

>  Drivers could certainly have self-resume modes ,and
> should use them if they  self-suspend.  But I'd expect a default policy
> more like "suspend all idle devices", and anything mounted (for
> swap, root fs, etc) wouldn't be idle unless there was a self-suspend
> mechanism in some driver layer.  (IDE? SCSI? Block? PCI? USB?)

Low level drivers, at this point, don't really know if they are in use
in some cases, do they ? But anyway, to get userland originated suspend
to possibly do anything useful, we first need to fix sysfs so that it
does partial tree suspend and not individual deviecs suspend only.
There's also a semantic breakage between who updates the power state in
struct device, system suspend relies on the driver doing so afaik, while
sysfs originated calls to the driver suspend will do it in sysfs (which
I think is the wrong thing to do).

> Actually no; I was describing what would be an ACPI G0/S0 state
> where most devices/drivers are powered down, not the G1/S1 state
> described as "suspend".   CPU would be running, for example.  It's
> a power management policy, not a suspend policy ... it kicks in
> during normal operation.

And who wakes up the devices in this ACPI state ? (just curious)
> 
> > There's also one thing we haven't dealt with is since queues are blocked
> > etc..., there need to be some explicit action waking things up. We don't
> > yet have provisions for devices themselves to trigger wakeup,
> 
> The PCI API certainly has hooks, and some network drivers try to
> use the Wake-on-LAN capabilities.  So presumably they work in
> at least some configurations.

Yes, but it's not something we have abstracted in any useful way (nor I
think we should try to at this point)
> 
> > either of 
> > the whole tree, or themselves individually, based on activity. things like
> > USB remote wakeup usually hook through the firmware/motherboard to trigger
> > the system wakeup, but all of this is useless for such a standby state
> > where we actually want a pending request to the disk, for example, to
> > wakeup the whole subsystem.
> 
> That's the difference between wakeup (== external event trigger,
> flipper meets keyboard) and resume (== internal event trigger,
> request meets queue).  There's a lot of common code in the
> resume() path ... the wakeup "IRQ" just needs to arrange for
> something to call resume().

Note that there's the same problem here than what we have with sysfs
originated suspend/resume, we need to resume more than one device, but a
whole branch of the PM tree.

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  2:28                 ` David Brownell
@ 2004-08-04  2:26                   ` Nigel Cunningham
  2004-08-04  2:53                     ` Benjamin Herrenschmidt
  2004-08-05  1:29                     ` David Brownell
  2004-08-04  2:56                   ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-04  2:26 UTC (permalink / raw)
  To: David Brownell
  Cc: Benjamin Herrenschmidt, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Hi.

On Wed, 2004-08-04 at 12:28, David Brownell wrote:
> > Right now, we are trying to at least get the full system suspend
> > working, we can work on fixing partial-tree suspend (which is what you
> > want there) later.
> 
> The "partial tree suspend" works only for the degenerate
> tree:  every device in the system!  Wakeup events also need
> some more attention.
> 
> There's now some partial-tree code in CONFIG_USB_SUSPEND (for
> developers only), but jumping from USB into the next level driver
> (SCSI, video, etc) raises questions.

I've also done partial-tree support for suspend2 by making a new list
(along side the active, off and off_irq lists) and simply moving devices
I want to keep on (plus their parents) to this list prior to calling
device_suspend. Works well for keeping alive the ide devices being used
write the image.

Regards,

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-03  0:38               ` Benjamin Herrenschmidt
@ 2004-08-04  2:28                 ` David Brownell
  2004-08-04  2:26                   ` Nigel Cunningham
  2004-08-04  2:56                   ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 61+ messages in thread
From: David Brownell @ 2004-08-04  2:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Oliver Neukum, Pavel Machek, Linux Kernel list, Patrick Mochel

On Monday 02 August 2004 17:38, Benjamin Herrenschmidt wrote:

> Nope. I don't think it makes sense to have the caller understand
> anything about D states. And their actual impact on hardware tend
> to depends from one chip to another too...

Even if you don't think that, it's what the current PCI drivers are
expecting.  You're arguing to change about a dozen in-tree drivers
(doable), and some unknown number of others.   When I do changes
like that, I _really_ like to change the argument type enough to make
the compiler start reporting broken drivers!

Maybe rather than changing from values 0-4 into values 0-3, it
should change into string constants naming the states ... if you don't
like typed struct pointers for such things!


> What we need to pass to drivers are policies. Right now, we defined
> suspend-to-ram and suspend-to-disk system wide policies, we may want
> to define some additional ones especially those that can be triggered
> by userspace to individual drivers.

/sys/power/state defines also "standby"; these are the PM_SUSPEND_*
enumeration values (other than "on").

But the /sys/bus/*/devices/*/power/state files just read and write numbers.
Those get sent to individual drivers.  Passing (and printing) symbols
would make it more obvious what's going on there.  It'd be good if those
symbols could include bus-specific state names (driver-specific?), as
well as policies like "standby", "mem", and "disk".


> > Make that "3 different concepts":  system suspend state too.
> > Or maybe "2" is right, since I think you've already said you
> > define the driver operating state as something that should
> > be invisible from outside.  (So it's not what I described.)
> 
> Well, system suspend state too, right, it defines the driver
> state we are asking for.

System state != driver state though.   I'd agree that there
can be semi-generic policy names though, which various
software (ACPI, platform- or bus-specific code, even drivers)
could try mapping to hardware-specific states.


> I don't understand what you are talking about... in the case of
> usb-storage, it has scsi disk as it's child in the device tree,
> the child should get the suspend call first of course, and yes, I know
> the current sysfs semantics for userland-originated suspend are broken.

At one point I heard one of those issues described as "semantics are
bus-specific", but that's only part of it... :(


> Right now, we are trying to at least get the full system suspend
> working, we can work on fixing partial-tree suspend (which is what you
> want there) later.

The "partial tree suspend" works only for the degenerate
tree:  every device in the system!  Wakeup events also need
some more attention.

There's now some partial-tree code in CONFIG_USB_SUSPEND (for
developers only), but jumping from USB into the next level driver
(SCSI, video, etc) raises questions.


> > I don't think we need to add any abstractions, and don't see what your
> > concern is about "breaking" something (that's already broken).  We
> > do however need to make the existing abstractions (and APIs) work.
> 
> Yup, but I still consider that it's a non-sense to pass D states to
> driver suspend() callback ;)

Passing PM_SUSPEND_* values (0-3) instead of PCI power states (0-4),
as PCI drivers have previously expected, looks like it ought to be a
simplifying assumption.  Simpler is often good ... though this makes
me suspect "too simple".  Especially since it pushes policy choices
into device drivers (normally not good, except for quirks like "this
driver+hardware can't do D3, so let's use D2")..

- Dave

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  2:53                     ` Benjamin Herrenschmidt
@ 2004-08-04  2:52                       ` Nigel Cunningham
  2004-08-04  4:14                         ` Benjamin Herrenschmidt
  2004-08-06 21:26                         ` Pavel Machek
  0 siblings, 2 replies; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-04  2:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Hi.

On Wed, 2004-08-04 at 12:53, Benjamin Herrenschmidt wrote:
> > I've also done partial-tree support for suspend2 by making a new list
> > (along side the active, off and off_irq lists) and simply moving devices
> > I want to keep on (plus their parents) to this list prior to calling
> > device_suspend. Works well for keeping alive the ide devices being used
> > write the image.
> 
> How so ? By not calling suspend for it at all ? That's broken, the
> driver wants suspend to match the resume it will get when the image
> is reloaded, that's the only way the driver can guarantee a sane state
> saved in the suspend image.

Yes, I don't call suspend for it because I can be sure the drivers are
idle (before beginning to write the image, freeze all process, flush all
dirty buffers and suspend all other drivers, I then wait on my own I/O
until it is flushed too). I know it's broken to do so, but it was a good
work around for wearing out the thing by spinning it down and then
immediately spinning it back up, and I wasn't sure what the right state
to try to put it in is (sound familiar?!). If you want to tell me how I
could tell it to quiesce without spin down, I'll happily do that.

The sooner these issues get sorted, the better.

Regards,

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  2:26                   ` Nigel Cunningham
@ 2004-08-04  2:53                     ` Benjamin Herrenschmidt
  2004-08-04  2:52                       ` Nigel Cunningham
  2004-08-05  1:29                     ` David Brownell
  1 sibling, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-04  2:53 UTC (permalink / raw)
  To: ncunningham
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel


> I've also done partial-tree support for suspend2 by making a new list
> (along side the active, off and off_irq lists) and simply moving devices
> I want to keep on (plus their parents) to this list prior to calling
> device_suspend. Works well for keeping alive the ide devices being used
> write the image.

How so ? By not calling suspend for it at all ? That's broken, the
driver wants suspend to match the resume it will get when the image
is reloaded, that's the only way the driver can guarantee a sane state
saved in the suspend image.

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  2:28                 ` David Brownell
  2004-08-04  2:26                   ` Nigel Cunningham
@ 2004-08-04  2:56                   ` Benjamin Herrenschmidt
  2004-08-04  3:30                     ` David Brownell
  1 sibling, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-04  2:56 UTC (permalink / raw)
  To: David Brownell
  Cc: Oliver Neukum, Pavel Machek, Linux Kernel list, Patrick Mochel


> 
> Passing PM_SUSPEND_* values (0-3) instead of PCI power states (0-4),
> as PCI drivers have previously expected, looks like it ought to be a
> simplifying assumption.  Simpler is often good ... though this makes
> me suspect "too simple".  Especially since it pushes policy choices
> into device drivers (normally not good, except for quirks like "this
> driver+hardware can't do D3, so let's use D2")..

Actually, I took a shortcut with my PPC implementation of swsusp,
which was to tweak the numbering of PM_SUSPEND_* so that

 PM_SUSPEND_STANDBY	= 1
 PM_SUSPEND_MEM		= 3
 PM_SUSPEND_DISK	= 4

Which has the "side effect" of matching S states and mostly D states
with the exception of disk, for the few drivers that care...

But in the long run, this may add confusion instead of clearing things,
I agree we should rather move to completely different types, though I
don't feel like re-typing every callbacks in the tree right now...

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  2:56                   ` Benjamin Herrenschmidt
@ 2004-08-04  3:30                     ` David Brownell
  2004-08-04  4:19                       ` Benjamin Herrenschmidt
  2004-08-04  4:47                       ` What PM should be and do (Was Re: Solving suspend-level confusion) Nigel Cunningham
  0 siblings, 2 replies; 61+ messages in thread
From: David Brownell @ 2004-08-04  3:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Oliver Neukum, Pavel Machek, Linux Kernel list, Patrick Mochel

On Tuesday 03 August 2004 19:56, Benjamin Herrenschmidt wrote:

> Actually, I took a shortcut with my PPC implementation of swsusp,
> which was to tweak the numbering of PM_SUSPEND_* so that
> 
>  PM_SUSPEND_STANDBY	= 1
>  PM_SUSPEND_MEM		= 3
>  PM_SUSPEND_DISK	= 4
> 
> Which has the "side effect" of matching S states and mostly D states
> with the exception of disk, for the few drivers that care...

So long as there's a comment explaining what's going on there
("original PCI PM API compatibility") this wins hugely on expedience!


> But in the long run, this may add confusion instead of clearing things,
> I agree we should rather move to completely different types, though I
> don't feel like re-typing every callbacks in the tree right now...

Me either.  But renumbering the PM_SUSPEND_* values would let folk
start discussing what PM should be (and do) without that particular
pressure.

- Dave


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  2:52                       ` Nigel Cunningham
@ 2004-08-04  4:14                         ` Benjamin Herrenschmidt
  2004-08-04  4:25                           ` Nigel Cunningham
  2004-08-06 21:29                           ` Pavel Machek
  2004-08-06 21:26                         ` Pavel Machek
  1 sibling, 2 replies; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-04  4:14 UTC (permalink / raw)
  To: ncunningham
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel


> Yes, I don't call suspend for it because I can be sure the drivers are
> idle (before beginning to write the image, freeze all process, flush all
> dirty buffers and suspend all other drivers, I then wait on my own I/O
> until it is flushed too). I know it's broken to do so, but it was a good
> work around for wearing out the thing by spinning it down and then
> immediately spinning it back up, and I wasn't sure what the right state
> to try to put it in is (sound familiar?!). If you want to tell me how I
> could tell it to quiesce without spin down, I'll happily do that.

Very easy... with the current code, just use state 4 for the round
of suspend callbacks, ide-disk will then avoid spinning down.

> The sooner these issues get sorted, the better.
> 
> Regards,
> 
> Nigel
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  3:30                     ` David Brownell
@ 2004-08-04  4:19                       ` Benjamin Herrenschmidt
  2004-08-04  4:47                       ` What PM should be and do (Was Re: Solving suspend-level confusion) Nigel Cunningham
  1 sibling, 0 replies; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-04  4:19 UTC (permalink / raw)
  To: David Brownell
  Cc: Oliver Neukum, Pavel Machek, Linux Kernel list, Patrick Mochel


> Me either.  But renumbering the PM_SUSPEND_* values would let folk
> start discussing what PM should be (and do) without that particular
> pressure.

Agree... I need to clean up my patch as some of the code in
kernel/power/main.c expect the numbers to be contiguous and I did
ugly hacks in there.

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  4:14                         ` Benjamin Herrenschmidt
@ 2004-08-04  4:25                           ` Nigel Cunningham
  2004-08-04  4:52                             ` Benjamin Herrenschmidt
  2004-08-06 21:29                           ` Pavel Machek
  1 sibling, 1 reply; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-04  4:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Hi.

On Wed, 2004-08-04 at 14:14, Benjamin Herrenschmidt wrote:
> > If you want to tell me how I
> > could tell it to quiesce without spin down, I'll happily do that.
> 
> Very easy... with the current code, just use state 4 for the round
> of suspend callbacks, ide-disk will then avoid spinning down.

Hmm. That's what I was doing and do do for the remainder of the devices.
Oh well. I'll give it a try again. What would 3 do? (There was a stage
when all three implementations used 3; I've just played sheep in
changing to 4).

It occurs to me that I'm going to need to extend the partial tree
handling anyway. When I get this working, I'm going to want to resume
only the storage devices, and haven't added code for that yet.

Regards,

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-04  3:30                     ` David Brownell
  2004-08-04  4:19                       ` Benjamin Herrenschmidt
@ 2004-08-04  4:47                       ` Nigel Cunningham
  2004-08-04  4:53                         ` Benjamin Herrenschmidt
                                           ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-04  4:47 UTC (permalink / raw)
  To: David Brownell
  Cc: Benjamin Herrenschmidt, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Howdy.

On Wed, 2004-08-04 at 13:30, David Brownell wrote:
> Me either.  But renumbering the PM_SUSPEND_* values would let folk
> start discussing what PM should be (and do) without that particular
> pressure.

I'll happily give a go at kicking off a new thread along these lines
(roughly!).

>From my point of view, I really want the core PM code to provide:

- support for telling what class of device a driver is handling (I'm
particularly interested in keeping the keyboard, screen and storage
devices alive while suspending).
- support for state management of part of the tree (I want to put the
other devices to sleep at the start of suspending)
- support for grouping together a bunch of devices into an arbitrary
subtree (quiesce that keyboard, screen and storage device - and their
parents - while I do my lowlevel stuff... okay, now resume them so I can
save the rest of the image)

Perhaps the way to achieve the partial tree stuff is to make the code
for handling device lists more generic, so that there would be groups of
devices and each group has an dpm_active, dpm_off and dpm_off_irq list
of its own. Devices could go into a 'main group' by default, and be
shifted to a different group for operations like the above. Suspending
and resuming then moves the devices within the lists for the group.
Parents would only need to be moved in a case like mine, where the main
group was going to sleep.

As far as PM_SUSPEND_* values go, then, I guess I'd like to see:

PM_SUSPEND_LIVE
PM_SUSPEND_PAUSED (live but not doing or accepting work, state saved in
persistent memory)
PM_SUSPEND_DOWN (paused, maybe powered down)

I'd thus be keeping the keyboard etc between LIVE and PAUSED until the
image is completely written, and putting other devices in DOWN virtually
straight away.

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  4:25                           ` Nigel Cunningham
@ 2004-08-04  4:52                             ` Benjamin Herrenschmidt
  2004-08-04  4:54                               ` Nigel Cunningham
  0 siblings, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-04  4:52 UTC (permalink / raw)
  To: ncunningham
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel


> Hmm. That's what I was doing and do do for the remainder of the devices.
> Oh well. I'll give it a try again. What would 3 do? (There was a stage
> when all three implementations used 3; I've just played sheep in
> changing to 4).

3 would be S3 -> suspend to RAM. We may still want to fix drivers to
pass 4 as a PCI state though ;)

> It occurs to me that I'm going to need to extend the partial tree
> handling anyway. When I get this working, I'm going to want to resume
> only the storage devices, and haven't added code for that yet.
> 
> Regards,
> 
> Nigel
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-04  4:47                       ` What PM should be and do (Was Re: Solving suspend-level confusion) Nigel Cunningham
@ 2004-08-04  4:53                         ` Benjamin Herrenschmidt
  2004-08-04  4:59                           ` Nigel Cunningham
  2004-08-05 18:19                         ` Greg KH
  2004-08-08  0:54                         ` David Brownell
  2 siblings, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-04  4:53 UTC (permalink / raw)
  To: ncunningham
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel


> I really want the core PM code to provide:
> 
> - support for telling what class of device a driver is handling (I'm
> particularly interested in keeping the keyboard, screen and storage
> devices alive while suspending).

Well, they have to be suspended some way to keep a consistent state in
the suspend image, at least until the pages are snapshoted... Unless
the driver knows how to deal with an inconsistent state (I'm toying
with that for fbdev at least right now)

> - support for state management of part of the tree (I want to put the
> other devices to sleep at the start of suspending)
> - support for grouping together a bunch of devices into an arbitrary
> subtree (quiesce that keyboard, screen and storage device - and their
> parents - while I do my lowlevel stuff... okay, now resume them so I can
> save the rest of the image)
> 
> Perhaps the way to achieve the partial tree stuff is to make the code
> for handling device lists more generic, so that there would be groups of
> devices and each group has an dpm_active, dpm_off and dpm_off_irq list
> of its own. Devices could go into a 'main group' by default, and be
> shifted to a different group for operations like the above. Suspending
> and resuming then moves the devices within the lists for the group.
> Parents would only need to be moved in a case like mine, where the main
> group was going to sleep.
> 
> As far as PM_SUSPEND_* values go, then, I guess I'd like to see:
> 
> PM_SUSPEND_LIVE
> PM_SUSPEND_PAUSED (live but not doing or accepting work, state saved in
> persistent memory)
> PM_SUSPEND_DOWN (paused, maybe powered down)
> 
> I'd thus be keeping the keyboard etc between LIVE and PAUSED until the
> image is completely written, and putting other devices in DOWN virtually
> straight away.
> 
> Nigel
-- 
Benjamin Herrenschmidt <benh@kernel.crashing.org>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  4:52                             ` Benjamin Herrenschmidt
@ 2004-08-04  4:54                               ` Nigel Cunningham
  2004-08-04  5:03                                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-04  4:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Hi again.

On Wed, 2004-08-04 at 14:52, Benjamin Herrenschmidt wrote:
> > Hmm. That's what I was doing and do do for the remainder of the devices.
> > Oh well. I'll give it a try again. What would 3 do? (There was a stage
> > when all three implementations used 3; I've just played sheep in
> > changing to 4).
> 
> 3 would be S3 -> suspend to RAM. We may still want to fix drivers to
> pass 4 as a PCI state though ;)

Okay. Wasn't sure whether it was D3 or S3 or something-else-3! I take it
the ide driver does the same thing for S3 and S4?

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-04  4:53                         ` Benjamin Herrenschmidt
@ 2004-08-04  4:59                           ` Nigel Cunningham
  2004-08-08 16:54                             ` Pavel Machek
  0 siblings, 1 reply; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-04  4:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Hi.

On Wed, 2004-08-04 at 14:53, Benjamin Herrenschmidt wrote:
> > I really want the core PM code to provide:
> > 
> > - support for telling what class of device a driver is handling (I'm
> > particularly interested in keeping the keyboard, screen and storage
> > devices alive while suspending).
> 
> Well, they have to be suspended some way to keep a consistent state in
> the suspend image, at least until the pages are snapshoted... Unless
> the driver knows how to deal with an inconsistent state (I'm toying
> with that for fbdev at least right now)

Yes. I'm not trying to give drivers an inconsistent state, just delaying
suspending some until the last minute....

Suspend 2 algorithm:

1. Prepare image (freeze processes, allocate memory, eat memory etc)
2. Power down all drivers not used while writing image
3. Write LRU pages. ('pageset 2')
4. Quiesce remaining drivers, save CPU state, to atomic copy of
remaining ram.
5. Resume quiesced drivers.
6. Write atomic copy.
7. Power down used drivers.
8. Enter S4 if ACPI enabled; otherwise reboot or power down.

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  4:54                               ` Nigel Cunningham
@ 2004-08-04  5:03                                 ` Benjamin Herrenschmidt
  2004-08-04  5:05                                   ` Nigel Cunningham
                                                     ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-04  5:03 UTC (permalink / raw)
  To: ncunningham
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

On Wed, 2004-08-04 at 14:54, Nigel Cunningham wrote:
> Hi again.
> 
> On Wed, 2004-08-04 at 14:52, Benjamin Herrenschmidt wrote:
> > > Hmm. That's what I was doing and do do for the remainder of the devices.
> > > Oh well. I'll give it a try again. What would 3 do? (There was a stage
> > > when all three implementations used 3; I've just played sheep in
> > > changing to 4).
> > 
> > 3 would be S3 -> suspend to RAM. We may still want to fix drivers to
> > pass 4 as a PCI state though ;)
> 
> Okay. Wasn't sure whether it was D3 or S3 or something-else-3! I take it
> the ide driver does the same thing for S3 and S4?

That's where the whole confusion is indeed... and why we need to make
that clear. The IDE driver will sleep the disk for 3 and keep it spinning
for 4

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  5:03                                 ` Benjamin Herrenschmidt
@ 2004-08-04  5:05                                   ` Nigel Cunningham
  2004-08-05 10:05                                   ` Nigel Cunningham
  2004-08-05 22:31                                   ` Nigel Cunningham
  2 siblings, 0 replies; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-04  5:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Hi.

On Wed, 2004-08-04 at 15:03, Benjamin Herrenschmidt wrote:
> On Wed, 2004-08-04 at 14:54, Nigel Cunningham wrote:
> > On Wed, 2004-08-04 at 14:52, Benjamin Herrenschmidt wrote:
> > > > Hmm. That's what I was doing and do do for the remainder of the devices.
> > > > Oh well. I'll give it a try again. What would 3 do? (There was a stage
> > > > when all three implementations used 3; I've just played sheep in
> > > > changing to 4).
> > > 
> > > 3 would be S3 -> suspend to RAM. We may still want to fix drivers to
> > > pass 4 as a PCI state though ;)
> > 
> > Okay. Wasn't sure whether it was D3 or S3 or something-else-3! I take it
> > the ide driver does the same thing for S3 and S4?
> 
> That's where the whole confusion is indeed... and why we need to make
> that clear. The IDE driver will sleep the disk for 3 and keep it spinning
> for 4

Okee doke. Maybe I did the partial tree code before the switch from 3 to
4.

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  2:26                   ` Nigel Cunningham
  2004-08-04  2:53                     ` Benjamin Herrenschmidt
@ 2004-08-05  1:29                     ` David Brownell
  2004-08-05 10:19                       ` Nigel Cunningham
  1 sibling, 1 reply; 61+ messages in thread
From: David Brownell @ 2004-08-05  1:29 UTC (permalink / raw)
  To: ncunningham
  Cc: Benjamin Herrenschmidt, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

On Tuesday 03 August 2004 19:26, Nigel Cunningham wrote:

> > There's now some partial-tree code in CONFIG_USB_SUSPEND (for
> > developers only), but jumping from USB into the next level driver
> > (SCSI, video, etc) raises questions.
> 
> I've also done partial-tree support for suspend2 by making a new list
> (along side the active, off and off_irq lists) and simply moving devices
> I want to keep on (plus their parents) to this list prior to calling
> device_suspend. Works well for keeping alive the ide devices being used
> write the image.

What I'd need out of the PM framework would be "suspend this subtree",
and its cousin "resume this subtree".  Where the subtree starts with a
given device ... and, if it's got a driver, any abstract devices created
by that driver.  (And their children, etc.)

I'm not sure what to think about the desire of "suspend2" to prevent
a subtree from suspending.  In fact, I'm not at all sure how to even
interpret a "can't suspend" failure code... device in trouble, likely.

- Dave


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  5:03                                 ` Benjamin Herrenschmidt
  2004-08-04  5:05                                   ` Nigel Cunningham
@ 2004-08-05 10:05                                   ` Nigel Cunningham
  2004-08-05 22:31                                   ` Nigel Cunningham
  2 siblings, 0 replies; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-05 10:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Hi.

On Wed, 2004-08-04 at 15:03, Benjamin Herrenschmidt wrote:
> That's where the whole confusion is indeed... and why we need to make
> that clear. The IDE driver will sleep the disk for 3 and keep it spinning
> for 4

Discovered the problem; another layer of confusion! I was using
PM_SUSPEND_DISK, which actually evaluates to 3, not 4 as I thought. Now
that I'm using the literal 4, it works fine.

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-05  1:29                     ` David Brownell
@ 2004-08-05 10:19                       ` Nigel Cunningham
  2004-08-06  0:32                         ` David Brownell
  0 siblings, 1 reply; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-05 10:19 UTC (permalink / raw)
  To: David Brownell
  Cc: Benjamin Herrenschmidt, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Hi.

On Thu, 2004-08-05 at 11:29, David Brownell wrote:
> On Tuesday 03 August 2004 19:26, Nigel Cunningham wrote:
> 
> > > There's now some partial-tree code in CONFIG_USB_SUSPEND (for
> > > developers only), but jumping from USB into the next level driver
> > > (SCSI, video, etc) raises questions.
> > 
> > I've also done partial-tree support for suspend2 by making a new list
> > (along side the active, off and off_irq lists) and simply moving devices
> > I want to keep on (plus their parents) to this list prior to calling
> > device_suspend. Works well for keeping alive the ide devices being used
> > write the image.
> 
> What I'd need out of the PM framework would be "suspend this subtree",
> and its cousin "resume this subtree".  Where the subtree starts with a
> given device ... and, if it's got a driver, any abstract devices created
> by that driver.  (And their children, etc.)

I've just finished implementing a 'suspend this subtree' patch, which
I'll happily post for comments/testing. I'll try to do that shortly.

> I'm not sure what to think about the desire of "suspend2" to prevent
> a subtree from suspending.  In fact, I'm not at all sure how to even
> interpret a "can't suspend" failure code... device in trouble, likely.

I didn't implement it as devices saying 'can't suspend'. Instead, I
added a layer of abstraction. Imagine the current dpm_active, dpm_off
and dpm_off_irq lists as representing the states of a subtree of the
devices. We pop them into a struct partial_device_tree (device_subtree
better?) and provide facilities for creating and destroying new struct
partial_device_trees, moving a-device-and-its-parents between trees and
merging trees back. We also adjust the current suspend, resume, power-up
and power-down routines so they're tree-aware and set up a
'default_device_tree' that the devices sit in for normal operation.
That's what my patch does. I kept the existing api untouched so that:

device_resume();

is actually a wrapper for

device_resume_tree(&default_device_tree);

Proof of the pudding coming :>

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-04  4:47                       ` What PM should be and do (Was Re: Solving suspend-level confusion) Nigel Cunningham
  2004-08-04  4:53                         ` Benjamin Herrenschmidt
@ 2004-08-05 18:19                         ` Greg KH
  2004-08-05 22:14                           ` Nigel Cunningham
  2004-08-08  0:54                         ` David Brownell
  2 siblings, 1 reply; 61+ messages in thread
From: Greg KH @ 2004-08-05 18:19 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: David Brownell, Benjamin Herrenschmidt, Oliver Neukum,
	Pavel Machek, Linux Kernel Mailing List, Patrick Mochel

On Wed, Aug 04, 2004 at 02:47:52PM +1000, Nigel Cunningham wrote:
> - support for telling what class of device a driver is handling (I'm
> particularly interested in keeping the keyboard, screen and storage
> devices alive while suspending).

You can see that info today from userspace by looking in
/sys/class/input, /sys/class/graphics, and /sys/block

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-05 18:19                         ` Greg KH
@ 2004-08-05 22:14                           ` Nigel Cunningham
  2004-08-07  0:08                             `  Éric Brunet
  2004-08-11 21:23                             ` Greg KH
  0 siblings, 2 replies; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-05 22:14 UTC (permalink / raw)
  To: Greg KH
  Cc: David Brownell, Benjamin Herrenschmidt, Oliver Neukum,
	Pavel Machek, Linux Kernel Mailing List, Patrick Mochel

Hi.

On Fri, 2004-08-06 at 04:19, Greg KH wrote:
> On Wed, Aug 04, 2004 at 02:47:52PM +1000, Nigel Cunningham wrote:
> > - support for telling what class of device a driver is handling (I'm
> > particularly interested in keeping the keyboard, screen and storage
> > devices alive while suspending).
> 
> You can see that info today from userspace by looking in
> /sys/class/input, /sys/class/graphics, and /sys/block

Does this apply to all devices, and how do I tell it 'programmatically'?
Ie, is the class an element in a struct somewhere?

Regards,

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  5:03                                 ` Benjamin Herrenschmidt
  2004-08-04  5:05                                   ` Nigel Cunningham
  2004-08-05 10:05                                   ` Nigel Cunningham
@ 2004-08-05 22:31                                   ` Nigel Cunningham
  2004-08-06  0:39                                     ` Benjamin Herrenschmidt
  2004-08-06 21:30                                     ` Pavel Machek
  2 siblings, 2 replies; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-05 22:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

Hi again.

On Wed, 2004-08-04 at 15:03, Benjamin Herrenschmidt wrote:
> That's where the whole confusion is indeed... and why we need to make
> that clear. The IDE driver will sleep the disk for 3 and keep it spinning
> for 4

Okay. So that leaves me calling device_suspend(4) when I want to quiesce
the driver but not spin down and device_suspend(3) when I want to power
down. Does that sound right? It sounds hairy to me. (Do other drivers
treat 3 and 4 in the same way?)

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-05 10:19                       ` Nigel Cunningham
@ 2004-08-06  0:32                         ` David Brownell
       [not found]                           ` <1091772799.2532.50.camel@laptop.cunninghams>
  0 siblings, 1 reply; 61+ messages in thread
From: David Brownell @ 2004-08-06  0:32 UTC (permalink / raw)
  To: ncunningham
  Cc: Benjamin Herrenschmidt, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

On Thursday 05 August 2004 03:19, Nigel Cunningham wrote:

> That's what my patch does. I kept the existing api untouched so that:
> 
> device_resume();
> 
> is actually a wrapper for
> 
> device_resume_tree(&default_device_tree);
> 
> Proof of the pudding coming :>

Sounds good.  Will it be possible to remove devices during
these tree operations?  Probably never the current one.

And (evil chuckle) how will it behave if two tasks are doing
that concurrently?  The no-overlap case would be fully
parallel, I'd hope!

- Dave

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-05 22:31                                   ` Nigel Cunningham
@ 2004-08-06  0:39                                     ` Benjamin Herrenschmidt
  2004-08-06 21:30                                     ` Pavel Machek
  1 sibling, 0 replies; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-06  0:39 UTC (permalink / raw)
  To: ncunningham
  Cc: David Brownell, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

On Fri, 2004-08-06 at 08:31, Nigel Cunningham wrote:
> Hi again.
> 
> On Wed, 2004-08-04 at 15:03, Benjamin Herrenschmidt wrote:
> > That's where the whole confusion is indeed... and why we need to make
> > that clear. The IDE driver will sleep the disk for 3 and keep it spinning
> > for 4
> 
> Okay. So that leaves me calling device_suspend(4) when I want to quiesce
> the driver but not spin down and device_suspend(3) when I want to power
> down. Does that sound right? It sounds hairy to me. (Do other drivers
> treat 3 and 4 in the same way?)

Well, or just continue using the PM_ constants and change them to 1,3,4
like I did. It's the simplest way at this point imho.

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-07-31 14:23     ` David Brownell
  2004-07-31 17:01       ` Oliver Neukum
  2004-07-31 21:09       ` Benjamin Herrenschmidt
@ 2004-08-06 20:04       ` Pavel Machek
  2004-08-07 22:14         ` David Brownell
  2 siblings, 1 reply; 61+ messages in thread
From: Pavel Machek @ 2004-08-06 20:04 UTC (permalink / raw)
  To: David Brownell; +Cc: Benjamin Herrenschmidt, Linux Kernel list, Patrick Mochel

Hi!

> > Disks in general are an example (IDE beeing the one that is currently
> > implemented, but we'll probably have to do the same for SATA and SCSI
> > at one point), you want to spin them off (with proper cache flush
> > etc...) when suspending to RAM, while you don't when suspending to
> > disk, as you really don't want them to be spun up again right away to
> > write the suspend image.
> 
> So suspend-to-RAM more or less matches PCI D3hot, and
> suspend-to-DISK matches PCI D3cold.  If those power states
> were passed to the device suspend(), the disk driver could act
> appropriately.  In my observation, D3cold was never passed
> down, it was always D3hot.
> 
> These look to me like "wrong device-level suspend state" cases.

Actually, suspend-to-disk has to suspend all devices *twices*. Once it
wants them in "D0 but DMA/interrupts stopped", and once in "D3cold but
I do not really care power is going to be cut anyway". I do not think
this can be expressed with PCI states.
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-02 16:40         ` David Brownell
  2004-08-03  0:50           ` Benjamin Herrenschmidt
@ 2004-08-06 21:10           ` Pavel Machek
  2004-08-07 23:23             ` David Brownell
  1 sibling, 1 reply; 61+ messages in thread
From: Pavel Machek @ 2004-08-06 21:10 UTC (permalink / raw)
  To: David Brownell; +Cc: Benjamin Herrenschmidt, Linux Kernel list, Patrick Mochel

Hi!

> > > So suspend-to-RAM more or less matches PCI D3hot, and
> > > suspend-to-DISK matches PCI D3cold.  If those power states
> > > were passed to the device suspend(), the disk driver could act
> > > appropriately.  In my observation, D3cold was never passed
> > > down, it was always D3hot.
> > 
> > Not really, they really have nothing to do with the PCI state
> > at all.
> 
> The PCI power state is rather fundamental to what PCI suspend()
> does!  Even if you want to redefine the "u32 state" parameter
> to be something that's first got to be mapped _into_ a PCI
> power state before calling pci_set_power_state().

For example harddrive wants to be in D0 during reboot. Now imagine
that BIOS can't handle soundcard in D0 during reboot (I believe some
such examples exists).... You only have D0, D1, D2, D3hot, D3cold; but
low-level driver really needs more info.

Now, you could have quirks at PCI layer saying "this device needs to
be D3 during reboot", but I believe thats ugly.

I believe right solution is to pass system state into suspend(), and
provide system_to_pci_state(), so that drivers that only care about
PCI state can use.

If you are right, every PCI driver will use system_to_pci_state.. but
I do not think that's the case. For example:

static void ide_device_shutdown(struct device *dev)
{
        ide_drive_t *drive = container_of(dev, ide_drive_t, gendev);

#ifdef  CONFIG_ALPHA
        /* On Alpha, halt(8) doesn't actually turn the machine off,
           it puts you into the sort of firmware monitor. Typically,
           it's used to boot another kernel image, so it's not much
           different from reboot(8). Therefore, we don't need to
           spin down the disk in this case, especially since Alpha
           firmware doesn't handle disks in standby mode properly.
           On the other hand, it's reasonably safe to turn the power
           off when the shutdown process reaches the firmware prompt,
           as the firmware initialization takes rather long time -
           at least 10 seconds, which should be sufficient for
           the disk to expire its write cache. */
        if (system_state != SYSTEM_POWER_OFF) {
#else
        if (system_state == SYSTEM_RESTART) {
#endif

...drivers are weird. Please provide them with full information.

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-02 16:38             ` David Brownell
  2004-08-03  0:38               ` Benjamin Herrenschmidt
@ 2004-08-06 21:21               ` Pavel Machek
  1 sibling, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2004-08-06 21:21 UTC (permalink / raw)
  To: David Brownell
  Cc: Benjamin Herrenschmidt, Oliver Neukum, Linux Kernel list,
	Patrick Mochel

Hi!

> > > > Maybe a better approach would be to describe the required features to
> > > > the drivers rather than encoding them in a single integer. Rather
> > > > like passing a request that states "lowest power level with device state
> > > > retained, must not do DMA, enable remote wake up"
> > > 
> > > A pointer to some sort of struct could be generic and typesafe;
> > > better than an integer or enum.
> > 
> > Well... if it gets that complicated, drivers will get it wrong...
> 
> It's already broken though!   Type-safe calls might at least
> trigger compiler warning when folk do things like, for example,
> pass a system power policy where a device power policy is
> needed.  So long as the API uses integers (or enums), it falls
> in the category of "oversimplified, impossible to get right".
> 
> But such a massive API change sounds to me like a 2.7
> thing.

Good.. So, for 2.6, can we agree on doing "enum xxx" thing? That at
least serves as a documentation. We could teach sparse to check for
it... (Anything that does 

enum foo x;
enum bar y = x;

is broken. If gcc does not warn about this, we should fix gcc...)

I believe "enum system_state" passed around, even to PCI drivers, and 
enum pci_state to_pci_state(enum system_state) helper is the right
thing to do for 2.6. Can we agree on that? 
								Pavel

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  2:52                       ` Nigel Cunningham
  2004-08-04  4:14                         ` Benjamin Herrenschmidt
@ 2004-08-06 21:26                         ` Pavel Machek
  1 sibling, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2004-08-06 21:26 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Benjamin Herrenschmidt, David Brownell, Oliver Neukum,
	Linux Kernel Mailing List, Patrick Mochel

Hi!

> > > I've also done partial-tree support for suspend2 by making a new list
> > > (along side the active, off and off_irq lists) and simply moving devices
> > > I want to keep on (plus their parents) to this list prior to calling
> > > device_suspend. Works well for keeping alive the ide devices being used
> > > write the image.
> > 
> > How so ? By not calling suspend for it at all ? That's broken, the
> > driver wants suspend to match the resume it will get when the image
> > is reloaded, that's the only way the driver can guarantee a sane state
> > saved in the suspend image.
> 
> Yes, I don't call suspend for it because I can be sure the drivers are
> idle (before beginning to write the image, freeze all process, flush all
> dirty buffers and suspend all other drivers, I then wait on my own I/O
> until it is flushed too). I know it's broken to do so, but it was a good
> work around for wearing out the thing by spinning it down and then
> immediately spinning it back up, and I wasn't sure what the right state
> to try to put it in is (sound familiar?!). If you want to tell me how I
> could tell it to quiesce without spin down, I'll happily do that.
> 
> The sooner these issues get sorted, the better.

Hmm, I can't agree more... Yesterday was too late...
								Pavel

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-04  4:14                         ` Benjamin Herrenschmidt
  2004-08-04  4:25                           ` Nigel Cunningham
@ 2004-08-06 21:29                           ` Pavel Machek
  2004-08-06 22:27                             ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 61+ messages in thread
From: Pavel Machek @ 2004-08-06 21:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: ncunningham, David Brownell, Oliver Neukum,
	Linux Kernel Mailing List, Patrick Mochel

Hi!

> > Yes, I don't call suspend for it because I can be sure the drivers are
> > idle (before beginning to write the image, freeze all process, flush all
> > dirty buffers and suspend all other drivers, I then wait on my own I/O
> > until it is flushed too). I know it's broken to do so, but it was a good
> > work around for wearing out the thing by spinning it down and then
> > immediately spinning it back up, and I wasn't sure what the right state
> > to try to put it in is (sound familiar?!). If you want to tell me how I
> > could tell it to quiesce without spin down, I'll happily do that.
> 
> Very easy... with the current code, just use state 4 for the round
> of suspend callbacks, ide-disk will then avoid spinning down.

There are some network drivers that test for "4" and fails suspend
with something like "invalid suspend state" :-(.

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-05 22:31                                   ` Nigel Cunningham
  2004-08-06  0:39                                     ` Benjamin Herrenschmidt
@ 2004-08-06 21:30                                     ` Pavel Machek
  1 sibling, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2004-08-06 21:30 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Benjamin Herrenschmidt, David Brownell, Oliver Neukum,
	Linux Kernel Mailing List, Patrick Mochel

Hi!

> > That's where the whole confusion is indeed... and why we need to make
> > that clear. The IDE driver will sleep the disk for 3 and keep it spinning
> > for 4
> 
> Okay. So that leaves me calling device_suspend(4) when I want to quiesce
> the driver but not spin down and device_suspend(3) when I want to power
> down. Does that sound right? It sounds hairy to me. (Do other drivers
> treat 3 and 4 in the same way?)

This really needs to be cleaned up properly. What you do is pretty
much okay, but we need to switch to enums or char * or something and
kill the confusion.
								Pavel 
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-06 21:29                           ` Pavel Machek
@ 2004-08-06 22:27                             ` Benjamin Herrenschmidt
  2004-08-06 22:37                               ` Pavel Machek
  0 siblings, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-06 22:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: ncunningham, David Brownell, Oliver Neukum,
	Linux Kernel Mailing List, Patrick Mochel


> > Very easy... with the current code, just use state 4 for the round
> > of suspend callbacks, ide-disk will then avoid spinning down.
> 
> There are some network drivers that test for "4" and fails suspend
> with something like "invalid suspend state" :-(.

Easily fixed. Again, i'm not afraid of fixing driver, few enough of
them care at all at this point. I'll send some patches this week-end
to patrick for his bk tree adding the basic ppc support and renumbering
the PM callbacks, I havne't changed the type yet though, that's a more
tedious work and I'm a lazy guy ;)

I have taken care of various fbdev's too, though for some like atyfb,
I'm blocked until the new rewritten version gets upstream. I'm trying
to get it to -mm at least, news soon on this front.

I'll do other drivers asap.

Ben.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-06 22:27                             ` Benjamin Herrenschmidt
@ 2004-08-06 22:37                               ` Pavel Machek
  0 siblings, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2004-08-06 22:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: ncunningham, David Brownell, Oliver Neukum,
	Linux Kernel Mailing List, Patrick Mochel

Hi!

> > > Very easy... with the current code, just use state 4 for the round
> > > of suspend callbacks, ide-disk will then avoid spinning down.
> > 
> > There are some network drivers that test for "4" and fails suspend
> > with something like "invalid suspend state" :-(.
> 
> Easily fixed. Again, i'm not afraid of fixing driver, few enough of
> them care at all at this point. I'll send some patches this week-end
> to patrick for his bk tree adding the basic ppc support and renumbering
> the PM callbacks, I havne't changed the type yet though, that's a more
> tedious work and I'm a lazy guy ;)

Well, changing "int" to "enum something" as you go might serve you as
"I've already fixed this one" marker ;-). And as it does not even
create a warning, it might be feasible to do incrementally...

Or even typedef suspend_state_t u32...

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-05 22:14                           ` Nigel Cunningham
@ 2004-08-07  0:08                             `  Éric Brunet
  2004-08-08 19:48                               ` Pavel Machek
  2004-08-11 21:23                             ` Greg KH
  1 sibling, 1 reply; 61+ messages in thread
From:  Éric Brunet @ 2004-08-07  0:08 UTC (permalink / raw)
  To: linux-kernel

Dans ens.mailing-lists.linux-kernel, vous avez écrit :
>> > - support for telling what class of device a driver is handling (I'm
>> > particularly interested in keeping the keyboard, screen and storage
>> > devices alive while suspending).
>> 
>> You can see that info today from userspace by looking in
>> /sys/class/input, /sys/class/graphics, and /sys/block

It is a minor point, but as many people are working on swsuspend right
now, I thought I'd mentionned it. It seems (as of 2.6.8.rc1) that the
screen is not shut down or put in a low power state when suspending to
disk.

I guess that for 99.5 % of the population, it is not an issue as the
monitor is usually plugged in the power supply of the computer and
power is cut when the computer shuts down. My monitor, however, is
directly plugged in the mains outlet and, after a suspend to disk, it
displays indefinitely an information box stating that it has no video
signal coming in.

The X server knows how to shutdown (DPMS) the screen afer some
inactivity, so I guess the kernel could do that while suspending. And it
would be very nice if it would. But I believe there is no device
driver handling the monitor, so I don't know where to do it.

Éric Brunet

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-06 20:04       ` Pavel Machek
@ 2004-08-07 22:14         ` David Brownell
  2004-08-07 23:53           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 61+ messages in thread
From: David Brownell @ 2004-08-07 22:14 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Benjamin Herrenschmidt, Linux Kernel list, Patrick Mochel

On Friday 06 August 2004 13:04, Pavel Machek wrote:

> > These look to me like "wrong device-level suspend state" cases.
> 
> Actually, suspend-to-disk has to suspend all devices *twices*. Once it
> wants them in "D0 but DMA/interrupts stopped", and once in "D3cold but
> I do not really care power is going to be cut anyway". I do not think
> this can be expressed with PCI states.

How are those different from "PCI_D1" then later "PCI_D3hot"?
I'd understood that loss of VAUX was always possible, so robust
drivers always had to handle resume  from PCI_D3cold.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
       [not found]                           ` <1091772799.2532.50.camel@laptop.cunninghams>
@ 2004-08-07 22:24                             ` David Brownell
  0 siblings, 0 replies; 61+ messages in thread
From: David Brownell @ 2004-08-07 22:24 UTC (permalink / raw)
  To: ncunningham
  Cc: Benjamin Herrenschmidt, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel

On Thursday 05 August 2004 23:13, Nigel Cunningham wrote:

> > > device_resume_tree(&default_device_tree);
> > > 
> > > Proof of the pudding coming :>
> > 
> > Sounds good.  Will it be possible to remove devices during
> > these tree operations?  Probably never the current one.
> 
> Ummm. I suppose so. It's only affecting the PM section and not the
> device tree proper, so I don't see why it should cause any failures.

If you're not changing dpm_sem usage, it's a self-deadlock
situation.  I guess that bug needs to be fixed in its own right.


> > And (evil chuckle) how will it behave if two tasks are doing
> > that concurrently?  The no-overlap case would be fully
> > parallel, I'd hope!
> 
> All of the operations still use dpm_sem, so really strange things
> shouldn't happen. That said, if you're trying to suspend to disk and ram
> at the same time, you get what you deserve, 

Telling one subsystem to suspend as deeply as possible doesn't
mean no other subsystem will be needed much sooner.  And
why you'd suspend a system "to disk" when the system only
has some flash ram?  :)

- Dave

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-06 21:10           ` Pavel Machek
@ 2004-08-07 23:23             ` David Brownell
  2004-08-08 17:16               ` Pavel Machek
  0 siblings, 1 reply; 61+ messages in thread
From: David Brownell @ 2004-08-07 23:23 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Benjamin Herrenschmidt, Linux Kernel list, Patrick Mochel

[-- Attachment #1: Type: text/plain, Size: 792 bytes --]


> I believe right solution is to pass system state into suspend(), and
> provide system_to_pci_state(), so that drivers that only care about
> PCI state can use.

So long as tools (gcc by preference, or sparse) warn about API changes,
I can be happy.  See the attached "pmcore-0807.patch", a subset of
yours, against a recent rc3.

I'd rather address the PCI issue with Ben's renumbering:

enum system_state {
        PM_SUSPEND_ON = 0,
        PM_SUSPEND_STANDBY = 1,
        PM_SUSPEND_MEM = 3,
        PM_SUSPEND_DISK = 4,
        PM_SUSPEND_MAX,
};

That way sysfs won't get all wierd with PCI, and by the way
no drivers should need to change ... :)


> static void ide_device_shutdown(struct device *dev)

Ah, that needs attention too.  Should USB do anything special
there?

- Dave




[-- Attachment #2: pmcore-0807.patch --]
[-- Type: text/x-diff, Size: 4172 bytes --]

Cleanup of some of the "new 2.6" suspend call path, using
"enum system_state" rather than "u32".  This mostly just
documents existing practice for non-PCI drivers.


--- 1.16/include/linux/pm.h	Thu Jul  1 22:23:53 2004
+++ edited/include/linux/pm.h	Sat Aug  7 14:32:44 2004
@@ -193,11 +193,11 @@
 extern void (*pm_idle)(void);
 extern void (*pm_power_off)(void);
 
-enum {
-	PM_SUSPEND_ON,
-	PM_SUSPEND_STANDBY,
-	PM_SUSPEND_MEM,
-	PM_SUSPEND_DISK,
+enum system_state {
+	PM_SUSPEND_ON = 0,
+	PM_SUSPEND_STANDBY = 1,
+	PM_SUSPEND_MEM = 2,
+	PM_SUSPEND_DISK = 3,
 	PM_SUSPEND_MAX,
 };
 
@@ -241,11 +241,13 @@
 
 extern void device_pm_set_parent(struct device * dev, struct device * parent);
 
-extern int device_suspend(u32 state);
-extern int device_power_down(u32 state);
+/*
+ * apply system suspend policy to all devices
+ */
+extern int device_suspend(enum system_state reason);
+extern int device_power_down(enum system_state reason);
 extern void device_power_up(void);
 extern void device_resume(void);
-
 
 #endif /* __KERNEL__ */
 
--- 1.86/kernel/power/swsusp.c	Thu Jul  1 22:23:48 2004
+++ edited/kernel/power/swsusp.c	Sat Aug  7 14:32:44 2004
@@ -699,7 +699,7 @@
 	else
 #endif
 	{
-		device_suspend(3);
+		device_suspend(PM_SUSPEND_DISK);
 		device_shutdown();
 		machine_power_off();
 	}
@@ -720,7 +720,7 @@
 	mb();
 	spin_lock_irq(&suspend_pagedir_lock);	/* Done to disable interrupts */ 
 
-	device_power_down(3);
+	device_power_down(PM_SUSPEND_DISK);
 	PRINTK( "Waiting for DMAs to settle down...\n");
 	mdelay(1000);	/* We do not want some readahead with DMA to corrupt our memory, right?
 			   Do it with disabled interrupts for best effect. That way, if some
@@ -789,7 +789,7 @@
 {
 	int is_problem;
 	read_swapfiles();
-	device_power_down(3);
+	device_power_down(PM_SUSPEND_DISK);
 	is_problem = suspend_prepare_image();
 	device_power_up();
 	spin_unlock_irq(&suspend_pagedir_lock);
@@ -845,7 +845,7 @@
 		disable_nonboot_cpus();
 		/* Save state of all device drivers, and stop them. */
 		printk("Suspending devices... ");
-		if ((res = device_suspend(3))==0) {
+		if ((res = device_suspend(PM_SUSPEND_DISK))==0) {
 			/* If stopping device drivers worked, we proceed basically into
 			 * suspend_save_image.
 			 *
@@ -1200,7 +1200,7 @@
 		goto read_failure;
 	/* FIXME: Should we stop processes here, just to be safer? */
 	disable_nonboot_cpus();
-	device_suspend(3);
+	device_suspend(PM_SUSPEND_DISK);
 	do_magic(1);
 	panic("This never returns");
 
--- 1.7/drivers/base/power/power.h	Wed Jun  9 23:34:24 2004
+++ edited/drivers/base/power/power.h	Sat Aug  7 14:32:44 2004
@@ -66,14 +66,14 @@
 /*
  * suspend.c
  */
-extern int suspend_device(struct device *, u32);
+extern int suspend_device(struct device *, enum system_state);
 
 
 /*
  * runtime.c
  */
 
-extern int dpm_runtime_suspend(struct device *, u32);
+extern int dpm_runtime_suspend(struct device *, enum system_state);
 extern void dpm_runtime_resume(struct device *);
 
 #else /* CONFIG_PM */
@@ -88,7 +88,7 @@
 
 }
 
-static inline int dpm_runtime_suspend(struct device * dev, u32 state)
+static inline int dpm_runtime_suspend(struct device * dev, enum system_state state)
 {
 	return 0;
 }
--- 1.32/drivers/base/power/shutdown.c	Wed Jun  9 23:34:24 2004
+++ edited/drivers/base/power/shutdown.c	Sat Aug  7 14:32:44 2004
@@ -29,7 +29,7 @@
 			dev->driver->shutdown(dev);
 		return 0;
 	}
-	return dpm_runtime_suspend(dev, dev->detach_state);
+	return dpm_runtime_suspend(dev, (enum system_state) dev->detach_state);
 }
 
 
--- 1.16/drivers/base/power/suspend.c	Wed Jun  9 23:34:24 2004
+++ edited/drivers/base/power/suspend.c	Sat Aug  7 14:32:44 2004
@@ -35,7 +35,7 @@
  *	@state:	Power state device is entering.
  */
 
-int suspend_device(struct device * dev, u32 state)
+int suspend_device(struct device * dev, enum system_state state)
 {
 	int error = 0;
 
@@ -70,7 +70,7 @@
  *
  */
 
-int device_suspend(u32 state)
+int device_suspend(enum system_state state)
 {
 	int error = 0;
 
@@ -112,7 +112,7 @@
  *	done, power down system devices.
  */
 
-int device_power_down(u32 state)
+int device_power_down(enum system_state state)
 {
 	int error = 0;
 	struct device * dev;

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-03  0:50           ` Benjamin Herrenschmidt
@ 2004-08-07 23:30             ` David Brownell
  0 siblings, 0 replies; 61+ messages in thread
From: David Brownell @ 2004-08-07 23:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Pavel Machek, Linux Kernel list, Patrick Mochel

[-- Attachment #1: Type: text/plain, Size: 3327 bytes --]

On Monday 02 August 2004 17:50, Benjamin Herrenschmidt wrote:

> you ask a driver to quiesce, it's the driver's policy to decide what is
> the best suitable HW state based on what system state we are going into
> (or what is asked by userland, that is quiesce, shutdown, or whatever
> semantics we'll provide once we clanup the sysfs side of things).

So long as it's policy, I want drivers _needing_ to handle it to be the
very last choice on the list ... not the very first one!

I think the "cleanup" needs to handle many different kinds of
suspend things ... policy objects, bus-specific states, driver-specific
states, system policy rules, and so on.  Step one, stop using "u32"
for all of them, instead of some kind of typed pointer!


> > How does your answer change when you go down the path I was
> > describing:  the drivers are passed a device-specific state?  Which for
> > PCI means no API changes; the drivers already expect to see
> > the PCI state number.
> 
> Change ? How so ? I always said the same thing... and no, that is NOT a
> PCI state number, 

Since 2.4, the PCI calls have said it's a PCI state number;
that'd be the change, making it NOT be a PCI state number.


> > > >			the PM core self-deadlocks since
> > > > suspend/resume outcalls are done while holding the semaphore
> > > > that device_pm_remove() needs, ugh.
> > > 
> > > That's an old problem,....
> > 
> > Patrick once posted a patch that morphed the semaphore into a
> > spinlock (in at least some cases), which worked for me, but that
> > never got merged.
> 
> Patrick ? Do you still have that around ?

I dug it up; attached, that machine hasn't quite entirely died yet!


> Low level drivers, at this point, don't really know if they are in use
> in some cases, do they ? 

There are levels of usage they can see, others they can't.
Higher level drivers can't tell what lower level ones
know either ... :)


> But anyway, to get userland originated suspend 
> to possibly do anything useful, we first need to fix sysfs so that it
> does partial tree suspend and not individual deviecs suspend only.

Yes, something that turns a tree into a list (that nobody else can
morph!) and then walks the list ... recursion ok only for prototypes.


> There's also a semantic breakage between who updates the power state in
> struct device, system suspend relies on the driver doing so afaik, while
> sysfs originated calls to the driver suspend will do it in sysfs (which
> I think is the wrong thing to do).

That is, drivers should always do this?  That makes sense to me; how
else are they going to be able to report actual suspend state (which
may not be as requested)? And hmm, sysfs PCI state would be better
read (and written!) as "PCI_D3" rather than "ACPI_D2".


> > Actually no; I was describing what would be an ACPI G0/S0 state
> > where most devices/drivers are powered down, not the G1/S1 state
> > described as "suspend".   CPU would be running, for example.  It's
> > a power management policy, not a suspend policy ... it kicks in
> > during normal operation.
> 
> And who wakes up the devices in this ACPI state ? (just curious)

Whoever suspended the device should be able to resume it;
they're the ones with the local knowledge, after all!  Unless
you're maybe thinking that global rules should apply too... :)

- Dave



[-- Attachment #2: pm-pat.patch --]
[-- Type: text/x-diff, Size: 8062 bytes --]

Date: Mon, 13 Oct 2003 15:29:03 -0700 (PDT)
From: Patrick Mochel <mochel@osdl.org>

Oh man, I don't know what I was thinking, but there was some seriously 
wonky code in there. Appended is a patch that changes the semaphore to a 
spin lock and only keeps it held over operations dealing with the list and 
the dev->power.power_state field. 

Date: Wed, 15 Oct 2003 20:19:06 -0700 (PDT)
From: David Brownell <dbrownell@users.sourceforge.net>

I did the "apm -s" test on that machine, and this problem is gone.
(And looks more reasonable than my 2.6.0-test7 patch.)


--- 1.12/drivers/base/power/main.c	Tue Sep  9 11:22:35 2003
+++ edited/drivers/base/power/main.c	Wed Oct 15 17:37:19 2003
@@ -28,7 +28,7 @@
 LIST_HEAD(dpm_off);
 LIST_HEAD(dpm_off_irq);
 
-DECLARE_MUTEX(dpm_sem);
+spinlock_t dpm_lock = SPIN_LOCK_UNLOCKED;
 
 /*
  * PM Reference Counting.
@@ -76,12 +76,12 @@
 	pr_debug("PM: Adding info for %s:%s\n",
 		 dev->bus ? dev->bus->name : "No Bus", dev->kobj.name);
 	atomic_set(&dev->power.pm_users,0);
-	down(&dpm_sem);
-	list_add_tail(&dev->power.entry,&dpm_active);
 	device_pm_set_parent(dev,dev->parent);
-	if ((error = dpm_sysfs_add(dev)))
-		list_del(&dev->power.entry);
-	up(&dpm_sem);
+	if (!(error = dpm_sysfs_add(dev))) {
+		spin_lock(&dpm_lock);
+		list_add_tail(&dev->power.entry,&dpm_active);
+		spin_unlock(&dpm_lock);
+	}
 	return error;
 }
 
@@ -89,11 +89,11 @@
 {
 	pr_debug("PM: Removing info for %s:%s\n",
 		 dev->bus ? dev->bus->name : "No Bus", dev->kobj.name);
-	down(&dpm_sem);
 	dpm_sysfs_remove(dev);
 	device_pm_release(dev->power.pm_parent);
+	spin_lock(&dpm_lock);
 	list_del(&dev->power.entry);
-	up(&dpm_sem);
+	spin_unlock(&dpm_lock);
 }
 
 
===== drivers/base/power/power.h 1.6 vs edited =====
--- 1.6/drivers/base/power/power.h	Mon Aug 25 11:08:21 2003
+++ edited/drivers/base/power/power.h	Wed Oct 15 17:37:19 2003
@@ -25,7 +25,7 @@
 /*
  * Used to synchronize global power management operations.
  */
-extern struct semaphore dpm_sem;
+extern spinlock_t dpm_lock;
 
 /* 
  * The PM lists.
===== drivers/base/power/resume.c 1.11 vs edited =====
--- 1.11/drivers/base/power/resume.c	Mon Aug 25 11:08:21 2003
+++ edited/drivers/base/power/resume.c	Wed Oct 15 17:37:19 2003
@@ -22,25 +22,23 @@
 
 int resume_device(struct device * dev)
 {
-	if (dev->bus && dev->bus->resume)
-		return dev->bus->resume(dev);
-	return 0;
-}
-
+	int error = 0;
 
+	if (dev->bus && dev->bus->resume)
+		error = dev->bus->resume(dev);
 
-void dpm_resume(void)
-{
-	while(!list_empty(&dpm_off)) {
-		struct list_head * entry = dpm_off.next;
-		struct device * dev = to_device(entry);
-		list_del_init(entry);
-		resume_device(dev);
-		list_add_tail(entry,&dpm_active);
-	}
+	spin_lock(&dpm_lock);
+	if (!error) {
+		list_add(&dev->power.entry,&dpm_active);
+		dev->power.power_state = 0;
+	} else
+		list_add(&dev->power.entry,&dpm_off);
+	spin_unlock(&dpm_lock);
+	return error;
 }
 
 
+
 /**
  *	device_resume - Restore state of each device in system.
  *
@@ -50,35 +48,21 @@
 
 void device_resume(void)
 {
-	down(&dpm_sem);
-	dpm_resume();
-	up(&dpm_sem);
+	spin_lock(&dpm_lock);
+	while(!list_empty(&dpm_off)) {
+		struct list_head * entry = dpm_off.next;
+		struct device * dev = to_device(entry);
+		list_del_init(entry);
+		spin_unlock(&dpm_lock);
+		resume_device(dev);
+		spin_lock(&dpm_lock);
+	}
+	spin_unlock(&dpm_lock);
 }
 
 EXPORT_SYMBOL(device_resume);
 
 
-/**
- *	device_power_up_irq - Power on some devices. 
- *
- *	Walk the dpm_off_irq list and power each device up. This 
- *	is used for devices that required they be powered down with
- *	interrupts disabled. As devices are powered on, they are moved to
- *	the dpm_suspended list.
- *
- *	Interrupts must be disabled when calling this. 
- */
-
-void dpm_power_up(void)
-{
-	while(!list_empty(&dpm_off_irq)) {
-		struct list_head * entry = dpm_off_irq.next;
-		list_del_init(entry);
-		resume_device(to_device(entry));
-		list_add_tail(entry,&dpm_active);
-	}
-}
-
 
 /**
  *	device_pm_power_up - Turn on all devices that need special attention.
@@ -91,7 +75,6 @@
 void device_power_up(void)
 {
 	sysdev_resume();
-	dpm_power_up();
 }
 
 EXPORT_SYMBOL(device_power_up);
===== drivers/base/power/runtime.c 1.2 vs edited =====
--- 1.2/drivers/base/power/runtime.c	Tue Aug 19 23:23:32 2003
+++ edited/drivers/base/power/runtime.c	Wed Oct 15 17:37:19 2003
@@ -10,14 +10,6 @@
 #include "power.h"
 
 
-static void runtime_resume(struct device * dev)
-{
-	if (!dev->power.power_state)
-		return;
-	resume_device(dev);
-}
-
-
 /**
  *	dpm_runtime_resume - Power one device back on.
  *	@dev:	Device.
@@ -30,9 +22,15 @@
 
 void dpm_runtime_resume(struct device * dev)
 {
-	down(&dpm_sem);
-	runtime_resume(dev);
-	up(&dpm_sem);
+	spin_lock(&dpm_lock);
+	if (list_empty(&dev->power.entry) || 
+	    !(dev->power.power_state)) {
+		spin_unlock(&dpm_lock);
+		return;
+	}
+	list_del_init(&dev->power.entry);
+	spin_unlock(&dpm_lock);
+	resume_device(dev);
 }
 
 
@@ -44,20 +42,17 @@
 
 int dpm_runtime_suspend(struct device * dev, u32 state)
 {
-	int error = 0;
-
-	down(&dpm_sem);
-	if (dev->power.power_state == state)
-		goto Done;
-
+	spin_lock(&dpm_lock);
+	if (list_empty(&dev->power.entry) || 
+	    (dev->power.power_state == state)) {
+		spin_unlock(&dpm_lock);
+		return 0;
+	}
+	list_del_init(&dev->power.entry);
+	spin_unlock(&dpm_lock);
 	if (dev->power.power_state)
 		dpm_runtime_resume(dev);
-
-	if (!(error = suspend_device(dev,state)))
-		dev->power.power_state = state;
- Done:
-	up(&dpm_sem);
-	return error;
+	return suspend_device(dev,state);
 }
 
 
@@ -73,7 +68,7 @@
  */
 void dpm_set_power_state(struct device * dev, u32 state)
 {
-	down(&dpm_sem);
+	spin_lock(&dpm_lock);
 	dev->power.power_state = state;
-	up(&dpm_sem);
+	spin_unlock(&dpm_lock);
 }
===== drivers/base/power/suspend.c 1.11 vs edited =====
--- 1.11/drivers/base/power/suspend.c	Mon Aug 25 11:08:21 2003
+++ edited/drivers/base/power/suspend.c	Wed Oct 15 17:37:19 2003
@@ -42,13 +42,13 @@
 	if (dev->bus && dev->bus->suspend)
 		error = dev->bus->suspend(dev,state);
 
+	spin_lock(&dpm_lock);
 	if (!error) {
-		list_del(&dev->power.entry);
 		list_add(&dev->power.entry,&dpm_off);
-	} else if (error == -EAGAIN) {
-		list_del(&dev->power.entry);
-		list_add(&dev->power.entry,&dpm_off_irq);
-	}
+		dev->power.power_state = state;
+	} else 
+		list_add(&dev->power.entry,&dpm_active);
+	spin_unlock(&dpm_lock);
 	return error;
 }
 
@@ -63,36 +63,30 @@
  *	the device to the dpm_off list. If it returns -EAGAIN, we move it to 
  *	the dpm_off_irq list. If we get a different error, try and back out. 
  *
- *	If we hit a failure with any of the devices, call device_resume()
+ *	If we hit a failure with any of the devices, call dpm_resume()
  *	above to bring the suspended devices back to life. 
  *
- *	Note this function leaves dpm_sem held to
- *	a) block other devices from registering.
- *	b) prevent other PM operations from happening after we've begun.
- *	c) make sure we're exclusive when we disable interrupts.
- *
  */
 
 int device_suspend(u32 state)
 {
 	int error = 0;
 
-	down(&dpm_sem);
+	spin_lock(&dpm_lock);
 	while(!list_empty(&dpm_active)) {
 		struct list_head * entry = dpm_active.prev;
 		struct device * dev = to_device(entry);
-		if ((error = suspend_device(dev,state))) {
-			if (error != -EAGAIN)
-				goto Error;
-			else
-				error = 0;
-		}
+		list_del_init(entry);
+		spin_unlock(&dpm_lock);
+		if ((error = suspend_device(dev,state)))
+			goto Error;
+		spin_lock(&dpm_lock);
 	}
+	spin_unlock(&dpm_lock);
  Done:
-	up(&dpm_sem);
 	return error;
  Error:
-	dpm_resume();
+	device_resume();
 	goto Done;
 }
 
@@ -110,23 +104,7 @@
 
 int device_power_down(u32 state)
 {
-	int error = 0;
-	struct device * dev;
-
-	list_for_each_entry_reverse(dev,&dpm_off_irq,power.entry) {
-		if ((error = suspend_device(dev,state)))
-			break;
-	} 
-	if (error)
-		goto Error;
-	if ((error = sysdev_suspend(state)))
-		goto Error;
- Done:
-	return error;
- Error:
-	dpm_power_up();
-	goto Done;
+	return sysdev_suspend(state);
 }
 
 EXPORT_SYMBOL(device_power_down);
-

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-07 22:14         ` David Brownell
@ 2004-08-07 23:53           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2004-08-07 23:53 UTC (permalink / raw)
  To: David Brownell; +Cc: Pavel Machek, Linux Kernel list, Patrick Mochel

On Sun, 2004-08-08 at 08:14, David Brownell wrote:
> On Friday 06 August 2004 13:04, Pavel Machek wrote:
> 
> > > These look to me like "wrong device-level suspend state" cases.
> > 
> > Actually, suspend-to-disk has to suspend all devices *twices*. Once it
> > wants them in "D0 but DMA/interrupts stopped", and once in "D3cold but
> > I do not really care power is going to be cut anyway". I do not think
> > this can be expressed with PCI states.
> 
> How are those different from "PCI_D1" then later "PCI_D3hot"?

D1 is a real HW state, we don't really need to enter it at all. On some
chip, suspending to D1 require some mess that we don't need here. We
just need to block the driver.

> I'd understood that loss of VAUX was always possible, so robust
> drivers always had to handle resume  from PCI_D3cold.

When they can ....

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-04  4:47                       ` What PM should be and do (Was Re: Solving suspend-level confusion) Nigel Cunningham
  2004-08-04  4:53                         ` Benjamin Herrenschmidt
  2004-08-05 18:19                         ` Greg KH
@ 2004-08-08  0:54                         ` David Brownell
  2 siblings, 0 replies; 61+ messages in thread
From: David Brownell @ 2004-08-08  0:54 UTC (permalink / raw)
  To: ncunningham
  Cc: Benjamin Herrenschmidt, Oliver Neukum, Pavel Machek,
	Linux Kernel Mailing List, Patrick Mochel


> - support for state management of part of the tree (I want to put the
> other devices to sleep at the start of suspending)

Hmm, sounds like maybe you're thinking about something that
I might prefer to think of as a set.   Maybe organized by role
("display", "disk array 2", etc) or maybe by something that
monitors device usage (and which might help you by keeping
most things suspended most of the time).


> - support for grouping together a bunch of devices into an arbitrary
> subtree (quiesce that keyboard, screen and storage device - and their
> parents - while I do my lowlevel stuff... okay, now resume them so I can
> save the rest of the image)

OK, that's definitely a set not a tree.  And given USB, putting a keyboard
into that a set creates a problem until remote wakeup works through PCI!

Example of a simple tree:  a USB flash "disk" as the only device
attached to a laptop.  That's actually several devices, something
along these lines:

  - PCI device, maybe bound to ohci_hcd
  - USB root hub implemented by that driver
  - USB devices, at least for device and one interface
  - SCSI host implemented by usb-storage
  - SCSI disk implemented by various SCSI layers 
  - block device implemented by SCSI
  - maybe a couple partitions

Suspending that device should cause almost all of those to suspend.
If there were another USB device on that root hub, it might not be
possible to suspend the root hub.
 

> Perhaps the way to achieve the partial tree stuff is to make the code
> for handling device lists more generic, so that there would be groups of
> devices and each group has an dpm_active, dpm_off and dpm_off_irq list
> of its own. Devices could go into a 'main group' by default, and be
> shifted to a different group for operations like the above. Suspending
> and resuming then moves the devices within the lists for the group.
> Parents would only need to be moved in a case like mine, where the main
> group was going to sleep.

Hmm, I'd rather have the lists be dynamically constructed.  For example,
"this device and all its children" should be suspended bottom-up,
and resumed top-down  Would those groups be there for users to
work with?

And I'm not sure I understand what you mean about changing parents.
After all, a given board will only be wired in one way, and no change
in software can remove constraints like "A must suspend before B"
or "if one of these N devices needs the 48 MHz clock, that PLL must
still be running and so the system can't sleep".

- Dave


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-04  4:59                           ` Nigel Cunningham
@ 2004-08-08 16:54                             ` Pavel Machek
  2004-08-08 21:55                               ` Nigel Cunningham
  0 siblings, 1 reply; 61+ messages in thread
From: Pavel Machek @ 2004-08-08 16:54 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Benjamin Herrenschmidt, David Brownell, Oliver Neukum,
	Linux Kernel Mailing List, Patrick Mochel

Hi!

> Yes. I'm not trying to give drivers an inconsistent state, just delaying
> suspending some until the last minute....
> 
> Suspend 2 algorithm:
> 
> 1. Prepare image (freeze processes, allocate memory, eat memory etc)
> 2. Power down all drivers not used while writing image
> 3. Write LRU pages. ('pageset 2')
> 4. Quiesce remaining drivers, save CPU state, to atomic copy of
> remaining ram.
> 5. Resume quiesced drivers.

Hmm, this means pretty complex subtree handling.. Perhaps it would be
possible to make "quiesce/unquiesce" support in drivers so that this
is not needed?
								Pavel

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: Solving suspend-level confusion
  2004-08-07 23:23             ` David Brownell
@ 2004-08-08 17:16               ` Pavel Machek
  0 siblings, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2004-08-08 17:16 UTC (permalink / raw)
  To: David Brownell; +Cc: Benjamin Herrenschmidt, Linux Kernel list, Patrick Mochel

Hi!

> > I believe right solution is to pass system state into suspend(), and
> > provide system_to_pci_state(), so that drivers that only care about
> > PCI state can use.
> 
> So long as tools (gcc by preference, or sparse) warn about API changes,
> I can be happy.  See the attached "pmcore-0807.patch", a subset of
> yours, against a recent rc3.

It looks very good to me... Now.. who has working sparse?

But I think it should be merged ASAP, it should clean up some
confusion.

								Pavel

> Cleanup of some of the "new 2.6" suspend call path, using
> "enum system_state" rather than "u32".  This mostly just
> documents existing practice for non-PCI drivers.
> 
> 
> --- 1.16/include/linux/pm.h	Thu Jul  1 22:23:53 2004
> +++ edited/include/linux/pm.h	Sat Aug  7 14:32:44 2004
> @@ -193,11 +193,11 @@
>  extern void (*pm_idle)(void);
>  extern void (*pm_power_off)(void);
>  
> -enum {
> -	PM_SUSPEND_ON,
> -	PM_SUSPEND_STANDBY,
> -	PM_SUSPEND_MEM,
> -	PM_SUSPEND_DISK,
> +enum system_state {
> +	PM_SUSPEND_ON = 0,
> +	PM_SUSPEND_STANDBY = 1,
> +	PM_SUSPEND_MEM = 2,
> +	PM_SUSPEND_DISK = 3,
>  	PM_SUSPEND_MAX,
>  };
>  
> @@ -241,11 +241,13 @@
>  
>  extern void device_pm_set_parent(struct device * dev, struct device * parent);
>  
> -extern int device_suspend(u32 state);
> -extern int device_power_down(u32 state);
> +/*
> + * apply system suspend policy to all devices
> + */
> +extern int device_suspend(enum system_state reason);
> +extern int device_power_down(enum system_state reason);
>  extern void device_power_up(void);
>  extern void device_resume(void);
> -
>  
>  #endif /* __KERNEL__ */
>  
> --- 1.86/kernel/power/swsusp.c	Thu Jul  1 22:23:48 2004
> +++ edited/kernel/power/swsusp.c	Sat Aug  7 14:32:44 2004
> @@ -699,7 +699,7 @@
>  	else
>  #endif
>  	{
> -		device_suspend(3);
> +		device_suspend(PM_SUSPEND_DISK);
>  		device_shutdown();
>  		machine_power_off();
>  	}
> @@ -720,7 +720,7 @@
>  	mb();
>  	spin_lock_irq(&suspend_pagedir_lock);	/* Done to disable interrupts */ 
>  
> -	device_power_down(3);
> +	device_power_down(PM_SUSPEND_DISK);
>  	PRINTK( "Waiting for DMAs to settle down...\n");
>  	mdelay(1000);	/* We do not want some readahead with DMA to corrupt our memory, right?
>  			   Do it with disabled interrupts for best effect. That way, if some
> @@ -789,7 +789,7 @@
>  {
>  	int is_problem;
>  	read_swapfiles();
> -	device_power_down(3);
> +	device_power_down(PM_SUSPEND_DISK);
>  	is_problem = suspend_prepare_image();
>  	device_power_up();
>  	spin_unlock_irq(&suspend_pagedir_lock);
> @@ -845,7 +845,7 @@
>  		disable_nonboot_cpus();
>  		/* Save state of all device drivers, and stop them. */
>  		printk("Suspending devices... ");
> -		if ((res = device_suspend(3))==0) {
> +		if ((res = device_suspend(PM_SUSPEND_DISK))==0) {
>  			/* If stopping device drivers worked, we proceed basically into
>  			 * suspend_save_image.
>  			 *
> @@ -1200,7 +1200,7 @@
>  		goto read_failure;
>  	/* FIXME: Should we stop processes here, just to be safer? */
>  	disable_nonboot_cpus();
> -	device_suspend(3);
> +	device_suspend(PM_SUSPEND_DISK);
>  	do_magic(1);
>  	panic("This never returns");
>  
> --- 1.7/drivers/base/power/power.h	Wed Jun  9 23:34:24 2004
> +++ edited/drivers/base/power/power.h	Sat Aug  7 14:32:44 2004
> @@ -66,14 +66,14 @@
>  /*
>   * suspend.c
>   */
> -extern int suspend_device(struct device *, u32);
> +extern int suspend_device(struct device *, enum system_state);
>  
>  
>  /*
>   * runtime.c
>   */
>  
> -extern int dpm_runtime_suspend(struct device *, u32);
> +extern int dpm_runtime_suspend(struct device *, enum system_state);
>  extern void dpm_runtime_resume(struct device *);
>  
>  #else /* CONFIG_PM */
> @@ -88,7 +88,7 @@
>  
>  }
>  
> -static inline int dpm_runtime_suspend(struct device * dev, u32 state)
> +static inline int dpm_runtime_suspend(struct device * dev, enum system_state state)
>  {
>  	return 0;
>  }
> --- 1.32/drivers/base/power/shutdown.c	Wed Jun  9 23:34:24 2004
> +++ edited/drivers/base/power/shutdown.c	Sat Aug  7 14:32:44 2004
> @@ -29,7 +29,7 @@
>  			dev->driver->shutdown(dev);
>  		return 0;
>  	}
> -	return dpm_runtime_suspend(dev, dev->detach_state);
> +	return dpm_runtime_suspend(dev, (enum system_state) dev->detach_state);
>  }
>  
>  
> --- 1.16/drivers/base/power/suspend.c	Wed Jun  9 23:34:24 2004
> +++ edited/drivers/base/power/suspend.c	Sat Aug  7 14:32:44 2004
> @@ -35,7 +35,7 @@
>   *	@state:	Power state device is entering.
>   */
>  
> -int suspend_device(struct device * dev, u32 state)
> +int suspend_device(struct device * dev, enum system_state state)
>  {
>  	int error = 0;
>  
> @@ -70,7 +70,7 @@
>   *
>   */
>  
> -int device_suspend(u32 state)
> +int device_suspend(enum system_state state)
>  {
>  	int error = 0;
>  
> @@ -112,7 +112,7 @@
>   *	done, power down system devices.
>   */
>  
> -int device_power_down(u32 state)
> +int device_power_down(enum system_state state)
>  {
>  	int error = 0;
>  	struct device * dev;


-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-07  0:08                             `  Éric Brunet
@ 2004-08-08 19:48                               ` Pavel Machek
  0 siblings, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2004-08-08 19:48 UTC (permalink / raw)
  To: Éric Brunet; +Cc: linux-kernel

Hi!

> >> > - support for telling what class of device a driver is handling (I'm
> >> > particularly interested in keeping the keyboard, screen and storage
> >> > devices alive while suspending).
> >> 
> >> You can see that info today from userspace by looking in
> >> /sys/class/input, /sys/class/graphics, and /sys/block
> 
> It is a minor point, but as many people are working on swsuspend right
> now, I thought I'd mentionned it. It seems (as of 2.6.8.rc1) that the
> screen is not shut down or put in a low power state when suspending to
> disk.
> 
> I guess that for 99.5 % of the population, it is not an issue as the
> monitor is usually plugged in the power supply of the computer and
> power is cut when the computer shuts down. My monitor, however, is
> directly plugged in the mains outlet and, after a suspend to disk, it
> displays indefinitely an information box stating that it has no video
> signal coming in.

Hmm, pretty stupid monitor, it should timeout and poweroff itself.

> The X server knows how to shutdown (DPMS) the screen afer some
> inactivity, so I guess the kernel could do that while
> suspending. And it

If you shutdown monitor via DPMS then hard-turn the machine off, what
happens? Does monitor stay turned off? [If not, all hopes are off.]

> would be very nice if it would. But I believe there is no device
> driver handling the monitor, so I don't know where to do it.

radeonfb (etc) could do that, but support for this is poor, I agree.

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-08 16:54                             ` Pavel Machek
@ 2004-08-08 21:55                               ` Nigel Cunningham
  2004-08-09  8:42                                 ` Pavel Machek
  0 siblings, 1 reply; 61+ messages in thread
From: Nigel Cunningham @ 2004-08-08 21:55 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Benjamin Herrenschmidt, David Brownell, Oliver Neukum,
	Linux Kernel Mailing List, Patrick Mochel

Hi.

On Mon, 2004-08-09 at 02:54, Pavel Machek wrote:
> Hi!
> 
> > Yes. I'm not trying to give drivers an inconsistent state, just delaying
> > suspending some until the last minute....
> > 
> > Suspend 2 algorithm:
> > 
> > 1. Prepare image (freeze processes, allocate memory, eat memory etc)
> > 2. Power down all drivers not used while writing image
> > 3. Write LRU pages. ('pageset 2')
> > 4. Quiesce remaining drivers, save CPU state, to atomic copy of
> > remaining ram.
> > 5. Resume quiesced drivers.
> 
> Hmm, this means pretty complex subtree handling.. Perhaps it would be
> possible to make "quiesce/unquiesce" support in drivers so that this
> is not needed?

That would be great from my point of view. It's why I talked before in
terms of quiesced etc rather than S3, S4 and so on.

Nigel


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-08 21:55                               ` Nigel Cunningham
@ 2004-08-09  8:42                                 ` Pavel Machek
  0 siblings, 0 replies; 61+ messages in thread
From: Pavel Machek @ 2004-08-09  8:42 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Benjamin Herrenschmidt, David Brownell, Oliver Neukum,
	Linux Kernel Mailing List, Patrick Mochel

Hi!

> > > Yes. I'm not trying to give drivers an inconsistent state, just delaying
> > > suspending some until the last minute....
> > > 
> > > Suspend 2 algorithm:
> > > 
> > > 1. Prepare image (freeze processes, allocate memory, eat memory etc)
> > > 2. Power down all drivers not used while writing image
> > > 3. Write LRU pages. ('pageset 2')
> > > 4. Quiesce remaining drivers, save CPU state, to atomic copy of
> > > remaining ram.
> > > 5. Resume quiesced drivers.
> > 
> > Hmm, this means pretty complex subtree handling.. Perhaps it would be
> > possible to make "quiesce/unquiesce" support in drivers so that this
> > is not needed?
> 
> That would be great from my point of view. It's why I talked before in
> terms of quiesced etc rather than S3, S4 and so on.

Well, for "common" hardware (like all modern notebooks), I guess
making it reasonably fast should be easy to do... Uncommon hardware
will be harder, but if someone tries to suspend server with firewire,
he can probably survive some delay...
								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: What PM should be and do (Was Re: Solving suspend-level confusion)
  2004-08-05 22:14                           ` Nigel Cunningham
  2004-08-07  0:08                             `  Éric Brunet
@ 2004-08-11 21:23                             ` Greg KH
  1 sibling, 0 replies; 61+ messages in thread
From: Greg KH @ 2004-08-11 21:23 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: David Brownell, Benjamin Herrenschmidt, Oliver Neukum,
	Pavel Machek, Linux Kernel Mailing List, Patrick Mochel

On Fri, Aug 06, 2004 at 08:14:34AM +1000, Nigel Cunningham wrote:
> Hi.
> 
> On Fri, 2004-08-06 at 04:19, Greg KH wrote:
> > On Wed, Aug 04, 2004 at 02:47:52PM +1000, Nigel Cunningham wrote:
> > > - support for telling what class of device a driver is handling (I'm
> > > particularly interested in keeping the keyboard, screen and storage
> > > devices alive while suspending).
> > 
> > You can see that info today from userspace by looking in
> > /sys/class/input, /sys/class/graphics, and /sys/block
> 
> Does this apply to all devices

No.

>, and how do I tell it 'programmatically'?

 From within the kernel?  I don't think you really can, sorry.

greg k-h

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2004-08-11 21:31 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-07-30 16:44 Solving suspend-level confusion Pavel Machek
2004-07-30 22:39 ` Benjamin Herrenschmidt
2004-07-30 23:06   ` Pavel Machek
2004-07-31  4:02 ` David Brownell
2004-07-31  4:36   ` Pavel Machek
2004-07-31  5:49   ` Benjamin Herrenschmidt
2004-07-31 14:23     ` David Brownell
2004-07-31 17:01       ` Oliver Neukum
2004-07-31 17:51         ` David Brownell
2004-08-01  0:41         ` David Brownell
2004-08-01  1:34           ` Benjamin Herrenschmidt
2004-08-02 16:38             ` David Brownell
2004-08-03  0:38               ` Benjamin Herrenschmidt
2004-08-04  2:28                 ` David Brownell
2004-08-04  2:26                   ` Nigel Cunningham
2004-08-04  2:53                     ` Benjamin Herrenschmidt
2004-08-04  2:52                       ` Nigel Cunningham
2004-08-04  4:14                         ` Benjamin Herrenschmidt
2004-08-04  4:25                           ` Nigel Cunningham
2004-08-04  4:52                             ` Benjamin Herrenschmidt
2004-08-04  4:54                               ` Nigel Cunningham
2004-08-04  5:03                                 ` Benjamin Herrenschmidt
2004-08-04  5:05                                   ` Nigel Cunningham
2004-08-05 10:05                                   ` Nigel Cunningham
2004-08-05 22:31                                   ` Nigel Cunningham
2004-08-06  0:39                                     ` Benjamin Herrenschmidt
2004-08-06 21:30                                     ` Pavel Machek
2004-08-06 21:29                           ` Pavel Machek
2004-08-06 22:27                             ` Benjamin Herrenschmidt
2004-08-06 22:37                               ` Pavel Machek
2004-08-06 21:26                         ` Pavel Machek
2004-08-05  1:29                     ` David Brownell
2004-08-05 10:19                       ` Nigel Cunningham
2004-08-06  0:32                         ` David Brownell
     [not found]                           ` <1091772799.2532.50.camel@laptop.cunninghams>
2004-08-07 22:24                             ` David Brownell
2004-08-04  2:56                   ` Benjamin Herrenschmidt
2004-08-04  3:30                     ` David Brownell
2004-08-04  4:19                       ` Benjamin Herrenschmidt
2004-08-04  4:47                       ` What PM should be and do (Was Re: Solving suspend-level confusion) Nigel Cunningham
2004-08-04  4:53                         ` Benjamin Herrenschmidt
2004-08-04  4:59                           ` Nigel Cunningham
2004-08-08 16:54                             ` Pavel Machek
2004-08-08 21:55                               ` Nigel Cunningham
2004-08-09  8:42                                 ` Pavel Machek
2004-08-05 18:19                         ` Greg KH
2004-08-05 22:14                           ` Nigel Cunningham
2004-08-07  0:08                             `  Éric Brunet
2004-08-08 19:48                               ` Pavel Machek
2004-08-11 21:23                             ` Greg KH
2004-08-08  0:54                         ` David Brownell
2004-08-06 21:21               ` Solving suspend-level confusion Pavel Machek
2004-07-31 21:09       ` Benjamin Herrenschmidt
2004-08-02 16:40         ` David Brownell
2004-08-03  0:50           ` Benjamin Herrenschmidt
2004-08-07 23:30             ` David Brownell
2004-08-06 21:10           ` Pavel Machek
2004-08-07 23:23             ` David Brownell
2004-08-08 17:16               ` Pavel Machek
2004-08-06 20:04       ` Pavel Machek
2004-08-07 22:14         ` David Brownell
2004-08-07 23:53           ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox