Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)

public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912111938310.32493-100000@netrider.rowland.org>
@ 2009-12-12 17:35 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-12 17:35 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Saturday 12 December 2009, Alan Stern wrote:
> On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> 
> > Below is a patch I've just tested, but there's a lockdep problem in it I don't
> > know how to solve.  Namely, lockdep is apparently unhappy with us not releasing
> > the lock taken in device_suspend() and it complains we take it twice in a row
> > (which we do, but for another device).  I need to use down_read_non_owner()
> > to make it shut up and then I also need to use up_read_non_owner() in
> > __device_suspend(), although there's the comment in include/linux/rwsem.h
> > saying exatly this about that:
> > 
> > /*
> >  * Take/release a lock when not the owner will release it.
> >  *
> >  * [ This API should be avoided as much as possible - the
> >  *   proper abstraction for this case is completions. ]
> >  */
> > 
> > (I'd like to know your opinion about that).  Yet, that's not all, because next
> > it complains during resume that __device_resume() releases a lock it didn't
> > acquire, which it clearly does, but that is intentional.  Unfortunately,
> > there's no up_write_non_owner() ...
> 
> Hah!  I knew it!
> 
> How come lockdep didn't complain earlier?  What's different about this 
> patch?  Only the nesting annotations?  Why should adding annotations 
> make lockdep less happy?

I'm not sure.  Perhaps I made a mistake during the previous tests.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912201434340.27137-100000@netrider.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912201434340.27137-100000@netrider.rowland.org>
@ 2009-12-20 19:51 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-20 19:51 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, Linus Torvalds,
	pm list

On Sunday 20 December 2009, Alan Stern wrote:
> On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:
> 
> > BTW, what's the right place to call device_enable_async_suspend() for USB
> > devices?
> 
> For USB devices, it's in drivers/usb/core/hub.c:usb_new_device() 
> anywhere before the call to usb_device_add().
> 
> For USB interfaces, it's in 
> drivers/usb/core/message.c:usb_set_configuration() before the call to 
> device_add().
> 
> For USB endpoints, it's in 
> drivers/usb/core/endpoint.c:usb_create_ep_devs() before the call to 
> device_register().

Thanks!

> However you won't need to do it for interfaces and endpoints if you 
> automatically treat as async any device without suspend/resume 
> callbacks.

I don't do that right now and I need these settings just for testing at the
moment.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912201910.26895.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <200912201910.26895.rjw@sisk.pl>
@ 2009-12-20 19:38 ` Alan Stern
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-20 19:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, Linus Torvalds,
	pm list

On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:

> BTW, what's the right place to call device_enable_async_suspend() for USB
> devices?

For USB devices, it's in drivers/usb/core/hub.c:usb_new_device() 
anywhere before the call to usb_device_add().

For USB interfaces, it's in 
drivers/usb/core/message.c:usb_set_configuration() before the call to 
device_add().

For USB endpoints, it's in 
drivers/usb/core/endpoint.c:usb_create_ep_devs() before the call to 
device_register().

However you won't need to do it for interfaces and endpoints if you 
automatically treat as async any device without suspend/resume 
callbacks.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912201210300.24162-100000@netrider.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912201210300.24162-100000@netrider.rowland.org>
@ 2009-12-20 18:10 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-20 18:10 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, Linus Torvalds,
	pm list

On Sunday 20 December 2009, Alan Stern wrote:
> On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:
> 
> > > It's too early to come to this sort of conclusion (i.e., that suspend
> > > and resume react very differently to an asynchronous approach).  Unless
> > > you have some definite _reason_ for thinking that resume will benefit
> > > more than suspend, you shouldn't try to generalize so much from tests
> > > on only two systems.
> > 
> > In fact I have one reason.  Namely, the things that drivers do on suspend and
> > resume are evidently quite different and on these two systems I was able to
> > test they apparently took different amounts of time to complete.
> > 
> > The very fact that on both systems resume is substantially longer than suspend,
> > even if all devices are suspended and resumed synchronously, is quite
> > interesting.
> 
> Yes, it is.  But it doesn't mean that suspend won't benefit from 
> asynchronicity; it just means that the benefits might not be as large 
> as they are for resume.

Agreed, although that rises the question whether they are sufficiently
significant.  I guess time will tell.  With the i8042 done asynchronously they
are IMO.

BTW, what's the right place to call device_enable_async_suspend() for USB
devices?

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912201352.07689.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <200912201352.07689.rjw@sisk.pl>
@ 2009-12-20 17:12 ` Alan Stern
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-20 17:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, Linus Torvalds,
	pm list

On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:

> > It's too early to come to this sort of conclusion (i.e., that suspend
> > and resume react very differently to an asynchronous approach).  Unless
> > you have some definite _reason_ for thinking that resume will benefit
> > more than suspend, you shouldn't try to generalize so much from tests
> > on only two systems.
> 
> In fact I have one reason.  Namely, the things that drivers do on suspend and
> resume are evidently quite different and on these two systems I was able to
> test they apparently took different amounts of time to complete.
> 
> The very fact that on both systems resume is substantially longer than suspend,
> even if all devices are suspended and resumed synchronously, is quite
> interesting.

Yes, it is.  But it doesn't mean that suspend won't benefit from 
asynchronicity; it just means that the benefits might not be as large 
as they are for resume.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912192232360.6618-100000@netrider.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912192232360.6618-100000@netrider.rowland.org>
@ 2009-12-20 12:55 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-20 12:55 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Sunday 20 December 2009, Alan Stern wrote:
> On Sat, 19 Dec 2009, Rafael J. Wysocki wrote:
> 
> > On Friday 18 December 2009, Alan Stern wrote:
> > > On Fri, 18 Dec 2009, Rafael J. Wysocki wrote:
> > > 
> > > > I didn't manage to do that, but I was able to mark sd and i8042 as async and
> > > > see the impact of this.
> > > 
> > > Apparently this didn't do what you wanted.  In the nx6325
> > > sd+i8042+async+extra log, the 0:0:0:0 device (which is a SCSI disk) was
> 
> To be precise, the device is an ATA or SATA disk but it is managed by 
> the sd driver.
> 
> > > suspended by the main thread instead of an async thread.
> > 
> > Hm, that's odd, because there's a noticeable time difference between the
> > two cases in which the sd is sync and async.  I'll look into it further.
> 
> I don't know what the whole story is, but the PID number tells the 
> tale.
> 
> > > There's an important point I neglected to mention before.  Your logs 
> > > don't show anything for devices with no suspend callbacks at all.  
> > > Nevertheless, these devices sit on the device list and prevent other
> > > devices from suspending or resuming as soon as they could.
> > 
> > Unless they are async, that is.
> 
> Yes.  It would be simpler to make them async.  But first we ought to
> know what they are.  Can you add an extra line to the log for such
> devices?

Sure, I'll do that.

> What I'm afraid of is that there might be a "normal" device with a
> "normal" ancestor but with "abnormal" devices in between (where
> "normal" means there is a suspend or resume routine and "abnormal"  
> means all the method pointers are NULL).  I know that this happens when
> there's a USB mass-storage device, for example.  If we complete the
> intermediate devices immediately, then there won't be anything to
> prevent the ancestor from suspending before the device or the device
> from resuming before the ancestor.

I'm afraid of that too.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912192253200.6618-100000@netrider.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912192253200.6618-100000@netrider.rowland.org>
@ 2009-12-20 12:52 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-20 12:52 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, Linus Torvalds,
	pm list

On Sunday 20 December 2009, Alan Stern wrote:
> On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:
> 
> > So, seriously, do you think it makes sense to do asynchronous suspend at all?
> > I'm asking, because we're likely to get into troubles like this during suspend
> > for other kinds of devices too and without resolving them we won't get any
> > significant speedup from asynchronous suspend.
> > 
> > That said, to me it's definitely worth doing asynchronous resume with the
> > "start asynch threads upfront" modification, as the results of the tests show
> > that quite clearly.  I hope you agree.
> 
> It's too early to come to this sort of conclusion (i.e., that suspend
> and resume react very differently to an asynchronous approach).  Unless
> you have some definite _reason_ for thinking that resume will benefit
> more than suspend, you shouldn't try to generalize so much from tests
> on only two systems.

In fact I have one reason.  Namely, the things that drivers do on suspend and
resume are evidently quite different and on these two systems I was able to
test they apparently took different amounts of time to complete.

The very fact that on both systems resume is substantially longer than suspend,
even if all devices are suspended and resumed synchronously, is quite
interesting.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912192241.03991.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <200912192241.03991.rjw@sisk.pl>
@ 2009-12-20  3:48 ` Alan Stern
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-20  3:48 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Sat, 19 Dec 2009, Rafael J. Wysocki wrote:

> On Friday 18 December 2009, Alan Stern wrote:
> > On Fri, 18 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > > I didn't manage to do that, but I was able to mark sd and i8042 as async and
> > > see the impact of this.
> > 
> > Apparently this didn't do what you wanted.  In the nx6325
> > sd+i8042+async+extra log, the 0:0:0:0 device (which is a SCSI disk) was

To be precise, the device is an ATA or SATA disk but it is managed by 
the sd driver.

> > suspended by the main thread instead of an async thread.
> 
> Hm, that's odd, because there's a noticeable time difference between the
> two cases in which the sd is sync and async.  I'll look into it further.

I don't know what the whole story is, but the PID number tells the 
tale.

> > There's an important point I neglected to mention before.  Your logs 
> > don't show anything for devices with no suspend callbacks at all.  
> > Nevertheless, these devices sit on the device list and prevent other
> > devices from suspending or resuming as soon as they could.
> 
> Unless they are async, that is.

Yes.  It would be simpler to make them async.  But first we ought to
know what they are.  Can you add an extra line to the log for such
devices?

What I'm afraid of is that there might be a "normal" device with a
"normal" ancestor but with "abnormal" devices in between (where
"normal" means there is a suspend or resume routine and "abnormal"  
means all the method pointers are NULL).  I know that this happens when
there's a USB mass-storage device, for example.  If we complete the
intermediate devices immediately, then there won't be anything to
prevent the ancestor from suspending before the device or the device
from resuming before the ancestor.  Forcing the "abnormal" devices to
be async, even if they aren't marked that way, would avoid these
problems.

> > For example, the fingerprint sensor (3-1) took the most time to resume.  
> > But other devices were delayed until after it finished because it had
> > children with no callbacks, and they delayed the devices following
> > them in the list.
> > 
> > What would happen if you completed these devices immediately, as part 
> > of the first pass?
> 
> OK.  How do the PM core is supposed to check if a device has null suspend
> and resume?  Check all of the function pointers in the first pass?

All the relevant pointers (including the legacy pointers).  That is, 
you check only the suspend pointers during the first suspend pass, and 
likewise for resume.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912181205290.2987-100000@iolanthe.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912181205290.2987-100000@iolanthe.rowland.org>
@ 2009-12-19 21:41 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 21:41 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Friday 18 December 2009, Alan Stern wrote:
> On Fri, 18 Dec 2009, Rafael J. Wysocki wrote:
> 
> > I didn't manage to do that, but I was able to mark sd and i8042 as async and
> > see the impact of this.
> 
> Apparently this didn't do what you wanted.  In the nx6325
> sd+i8042+async+extra log, the 0:0:0:0 device (which is a SCSI disk) was
> suspended by the main thread instead of an async thread.

Hm, that's odd, because there's a noticeable time difference between the
two cases in which the sd is sync and async.  I'll look into it further.

> There's an important point I neglected to mention before.  Your logs 
> don't show anything for devices with no suspend callbacks at all.  
> Nevertheless, these devices sit on the device list and prevent other
> devices from suspending or resuming as soon as they could.

Unless they are async, that is.

> For example, the fingerprint sensor (3-1) took the most time to resume.  
> But other devices were delayed until after it finished because it had
> children with no callbacks, and they delayed the devices following
> them in the list.
> 
> What would happen if you completed these devices immediately, as part 
> of the first pass?

OK.  How do the PM core is supposed to check if a device has null suspend
and resume?  Check all of the function pointers in the first pass?

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912171444040.2645-100000@iolanthe.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912171444040.2645-100000@iolanthe.rowland.org>
@ 2009-12-17 20:36 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-17 20:36 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Thursday 17 December 2009, Alan Stern wrote:
> On Thu, 17 Dec 2009, Rafael J. Wysocki wrote:
> 
> > That actually is correct.  On the nx6325 suspend is totally dominated by disk
> > spindown, almost everything else is negligible compared to it (well, except for
> > the audio), so we can't go down below 1 s during suspend on this box.
> > 
> > On the Wind, disk spindown time is comparable with serio suspend time,
> > so at least in principle we should be able to get .5 s suspend on this box - 
> > if the disk spindown in async.
> > 
> > In turn, the resume on the Wind is dominated by disk spinup, so we can't
> > go below 1.5 s on this box during resume (notice that the "async+extra"
> > approach brings us close to this limit, although we could save .5 s more in
> > principle by making more devices async).
> > 
> > Resume on the nx6325 is a different story, though, as it is dominated by USB
> > and PCI devices, so marking those as async would probably bring us close to
> > the limit.
> 
> The implications seem pretty clear.  If the following sorts of devices
> were async:
> 
> 	USB (devices and interfaces), PCI, serio, SCSI (hosts, targets,
> 	devices)

Plus ACPI battery.

> then we would reap close to the maximum benefit -- providing:
> 
> 	async threads are started in a first pass without waiting
> 	for synchronous devices, and

Agreed.
 
> It's not clear that making all these types of devices async will really 
> work, but it's worth testing.

I'm working on it.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912161753540.2643-100000@iolanthe.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912161753540.2643-100000@iolanthe.rowland.org>
@ 2009-12-16 23:18 ` Rafael J. Wysocki
       [not found] ` <200912170018.05175.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-16 23:18 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Thursday 17 December 2009, Alan Stern wrote:
> On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> 
> > I've just put the first set of data, for the HP nx6325 at:
> > http://www.sisk.pl/kernel/data/nx6325/
> >  
> > The *-dmesg.log files contain full dmesg outputs starting from a cold boot and
> > including one suspend-resume cycle in each case, with debug_initcall enabled.
> > 
> > The *-suspend.log files are excerpts from the *-dmesg.log files containing
> > the suspend messages only, and analogously for *-resume.log.
> 
> I've just started looking at the sync-suspend.log file.  What are all 
> the '+' characters and " @ 3368" strings after the device names?

I think the + is necessary for the Arjan's graph-generating script and the
@ number is the value of current (ie. the PID of the calling task).

> You didn't print out the parent name for each device, so the tree 
> structure has been lost.

That's because the original Arjan's patch doesn't do that, I'm adding it
right now.

> Why do those "sd 0:0:0:0 [sda]" messages appear in between two 
> callbacks?  The cache-synchronization and the spin-down commands are
> not executed asynchronously.

Because the data are incomplete.  :-(

I've just realized that the Arjan's patch only covers bus types and classes
that have been converted to dev_pm_ops already, so I'm extending it to the
"legacy" ones at the moment.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912170018.05175.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] ` <200912170018.05175.rjw@sisk.pl>
@ 2009-12-17  1:30   ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-17  1:30 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Thursday 17 December 2009, Rafael J. Wysocki wrote:
> On Thursday 17 December 2009, Alan Stern wrote:
> > On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > > I've just put the first set of data, for the HP nx6325 at:
> > > http://www.sisk.pl/kernel/data/nx6325/
> > >  
> > > The *-dmesg.log files contain full dmesg outputs starting from a cold boot and
> > > including one suspend-resume cycle in each case, with debug_initcall enabled.
> > > 
> > > The *-suspend.log files are excerpts from the *-dmesg.log files containing
> > > the suspend messages only, and analogously for *-resume.log.
> > 
> > I've just started looking at the sync-suspend.log file.  What are all 
> > the '+' characters and " @ 3368" strings after the device names?
> 
> I think the + is necessary for the Arjan's graph-generating script and the
> @ number is the value of current (ie. the PID of the calling task).
> 
> > You didn't print out the parent name for each device, so the tree 
> > structure has been lost.
> 
> That's because the original Arjan's patch doesn't do that, I'm adding it
> right now.
> 
> > Why do those "sd 0:0:0:0 [sda]" messages appear in between two 
> > callbacks?  The cache-synchronization and the spin-down commands are
> > not executed asynchronously.
> 
> Because the data are incomplete.  :-(
> 
> I've just realized that the Arjan's patch only covers bus types and classes
> that have been converted to dev_pm_ops already, so I'm extending it to the
> "legacy" ones at the moment.

New data files have been uploaded to:

http://www.sisk.pl/kernel/data/nx6325/
http://www.sisk.pl/kernel/data/wind/

Please let me know if you need more information.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912161018100.2909-100000@iolanthe.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912161018100.2909-100000@iolanthe.rowland.org>
@ 2009-12-16 19:26 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-16 19:26 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Wednesday 16 December 2009, Alan Stern wrote:
> On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> 
> > I measured the total time of suspending and resuming devices as shown by the
> > code added by this patch:
> > http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
> > on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite
> > different and the HP was running 64-bit kernel and user space).
> 
> > I carried out 5 consecutive suspend-resume cycles (started from under X) on
> > each box in each case, and the raw data are here (all times in milliseconds):
> > http://www.sisk.pl/kernel/data/async-suspend.pdf
> 
> I'd like to see much more detailed data.  For each device, let's get 
> the device name, the parent's name, and the start time, end time, and 
> duration for suspend or resume.  The start time should be measured when 
> you have finished waiting for the children.  The end time should be 
> measured just before the complete_all().

I'm going to use the Arjan's patch + script to chart the suspend/resume times
for individual devices.  I can send you the raw data, though.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912151337350.14385@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <alpine.LFD.2.00.0912151337350.14385@localhost.localdomain>
@ 2009-12-15 22:27 ` Alan Stern
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-15 22:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Tue, 15 Dec 2009, Linus Torvalds wrote:

> On Tue, 15 Dec 2009, Alan Stern wrote:
> > 
> > Okay.  This obviously implies that if/when cardbus bridges are
> > converted to async suspend/resume, the driver should make sure that the
> > lower-numbered devices wait for their sibling higher-numbered devices
> > to suspend (and vice versa for resume).  Awkward though it may be.
> 
> Yes. However, this is an excellent case where the whole "the device layer 
> does things asynchronously" is really rather awkward.
> 
> For cardbus, the nicest model really would be for the _driver_ to decide 
> to do some things asynchronously, after having done some other things 
> synchronously (to make sure of ordering).

Have you considered the possibility of augmenting the design to allow 
this?  Perhaps reserve a particular return code from the suspend 
routine to mean that asynchronous operations are still underway, so the 
PM core shouldn't automatically do the complete_all().

> So I suspect that we _can_ just do cardbus bridges asynchronously too, but 
> it really needs some care. I suspect to a first approximation we would 
> want to do the easy cases first, and ignore cardbus as being "known to 
> possibly have issues".

Certainly.  Start with the easy things and leave harder devices like 
cardbus bridges for later.

> > > Subtle? Hell yes.
> > 
> > I don't disagree.  However the subtlety lies mainly in the matter of
> > non-obvious dependencies.
> 
> Yes. But we don't necessarily even _know_ those dependencies.

Yep.  Both non-obvious and non-known.

> The Cardbus ones I know about, but really only because I wrote much of 
> that code initially when converting cardbus to look like the PCI bridge it 
> largely is. But how many other cases like that do we have that we have 
> perhaps never even hit, because we've never done anything out of order.
> 
> > The ACPI relations are definitely something to worry about.  It would
> > be a good idea, at an early stage, to add those dependencies
> > explicitly.  I don't know enough about them to say more; perhaps Rafael 
> > does.
> 
> Quite frankly, I would really not want to do ACPI first at all.

Dear me, no!  I wasn't saying ACPI should be made async; I was saying
that ACPI "shadow" devices should be made to wait for their async PCI
counterparts.

> > Indeed.  Perhaps you were too hasty in suggesting that PCI bridges 
> > should be async.
> 
> Oh, yes. I would suggest that first we do _nothing_ async except for 
> within just a single USB tree, and perhaps some individual drivers like 
> the PS/2 keyboard controller (and do even that perhaps only for the PC 
> version, which we know is on the southbridge and not anywhere else).
> 
> If that ends up meaning that we block due to PCI bridges, so be it. I 
> really would prefer baby steps over anything more complete.

Agreed.  I'm not in any hurry.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912152226.22578.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <200912152226.22578.rjw@sisk.pl>
@ 2009-12-15 22:01 ` Alan Stern
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-15 22:01 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:

> > Ideally we would figure out how to do the slow devices in parallel
> > without interference from fast devices having unknown dependencies.  
> > Unfortunately this may not be possible.
> 
> I really expect to see those "unknown dependencies" in the _noirq
> suspend/resume phases and above.  [The very fact they exist is worrisome,
> because that's why we don't know why things work on one system and don't
> work on another, although they appear to be very similar.]

This is a good reason for keeping the _noirq phases synchronous.  AFAIK 
they don't take long enough to be worth converting, so there's no loss.

> > The real issue is "blockage": synchronous devices preventing 
> > possible concurrency among async devices.  That's what you thought 
> > making PCI bridges async would help.
> > 
> > In general, blockage arises in suspend when you have an async child
> > with a synchronous parent.  The parent has to wait for the child, which
> > might take a long time, thereby delaying other unrelated devices.
> 
> Exactly, but the Linus' point seems to be that's going to be rare and we
> should be able to special case all of the interesting cases.

Maybe that's true.  Without seeing some examples of actual dpm_list
contents, we can't tell.  Can you post the interesting parts of the
lists from some of your test machines?  Maybe with a USB device or two
plugged in?  (The device names together with the names of their parents
should be enough.)

> > (This explains why you wanted to make PCI bridges async -- they are the
> > parents of USB controllers.)  For resume it's the opposite: an async
> > parent with synchronous children.
> 
> Is that really going to happen in practice?  I mean, what would be the point?

I don't know.  It's all speculation until we see some actual lists.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912151444010.2643-100000@iolanthe.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912151444010.2643-100000@iolanthe.rowland.org>
@ 2009-12-15 21:26 ` Rafael J. Wysocki
  2009-12-15 21:54 ` Linus Torvalds
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-15 21:26 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Tuesday 15 December 2009, Alan Stern wrote:
> On Tue, 15 Dec 2009, Linus Torvalds wrote:
> 
> > It's a very subtle theory, and it's not necessarily always 100% true. For 
> > example, a cardbus bridge is strictly speaking very much a PCI bridge, but 
> > for cardbus bridges we _do_ have a suspend/resume function.
> > 
> > And perhaps worse than that, cardbus bridges are one of the canonical 
> > examples where two different PCI devices actually share registers. It's 
> > quite common that some of the control registers are shared across the two 
> > subfunctions of a two-slot cardbus controller (and we generally don't even 
> > have full docs for them!)
> 
> Okay.  This obviously implies that if/when cardbus bridges are
> converted to async suspend/resume, the driver should make sure that the
> lower-numbered devices wait for their sibling higher-numbered devices
> to suspend (and vice versa for resume).  Awkward though it may be.
> 
> > > The same goes for devices that don't have suspend or resume methods.
> > 
> > Yes and no. 
> > 
> > Again, the "async_suspend" flag is done at the generic device layer, but 
> > 99% of all suspend/resume methods are _not_ done at that level: they are 
> > bus-specific functions, where the bus has a generic suspend-resume 
> > function that it exposes to the generic device layer, and that knows about 
> > the bus-specific rules.
> > 
> > So if you are a PCI device (to take just that example - but it's true of 
> > just about all other buses too), and you don't have any suspend or resume 
> > methods, it's actually impossible to see that fact from the generic device 
> > layer.
> 
> Sure.  That's why the async_suspend flag is set at the bus/driver 
> level.
> 
> > And even when you know it's PCI, our rules are actually not simple at all. 
> > Our rules for PCI devices (and this strictly speaking is true for bridges 
> > too) are rather complex:
> > 
> >  - do we have _any_ legacy PM support (ie the "direct" driver 
> >    suspend/resume functions in the driver ops, rather than having a 
> >    "struct dev_pm_ops" pointer)? If so, call "->suspend()"
> > 
> >  - If not - do we have that "dev_pm_ops" thing? If so, call it.
> > 
> >  - If not - just disable the device entirely _UNLESS_ you're a PCI bridge.
> > 
> > Notice? The way things are set up, if you have no suspend routine, you'll 
> > not get suspended, but you will get disabled. 
> > 
> > So it's _not_ actually safe to asynchronously suspend a PCI device if that 
> > device has no driver or no suspend routines - because even in the absense 
> > of a driver and suspend routines, we'll still least disable it. And if 
> > there is some subtle dependency on that device that isn't obvious (say, it 
> > might be used indirectly for some ACPI thing), then that async suspend is 
> > the wrong thing to do.
> > 
> > Subtle? Hell yes.
> 
> I don't disagree.  However the subtlety lies mainly in the matter of
> non-obvious dependencies.  (The other stuff is all known to the PCI
> core.)  AFAICS there's otherwise little difference between an async
> routine that does nothing and one that disables the device -- both
> operations are very fast.
> 
> The ACPI relations are definitely something to worry about.  It would
> be a good idea, at an early stage, to add those dependencies
> explicitly.  I don't know enough about them to say more; perhaps Rafael 
> does.

It boils down to the fact that for each PCI device known to the ACPI BIOS
there is a "shadow" ACPI device that generally has its own suspend/resume
callbacks and these "shadow" devices are members of the ACPI subtree
of the device tree (ie. they have parents and so on).

Now, when I worked on the first version of async suspend/resume, I noticed
that if those "shadow" ACPI devices did not wait for their PCI counterparts to
suspend, things broke badly.  The reason probably wasn't related to what they
did in their suspend/resume callbacks, because they are usually empty, but it
was rather related to the dependencies between devices in the ACPI subtree
(so, generally speaking, it seems the entire ACPI subtree of the device tree
should be suspended after the entire PCI subtree).

That obviously requires more investigation, though.

> As for other non-obvious dependencies...  Who knows?  Probably the only
> way to find them is by experimentation.  My guess is that they will
> turn out to be connected mostly with "high-level" devices: system
> devices, things on the motherboard -- generally speaking, stuff close
> to the CPU.  Relatively few will be associated with devices below the 
> level of a PCI device or equivalent.
> 
> Ideally we would figure out how to do the slow devices in parallel
> without interference from fast devices having unknown dependencies.  
> Unfortunately this may not be possible.

I really expect to see those "unknown dependencies" in the _noirq
suspend/resume phases and above.  [The very fact they exist is worrisome,
because that's why we don't know why things work on one system and don't
work on another, although they appear to be very similar.]

> > So the whole thing about "we can do PCI bridges asynchronously because 
> > they are obviously no-op" is kind of true - except for the "obviously" 
> > part. It's not obvious at all. It's rather subtle.
> > 
> > As an example of this kind of subtlety - iirc PCIE bridges used to have 
> > suspend and resume bugs when we initially switched over to the "new world" 
> > suspend/resume exactly because they actually did things at "suspend" time 
> > (rather than suspend_late), and that broke devices behind them (this was 
> > not related to async, of course, but the point is that even when you look 
> > like a PCI bridge, you might be doing odd things).

Well, those "pcieport devices" still are the children of PCIe ports, although
physically they just correspond to different sets of registers within the
ports' config spaces (_that_ is overdesigned IMnsHO) and they are "suspended"
during the regular suspend of their PCIe port "parents".

> > So just saying "let's do it asynchronously" is _not_ always guaranteed to 
> > be the right thing at all. It's _probably_ safe for at least regular PCI 
> > bridges. Cardbus bridges? Probably not, but since most modern laptop have 
> > just a single slot - and people who have multiple slots seldom use them 
> > all - most people will probably never see the problems that it _could_ 
> > introduce.
> > 
> > And PCIE bridges? Should be safe these days, but it wasn't quite as 
> > obvious, because a PCIE bridge actually has a driver unlike a regular 
> > plain PCI-PCI bridge.
> > 
> > Subtle, subtle.
> 
> Indeed.  Perhaps you were too hasty in suggesting that PCI bridges 
> should be async.
> 
> It would help a lot to see some device lists for typical machines.  (If 
> there are such things.)  Otherwise we are just blowing gas.
> 
> > > There remains a separate question: Should async devices also be forced
> > > to wait for their children?  I don't see why not.  For PCI bridges it
> > > won't make any significant difference.  As long as the async code
> > > doesn't have to do anything, who cares when it runs?
> > 
> > That's why I just set the "async_resume = 1" thing.
> > 
> > But there might actually be reasons why we care. Like the fact that we 
> > actually throttle the amount of parallel work we do in async_schedule(). 
> > So doing even a "no-op" asynchronously isn't actually a no-op: while it is 
> > pending (and those things can be pending for a long time, since they have 
> > to wait for those slow devices underneath them), it can cause _other_ 
> > async work - that isn't necessarily a no-op at all - to be then done 
> > synchronously.
> > 
> > Now, admittedly our async throttling limits are high enough that the above 
> > kind of detail will probably never ever realy matter (default 256 worker 
> > threads etc). But it's an example of how practice is different from theory 
> > - in _theory_ it doesn't make any difference if you wait for something 
> > asynchronously, but in practice it could make a difference under some 
> > circumstances.
> 
> We certainly shouldn't be worried about side effects of async 
> throttling as this stage.  KISS works both ways: Don't overdesign, and 
> don't worry about things that might crop up when you expand the design.
> 
> We have strayed off the point of your original objection: not providing
> a way for devices to skip waiting for their children.  This really is a
> separate issue from deciding whether or not to go async.  For example,
> your proposed patch makes PCI bridges async but doesn't allow them to
> avoid waiting for children.  IMO that's a good thing.
> 
> The real issue is "blockage": synchronous devices preventing 
> possible concurrency among async devices.  That's what you thought 
> making PCI bridges async would help.
> 
> In general, blockage arises in suspend when you have an async child
> with a synchronous parent.  The parent has to wait for the child, which
> might take a long time, thereby delaying other unrelated devices.

Exactly, but the Linus' point seems to be that's going to be rare and we
should be able to special case all of the interesting cases.

> (This explains why you wanted to make PCI bridges async -- they are the
> parents of USB controllers.)  For resume it's the opposite: an async
> parent with synchronous children.

Is that really going to happen in practice?  I mean, what would be the point?

> Thus, while making PCI bridges async might make suspend faster, it probably
> won't help much with resume speed.  You'd have to make the children of USB
> devices (SCSI hosts, TTYs, and so on) async.  Depending on the order of
> device registration, of course.
> 
> Apart from all this, there's a glaring hole in the discussion so far.  
> You and Arjan may not have noticed it, but those of us still using
> rotating media have to put up with disk resume times that are a factor
> of 100 (!) larger than USB resume times.  That's where the greatest
> gains are to be found.

I guess so.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912151444010.2643-100000@iolanthe.rowland.org>
  2009-12-15 21:26 ` Rafael J. Wysocki
@ 2009-12-15 21:54 ` Linus Torvalds
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-15 21:54 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, LKML, pm list

On Tue, 15 Dec 2009, Alan Stern wrote:
> 
> Okay.  This obviously implies that if/when cardbus bridges are
> converted to async suspend/resume, the driver should make sure that the
> lower-numbered devices wait for their sibling higher-numbered devices
> to suspend (and vice versa for resume).  Awkward though it may be.

Yes. However, this is an excellent case where the whole "the device layer 
does things asynchronously" is really rather awkward.

For cardbus, the nicest model really would be for the _driver_ to decide 
to do some things asynchronously, after having done some other things 
synchronously (to make sure of ordering).

That said, I think we are ok for at least Yenta resume, because the really 
ordering-critical stuff we tend to do at "resume_early", which wouldn't be 
asynchronous anyway.

But for an idea of what I'm talking about, look at the o2micro stuff in 
drivers/pcmcia/o2micro.h, and notice how it does certain things only for 
the "PCI_FUNC(..devfn) == 0" case.

So I suspect that we _can_ just do cardbus bridges asynchronously too, but 
it really needs some care. I suspect to a first approximation we would 
want to do the easy cases first, and ignore cardbus as being "known to 
possibly have issues".

> > Subtle? Hell yes.
> 
> I don't disagree.  However the subtlety lies mainly in the matter of
> non-obvious dependencies.

Yes. But we don't necessarily even _know_ those dependencies.

The Cardbus ones I know about, but really only because I wrote much of 
that code initially when converting cardbus to look like the PCI bridge it 
largely is. But how many other cases like that do we have that we have 
perhaps never even hit, because we've never done anything out of order.

> The ACPI relations are definitely something to worry about.  It would
> be a good idea, at an early stage, to add those dependencies
> explicitly.  I don't know enough about them to say more; perhaps Rafael 
> does.

Quite frankly, I would really not want to do ACPI first at all.

We already handle batteries specially, but any random system device? Don't 
touch it, is my suggestion. There is just too many ways it can fail. Don't 
tell me that things "should work" - we know for a fact that BIOS tables 
almost always have every single bug they could possibly have).

> > And PCIE bridges? Should be safe these days, but it wasn't quite as 
> > obvious, because a PCIE bridge actually has a driver unlike a regular 
> > plain PCI-PCI bridge.
> > 
> > Subtle, subtle.
> 
> Indeed.  Perhaps you were too hasty in suggesting that PCI bridges 
> should be async.

Oh, yes. I would suggest that first we do _nothing_ async except for 
within just a single USB tree, and perhaps some individual drivers like 
the PS/2 keyboard controller (and do even that perhaps only for the PC 
version, which we know is on the southbridge and not anywhere else).

If that ends up meaning that we block due to PCI bridges, so be it. I 
really would prefer baby steps over anything more complete.

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912151047410.3566-100000@iolanthe.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912151047410.3566-100000@iolanthe.rowland.org>
@ 2009-12-15 16:28 ` Linus Torvalds
       [not found] ` <alpine.LFD.2.00.0912150803250.14385@localhost.localdomain>
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-15 16:28 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, LKML, pm list

On Tue, 15 Dec 2009, Alan Stern wrote:
> 
> It doesn't feel like an ugly hack to me.  It seems like exactly the 
> Right Thing To Do: Make as many devices as possible use async 
> suspend/resume.

The reason it's a ugly hack is that it's actually not a simple decision to 
make. The devil is in the details:

> The only reason we don't make every device async is because we don't
> know whether it's safe.  In the case of PCI bridges we _do_ know --
> because they don't have any work to do outside of
> late_suspend/early_resume -- and so they _should_ be async.

That's the theory, yes. And it was worth the comment to spell out that 
theory. But..

It's a very subtle theory, and it's not necessarily always 100% true. For 
example, a cardbus bridge is strictly speaking very much a PCI bridge, but 
for cardbus bridges we _do_ have a suspend/resume function.

And perhaps worse than that, cardbus bridges are one of the canonical 
examples where two different PCI devices actually share registers. It's 
quite common that some of the control registers are shared across the two 
subfunctions of a two-slot cardbus controller (and we generally don't even 
have full docs for them!)

> The same goes for devices that don't have suspend or resume methods.

Yes and no. 

Again, the "async_suspend" flag is done at the generic device layer, but 
99% of all suspend/resume methods are _not_ done at that level: they are 
bus-specific functions, where the bus has a generic suspend-resume 
function that it exposes to the generic device layer, and that knows about 
the bus-specific rules.

So if you are a PCI device (to take just that example - but it's true of 
just about all other buses too), and you don't have any suspend or resume 
methods, it's actually impossible to see that fact from the generic device 
layer.

And even when you know it's PCI, our rules are actually not simple at all. 
Our rules for PCI devices (and this strictly speaking is true for bridges 
too) are rather complex:

 - do we have _any_ legacy PM support (ie the "direct" driver 
   suspend/resume functions in the driver ops, rather than having a 
   "struct dev_pm_ops" pointer)? If so, call "->suspend()"

 - If not - do we have that "dev_pm_ops" thing? If so, call it.

 - If not - just disable the device entirely _UNLESS_ you're a PCI bridge.

Notice? The way things are set up, if you have no suspend routine, you'll 
not get suspended, but you will get disabled. 

So it's _not_ actually safe to asynchronously suspend a PCI device if that 
device has no driver or no suspend routines - because even in the absense 
of a driver and suspend routines, we'll still least disable it. And if 
there is some subtle dependency on that device that isn't obvious (say, it 
might be used indirectly for some ACPI thing), then that async suspend is 
the wrong thing to do.

Subtle? Hell yes.

So the whole thing about "we can do PCI bridges asynchronously because 
they are obviously no-op" is kind of true - except for the "obviously" 
part. It's not obvious at all. It's rather subtle.

As an example of this kind of subtlety - iirc PCIE bridges used to have 
suspend and resume bugs when we initially switched over to the "new world" 
suspend/resume exactly because they actually did things at "suspend" time 
(rather than suspend_late), and that broke devices behind them (this was 
not related to async, of course, but the point is that even when you look 
like a PCI bridge, you might be doing odd things).

So just saying "let's do it asynchronously" is _not_ always guaranteed to 
be the right thing at all. It's _probably_ safe for at least regular PCI 
bridges. Cardbus bridges? Probably not, but since most modern laptop have 
just a single slot - and people who have multiple slots seldom use them 
all - most people will probably never see the problems that it _could_ 
introduce.

And PCIE bridges? Should be safe these days, but it wasn't quite as 
obvious, because a PCIE bridge actually has a driver unlike a regular 
plain PCI-PCI bridge.

Subtle, subtle.

> There remains a separate question: Should async devices also be forced
> to wait for their children?  I don't see why not.  For PCI bridges it
> won't make any significant difference.  As long as the async code
> doesn't have to do anything, who cares when it runs?

That's why I just set the "async_resume = 1" thing.

But there might actually be reasons why we care. Like the fact that we 
actually throttle the amount of parallel work we do in async_schedule(). 
So doing even a "no-op" asynchronously isn't actually a no-op: while it is 
pending (and those things can be pending for a long time, since they have 
to wait for those slow devices underneath them), it can cause _other_ 
async work - that isn't necessarily a no-op at all - to be then done 
synchronously.

Now, admittedly our async throttling limits are high enough that the above 
kind of detail will probably never ever realy matter (default 256 worker 
threads etc). But it's an example of how practice is different from theory 
- in _theory_ it doesn't make any difference if you wait for something 
asynchronously, but in practice it could make a difference under some 
circumstances.

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912150803250.14385@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] ` <alpine.LFD.2.00.0912150803250.14385@localhost.localdomain>
@ 2009-12-15 18:57   ` Linus Torvalds
  2009-12-15 20:26   ` Alan Stern
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-15 18:57 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, LKML, pm list



On Tue, 15 Dec 2009, Linus Torvalds wrote:
> 
> And even when you know it's PCI, our rules are actually not simple at all. 
> Our rules for PCI devices (and this strictly speaking is true for bridges 
> too) are rather complex:
> 
>  - do we have _any_ legacy PM support (ie the "direct" driver 
>    suspend/resume functions in the driver ops, rather than having a 
>    "struct dev_pm_ops" pointer)? If so, call "->suspend()"
> 
>  - If not - do we have that "dev_pm_ops" thing? If so, call it.
> 
>  - If not - just disable the device entirely _UNLESS_ you're a PCI bridge.
> 
> Notice? The way things are set up, if you have no suspend routine, you'll 
> not get suspended, but you will get disabled. 

Side note - what I think might be a clean solution for PCI at least is to 
do something like the following:

 - move that "disable the device entirely" thing to suspend_late, rather 
   than the earlier suspend phase. Now PCI devices without drivers or PM 
   will not be touched at all in the first suspend phase.

 - initialize all PCI devices to have 'async_suspend = 1' on discovery

 - whenever we bind a driver to the PCI device, we'd then look at whether 
   that driver implements suspend/resume callbacks (legacy or new), and 
   clear the async_suspend bit if so.

That way we'd have the same old synchronous behavior for all PCI suspend 
and resume events (unless the driver itself then sets the async_suspend 
bit at device init time, which it could do, of course), while still always 
doing async "no-op" events.

That would avoid the ugly one-liner that just "knows" that PCI bridges are 
special and don't do anything at suspend time (even though they aren't 
really - a PCI bridge _could_ have a driver associated with it that does 
something that might not be happy being asynchronous).

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] ` <alpine.LFD.2.00.0912150803250.14385@localhost.localdomain>
  2009-12-15 18:57   ` Linus Torvalds
@ 2009-12-15 20:26   ` Alan Stern
  1 sibling, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-15 20:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Tue, 15 Dec 2009, Linus Torvalds wrote:

> It's a very subtle theory, and it's not necessarily always 100% true. For 
> example, a cardbus bridge is strictly speaking very much a PCI bridge, but 
> for cardbus bridges we _do_ have a suspend/resume function.
> 
> And perhaps worse than that, cardbus bridges are one of the canonical 
> examples where two different PCI devices actually share registers. It's 
> quite common that some of the control registers are shared across the two 
> subfunctions of a two-slot cardbus controller (and we generally don't even 
> have full docs for them!)

Okay.  This obviously implies that if/when cardbus bridges are
converted to async suspend/resume, the driver should make sure that the
lower-numbered devices wait for their sibling higher-numbered devices
to suspend (and vice versa for resume).  Awkward though it may be.

> > The same goes for devices that don't have suspend or resume methods.
> 
> Yes and no. 
> 
> Again, the "async_suspend" flag is done at the generic device layer, but 
> 99% of all suspend/resume methods are _not_ done at that level: they are 
> bus-specific functions, where the bus has a generic suspend-resume 
> function that it exposes to the generic device layer, and that knows about 
> the bus-specific rules.
> 
> So if you are a PCI device (to take just that example - but it's true of 
> just about all other buses too), and you don't have any suspend or resume 
> methods, it's actually impossible to see that fact from the generic device 
> layer.

Sure.  That's why the async_suspend flag is set at the bus/driver 
level.

> And even when you know it's PCI, our rules are actually not simple at all. 
> Our rules for PCI devices (and this strictly speaking is true for bridges 
> too) are rather complex:
> 
>  - do we have _any_ legacy PM support (ie the "direct" driver 
>    suspend/resume functions in the driver ops, rather than having a 
>    "struct dev_pm_ops" pointer)? If so, call "->suspend()"
> 
>  - If not - do we have that "dev_pm_ops" thing? If so, call it.
> 
>  - If not - just disable the device entirely _UNLESS_ you're a PCI bridge.
> 
> Notice? The way things are set up, if you have no suspend routine, you'll 
> not get suspended, but you will get disabled. 
> 
> So it's _not_ actually safe to asynchronously suspend a PCI device if that 
> device has no driver or no suspend routines - because even in the absense 
> of a driver and suspend routines, we'll still least disable it. And if 
> there is some subtle dependency on that device that isn't obvious (say, it 
> might be used indirectly for some ACPI thing), then that async suspend is 
> the wrong thing to do.
> 
> Subtle? Hell yes.

I don't disagree.  However the subtlety lies mainly in the matter of
non-obvious dependencies.  (The other stuff is all known to the PCI
core.)  AFAICS there's otherwise little difference between an async
routine that does nothing and one that disables the device -- both
operations are very fast.

The ACPI relations are definitely something to worry about.  It would
be a good idea, at an early stage, to add those dependencies
explicitly.  I don't know enough about them to say more; perhaps Rafael 
does.

As for other non-obvious dependencies...  Who knows?  Probably the only
way to find them is by experimentation.  My guess is that they will
turn out to be connected mostly with "high-level" devices: system
devices, things on the motherboard -- generally speaking, stuff close
to the CPU.  Relatively few will be associated with devices below the 
level of a PCI device or equivalent.

Ideally we would figure out how to do the slow devices in parallel
without interference from fast devices having unknown dependencies.  
Unfortunately this may not be possible.

> So the whole thing about "we can do PCI bridges asynchronously because 
> they are obviously no-op" is kind of true - except for the "obviously" 
> part. It's not obvious at all. It's rather subtle.
> 
> As an example of this kind of subtlety - iirc PCIE bridges used to have 
> suspend and resume bugs when we initially switched over to the "new world" 
> suspend/resume exactly because they actually did things at "suspend" time 
> (rather than suspend_late), and that broke devices behind them (this was 
> not related to async, of course, but the point is that even when you look 
> like a PCI bridge, you might be doing odd things).
> 
> So just saying "let's do it asynchronously" is _not_ always guaranteed to 
> be the right thing at all. It's _probably_ safe for at least regular PCI 
> bridges. Cardbus bridges? Probably not, but since most modern laptop have 
> just a single slot - and people who have multiple slots seldom use them 
> all - most people will probably never see the problems that it _could_ 
> introduce.
> 
> And PCIE bridges? Should be safe these days, but it wasn't quite as 
> obvious, because a PCIE bridge actually has a driver unlike a regular 
> plain PCI-PCI bridge.
> 
> Subtle, subtle.

Indeed.  Perhaps you were too hasty in suggesting that PCI bridges 
should be async.

It would help a lot to see some device lists for typical machines.  (If 
there are such things.)  Otherwise we are just blowing gas.

> > There remains a separate question: Should async devices also be forced
> > to wait for their children?  I don't see why not.  For PCI bridges it
> > won't make any significant difference.  As long as the async code
> > doesn't have to do anything, who cares when it runs?
> 
> That's why I just set the "async_resume = 1" thing.
> 
> But there might actually be reasons why we care. Like the fact that we 
> actually throttle the amount of parallel work we do in async_schedule(). 
> So doing even a "no-op" asynchronously isn't actually a no-op: while it is 
> pending (and those things can be pending for a long time, since they have 
> to wait for those slow devices underneath them), it can cause _other_ 
> async work - that isn't necessarily a no-op at all - to be then done 
> synchronously.
> 
> Now, admittedly our async throttling limits are high enough that the above 
> kind of detail will probably never ever realy matter (default 256 worker 
> threads etc). But it's an example of how practice is different from theory 
> - in _theory_ it doesn't make any difference if you wait for something 
> asynchronously, but in practice it could make a difference under some 
> circumstances.

We certainly shouldn't be worried about side effects of async 
throttling as this stage.  KISS works both ways: Don't overdesign, and 
don't worry about things that might crop up when you expand the design.

We have strayed off the point of your original objection: not providing
a way for devices to skip waiting for their children.  This really is a
separate issue from deciding whether or not to go async.  For example,
your proposed patch makes PCI bridges async but doesn't allow them to
avoid waiting for children.  IMO that's a good thing.

The real issue is "blockage": synchronous devices preventing 
possible concurrency among async devices.  That's what you thought 
making PCI bridges async would help.

In general, blockage arises in suspend when you have an async child
with a synchronous parent.  The parent has to wait for the child, which
might take a long time, thereby delaying other unrelated devices.  
(This explains why you wanted to make PCI bridges async -- they are the
parents of USB controllers.)  For resume it's the opposite: an async
parent with synchronous children.  Thus, while making PCI bridges async
might make suspend faster, it probably won't help much with resume
speed.  You'd have to make the children of USB devices (SCSI hosts,
TTYs, and so on) async.  Depending on the order of device registration,
of course.

Apart from all this, there's a glaring hole in the discussion so far.  
You and Arjan may not have noticed it, but those of us still using
rotating media have to put up with disk resume times that are a factor
of 100 (!) larger than USB resume times.  That's where the greatest
gains are to be found.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912131221210.1111-100000@netrider.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912131221210.1111-100000@netrider.rowland.org>
@ 2009-12-13 19:02 ` Alan Stern
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-13 19:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, pm list, LKML

On Sun, 13 Dec 2009, Alan Stern wrote:

> > Namely that there's no apparent sane way to say "don't wait for children".
> > 
> > PCI bridges that don't suspend at all - or any other device that only 
> > suspends in the 'suspend_late()' thing, for that matter - don't have any 
> > reason what-so-ever to wait for children, since they aren't actually 
> > suspending in the first place. But you make them wait regardless, which 
> > then serializes things unnecessarily (for example, two unrelated USB 
> > controllers).

> In short, allowing devices to suspend before their children would be 
> dangerous and probably would not save a significant amount of time.

There's more to be said.  Even without this "don't wait for children"  
thing, there can be bad interactions causing unnecessary delays.  For
example, suppose A (async) is the parent of B (sync), B comes before C
(sync) in dpm_list, and C is the parent of D (async).  Even if A & B
are unrelated to C & D, they will be forced to wait for them.  It 
doesn't matter that A and D are unrelated and so could suspend 
concurrently.

In essence, every synchonrous device is treated as though it depends on 
all the synchronous devices preceding it in dpm_list.  That's a lot of 
unnecessary constraints.  At the moment we have no choice, because we 
have to assume that some of those constraints actually are necessary -- 
and we don't know which ones.

It's an inescapable fact: If there are unnecessary ordering constraints
then you generally can't be 100% efficient in carrying out parallel
operations.  Compared with all these extra "synchronous" constraints,
the relatively small number of "don't need to wait for children"  
constraints is harmless.  I bet that if we got rid of all unnecessary
constraints except for making parents always wait for their children,
we'd attain more than 95% of the ideal speedup.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912112317.31668.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <200912112317.31668.rjw@sisk.pl>
@ 2009-12-12  0:38 ` Alan Stern
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-12  0:38 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Fri, 11 Dec 2009, Rafael J. Wysocki wrote:

> > > .. and I've told you several times that we should simply not do such 
> > > devices asynchronously. At least not unless there is some _overriding_ 
> > > reason to. And so far, nobody has suggested anything even remotely 
> > > likely for that.
> > 
> > Agreed.  The fact that async non-tree suspend constraints are difficult 
> > with rwsems isn't a drawback if nobody needs to use them.
> 
> Well, see my reply to Linus.  The only thing that bothers me is that if we use
> rwsems, there's no way to handle that even if it turns out that someone
> needs them after all.

This is now a totally moot point, but I want to make it anyway just to
show how perverse life can be.  It turns out that by combining some of
the worst parts of the rwsem approach and the completion approach, it
_is_ possible to have async non-tree suspend constraints with rwsems.  
The key is to imitate the way the completions work.

The resume algorithm doesn't change, but the suspend algorithm does.  
Currently, when suspending a device you first read-lock the parent (to 
prevent it from suspending too soon), then you asynchronously 
write-lock the device and suspend it, and finally read-unlock the 
parent.

Instead, you could first write-lock the device (to prevent the parent
and any other dependents from suspending too soon), then asynchronously
read-lock each of the children and anything else the device needs to
wait for, then suspend the device, and finally write-unlock it.  This
really is analogous to completions:  down_write() is like
init_completion(), up_write() is like complete_all(), and
down_read()+up_read() is like wait_for_completion().  I got the idea
from Linus's comment that completions really are nothing but locks
initialized in the "locked" state.

Of course, you would have to iterate over all the children and deal
with lockdep complaints.  So this obviously is not to be considered as
a serious proposal.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912102155390.12136-100000@netrider.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912102155390.12136-100000@netrider.rowland.org>
@ 2009-12-11 22:17 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-11 22:17 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Friday 11 December 2009, Alan Stern wrote:
> Up front: This is my personal view of the matter.  Which probably isn't
> of much interest to anybody, so I won't bother to defend these views or
> comment any further on them.  The decision about what version to use is
> up to the two of you.  The fact is, either implementation would get the 
> job done.
> 
> On Thu, 10 Dec 2009, Linus Torvalds wrote:
> 
> > Completions really are "locks that were initialized to locked". That is, 
> > in fact, how completions came to be: we literally used to use semaphores 
> > for them, and the reason for completions is literally the magic lifetime 
> > rules they have.
> > 
> > So when you do
> > 
> > 	INIT_COMPLETION(dev->power.completion);
> > 
> > that really is historically, logically, and conceptually exactly the same 
> > thing as initializing a lock to the locked state. We literally used to do 
> > it with the equivalent of
> > 
> > 	init_MUTEX_LOCKED()
> > 
> > way back when (well, except we didn't have mutexes back then, we had only 
> > counting semaphores) and instead of "complete()", we had "up()" on the 
> > semaphore to complete it.
> 
> You think of it that way because you have been closely involved in the
> development of the various kinds of locks.  Speaking as an outsider who
> has relatively little interest in the internal details, completions
> appear simpler than rwsems.  Mostly because they have a smaller API:  
> complete() (or complete_all()) and wait_for_completion() as opposed to
> down_read(), up_read(), down_write(), and up_write().

Agreed.

> > > Besides, suppose a device driver wants some off-tree constraints to be
> > > satisfied.
> > 
> > .. and I've told you several times that we should simply not do such 
> > devices asynchronously. At least not unless there is some _overriding_ 
> > reason to. And so far, nobody has suggested anything even remotely 
> > likely for that.
> 
> Agreed.  The fact that async non-tree suspend constraints are difficult 
> with rwsems isn't a drawback if nobody needs to use them.

Well, see my reply to Linus.  The only thing that bothers me is that if we use
rwsems, there's no way to handle that even if it turns out that someone
needs them after all.

> > > Well, why actually do we need to preserve the state of the data structure from
> > > one cycle to another?  There's no need whatsoever.
> > 
> > My point is, with locks, none of that is necessary. Because they 
> > automatically do the right thing.
> > 
> > By picking the right concept, you don't have any of those "oh, we need to 
> > re-initialize things" issues. They just work.
> 
> That's true, but it's not entirely clear.  There are subtle questions
> about what happens if you stop in the middle or a device gets
> unregistered or registered in the middle.  They require careful thought
> in both approaches.
> 
> Having to reinitialize a completion each time doesn't bother me.  It's 
> merely an indication that each suspend & resume is independent of all 
> the others.

YES!

> > > I still don't think there are many places where locks are used in a way you're
> > > suggesting.  I would even say it's quite unusual to use locks this way.
> > 
> > See above. It's what completions _are_.
> 
> This is almost a philosophical issue.  If each A_i must wait for some
> B_j's, is the onus on each A_i to test the B_j's it's interested in?  
> Or is the onus on each B_j to tell the A_i's waiting for it that they
> may proceed?  As Humpty-Dumpty said, "The question is which is to be
> master -- that's all".

Agreed.

> > > Well, I guess your point is that the implementation of completions is much
> > > more complicated that we really need, but I'm not sure if that really hurts.
> > 
> > No. The implementation of completions is actually pretty simple, exactly 
> > because they have that spinlock that is required to protect them. 
> > 
> > That wasn't the point. The point was that locks are actually the "normal" 
> > thing to use. 
> > 
> > You are arguing as if completions are somehow the simpler model. That's 
> > simply not true. Completions are just a _special_case_of_locking_.
> 
> Doesn't that make them simpler by definition?  Special cases always 
> have less to worry about than the general case.

Heh, good point.

> > So why not just use regular locks instead, when it's actually the natural 
> > way to do it, and results in simpler code?
> 
> Simpler but also more subtle, IMO.  If you didn't already know how the
> algorithm worked, figuring it out from the code would be harder with
> rwsems than with completions.

Indeed.

> Partly because of the way readers and
> writers exchange roles in suspend vs. resume, and partly because
> sometimes devices lock themselves and sometimes they lock other
> devices.  With completions each device has its own, and each device
> waits for other devices' completions -- easier to keep track of 
> mentally.

Agreed again.

> (I still think this whole readers vs. writers thing is a red herring.  
> The essential property is that there are two opposing classes of lock
> holders.  The fact that multiple writers can't hold the lock at the
> same time whereas multiple readers can is of no importance; the
> algorithm would work just as well if multiple writers _could_ hold the
> lock simultaneously.)
> 
> Balancing the additional conceptual complexity of the rwsem approach is 
> the conceptual simplicity afforded by not needing to check all the 
> children.  To me this makes it pretty much a toss-up.

Yup.

Thanks!
Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912101321020.2680-100000@iolanthe.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912101321020.2680-100000@iolanthe.rowland.org>
@ 2009-12-10 23:51 ` Linus Torvalds
  0 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-10 23:51 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, LKML, pm list

On Thu, 10 Dec 2009, Alan Stern wrote:
> 
> You probably didn't look closely at the original code in dpm_suspend()  
> and dpm_resume().  It's very awkward; each device is removed from
> dpm_list, operated on, and then added on to a new local list.  At the
> end the new list is spliced back into dpm_list.
> 
> This approach is better because it doesn't involve changing any list 
> pointers while the sleep transition is in progress.  At any rate, I 
> don't recommend doing it in the same patch as the async stuff; it 
> should be done separately.  Either before or after -- the two are 
> independent.

I do agree with the "independent" part. But I don't agree about the 
awkwardness per se.

Sure, it moves things back and forth and has private lists, but that's 
actually a fairly standard thing to do in those kinds of situations where 
you're taking something off a list, operating on it, and may need to put 
it back on the same list eventually. The VM layer does similar things.

So that's why I think your version was actually odder - the existing list 
manipulation isn't all that odd. It has that strange "did we get removed 
while we dropped the lock and tried to suspend the device" thing, of 
course, but that's not entirely unheard of either.

Could it be done more cleanly? I think so, but I agree with you that it's 
likely a separate issue.

I _suspect_, for example, that we could just do something like, the 
appended to avoid _some_ of the subtlety. IOW, just move the device to the 
local list early - and if it gets removed while being suspended, it will 
automatically get removed from the local list (the remover doesn't care 
_what_ list it is on whe it does a 'list_del(power.entr)').

UNTESTED PATCH! This may be total crap, of course. But it _looks_ like an 
"ObviousCleanup(tm)" - famous last words.

		Linus

---
 drivers/base/power/main.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 8aa2443..f2bb493 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -687,6 +687,7 @@ static int dpm_suspend(pm_message_t state)
 		struct device *dev = to_device(dpm_list.prev);

 		get_device(dev);
+		list_move(&dev->power.entry, &list);
 		mutex_unlock(&dpm_list_mtx);

 		error = device_suspend(dev, state);
@@ -698,8 +699,6 @@ static int dpm_suspend(pm_message_t state)
 			break;
 		}
 		dev->power.status = DPM_OFF;
-		if (!list_empty(&dev->power.entry))
-			list_move(&dev->power.entry, &list);
 		put_device(dev);
 	}
 	list_splice(&list, dpm_list.prev);

^ permalink raw reply related	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912101653120.2680-100000@iolanthe.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912101653120.2680-100000@iolanthe.rowland.org>
@ 2009-12-10 23:45 ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-10 23:45 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Thursday 10 December 2009, Alan Stern wrote:
> On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:
> 
> > > You should see how badly lockdep complains about the rwsems.  If it 
> > > really doesn't like them then using completions makes sense.
> > 
> > It does complain about them, but when the nested _down operations are marked
> > as nested, it stops complaining (that's in the version where there's no async
> > in the _noirq phases).
> 
> Did you set the async_suspend flag for any devices during the test?  

Yes.  All ACPI, all PCI, all serio, as usual. ;-)

> And did you run more than one suspend/resume cycle?

Sure.  Actually, I test it in the /sys/power/pm_test = core mode, but that
shouldn't really matter.

> > +extern int __dpm_wait(struct device *dev, void *ign);
> > +
> > +static inline void dpm_wait(struct device *dev)
> > +{
> > +	__dpm_wait(dev, NULL);
> > +}
> 
> Sorry, I intended to mention this before but forgot.  This design is
> inelegant.  You shouldn't have inlines calling functions with extra
> unused arguments; they just waste code space.  Make dpm_wait() be a
> real routine and add a shim to the device_for_each_child() loop.

I thought about that myself, done now.

> > @@ -366,7 +388,7 @@ void dpm_resume_noirq(pm_message_t state
> >  
> >  	mutex_lock(&dpm_list_mtx);
> >  	transition_started = false;
> > -	list_for_each_entry(dev, &dpm_list, power.entry)
> > +	list_for_each_entry(dev, &dpm_list, power.entry) {
> >  		if (dev->power.status > DPM_OFF) {
> >  			int error;
> >  
> > @@ -375,23 +397,27 @@ void dpm_resume_noirq(pm_message_t state
> >  			if (error)
> >  				pm_dev_err(dev, state, " early", error);
> >  		}
> > +		/* Needed by the subsequent dpm_resume(). */
> > +		INIT_COMPLETION(dev->power.completion);
> 
> You're still doing it.  Don't initialize the completions in a totally
> different phase!  Initialize them directly before they are used.  
> Namely, at the start of device_resume() and device_suspend().

The idea was to initialize them all at the same time, before entering the
phase in which they were used, but I came to the conclusion that this was not
necessary, because the dpm_list ordering was such that the devices to be waited
for would always have their completions reinitialized before starting
__device_suspend() or __device_resume() for the waiting ones.

> One more thing.  A logical time to check for errors is just after
> waiting for the children in __device_suspend(), instead of beforehand 
> in async_suspend().  After all, if an error occurs then it's likely to 
> happen while we are waiting.

Good idea, done.

Updated patch is appended.

Rafael


---
 drivers/base/power/main.c    |  106 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/device.h       |    6 ++
 include/linux/pm.h           |    7 ++
 include/linux/resume-trace.h |    7 ++
 4 files changed, 121 insertions(+), 5 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -412,9 +413,11 @@ struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
 	unsigned int		should_wakeup:1;
+	unsigned		async_suspend:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
+	struct completion	completion;
 #endif
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list	suspend_timer;
@@ -508,6 +511,8 @@ extern void __suspend_report_result(cons
 		__suspend_report_result(__func__, fn, ret);		\
 	} while (0)
 
+extern void dpm_wait(struct device *dev);
+
 #else /* !CONFIG_PM_SLEEP */
 
 #define device_pm_lock() do {} while (0)
@@ -520,6 +525,8 @@ static inline int dpm_suspend_start(pm_m
 
 #define suspend_report_result(fn, ret)		do {} while (0)
 
+static inline void dpm_wait(struct device *dev) {}
+
 #endif /* !CONFIG_PM_SLEEP */
 
 /* How to reorder dpm_list after device_move() */
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
+#include <linux/async.h>
 
 #include "../base.h"
 #include "power.h"
@@ -42,6 +43,7 @@
 LIST_HEAD(dpm_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
+static pm_message_t pm_transition;
 
 /*
  * Set once the preparation of devices for a PM transition has started, reset
@@ -56,6 +58,7 @@ static bool transition_started;
 void device_pm_init(struct device *dev)
 {
 	dev->power.status = DPM_ON;
+	init_completion(&dev->power.completion);
 	pm_runtime_init(dev);
 }
 
@@ -111,6 +114,7 @@ void device_pm_remove(struct device *dev
 	pr_debug("PM: Removing info for %s:%s\n",
 		 dev->bus ? dev->bus->name : "No Bus",
 		 kobject_name(&dev->kobj));
+	complete_all(&dev->power.completion);
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
@@ -162,6 +166,28 @@ void device_pm_move_last(struct device *
 }
 
 /**
+ * dpm_wait - Wait for a PM operation to complete.
+ * @dev: Device to wait for.
+ */
+void dpm_wait(struct device *dev)
+{
+	if (dev)
+		wait_for_completion(&dev->power.completion);
+}
+EXPORT_SYMBOL_GPL(dpm_wait);
+
+static int dpm_wait_fn(struct device *dev, void *ignore)
+{
+	dpm_wait(dev);
+	return 0;
+}
+
+static void dpm_wait_for_children(struct device *dev)
+{
+       device_for_each_child(dev, NULL, dpm_wait_fn);
+}
+
+/**
  * pm_op - Execute the PM operation appropriate for given PM event.
  * @dev: Device to handle.
  * @ops: PM operations to choose from.
@@ -381,17 +407,18 @@ void dpm_resume_noirq(pm_message_t state
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);
 
 /**
- * device_resume - Execute "resume" callbacks for given device.
+ * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_resume(struct device *dev, pm_message_t state)
+static int __device_resume(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
+	dpm_wait(dev->parent);
 	down(&dev->sem);
 
 	if (dev->bus) {
@@ -426,11 +453,34 @@ static int device_resume(struct device *
 	}
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);
 
 	TRACE_RESUME(error);
 	return error;
 }
 
+static void async_resume(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async", error);
+	put_device(dev);
+}
+
+static int device_resume(struct device *dev)
+{
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume, dev);
+		return 0;
+	}
+
+	return __device_resume(dev, pm_transition);
+}
+
 /**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -444,6 +494,7 @@ static void dpm_resume(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -451,10 +502,11 @@ static void dpm_resume(pm_message_t stat
 		if (dev->power.status >= DPM_OFF) {
 			int error;
 
+			INIT_COMPLETION(dev->power.completion);
 			dev->power.status = DPM_RESUMING;
 			mutex_unlock(&dpm_list_mtx);
 
-			error = device_resume(dev, state);
+			error = device_resume(dev);
 
 			mutex_lock(&dpm_list_mtx);
 			if (error)
@@ -469,6 +521,7 @@ static void dpm_resume(pm_message_t stat
 	}
 	list_splice(&list, &dpm_list);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
 }
 
 /**
@@ -623,17 +676,23 @@ int dpm_suspend_noirq(pm_message_t state
 }
 EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
 
+static int async_error;
+
 /**
  * device_suspend - Execute "suspend" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_suspend(struct device *dev, pm_message_t state)
+static int __device_suspend(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
+	dpm_wait_for_children(dev);
 	down(&dev->sem);
 
+	if (async_error)
+		goto End;
+
 	if (dev->class) {
 		if (dev->class->pm) {
 			pm_dev_dbg(dev, state, "class ");
@@ -666,12 +725,42 @@ static int device_suspend(struct device 
 			suspend_report_result(dev->bus->suspend, error);
 		}
 	}
+
+	if (!error)
+		dev->power.status = DPM_OFF;
+
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);
 
 	return error;
 }
 
+static void async_suspend(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_suspend(dev, pm_transition);
+	if (error) {
+		pm_dev_err(dev, pm_transition, " async", error);
+		async_error = error;
+	}
+
+	put_device(dev);
+}
+
+static int device_suspend(struct device *dev, pm_message_t state)
+{
+	if (dev->power.async_suspend) {
+		get_device(dev);
+		async_schedule(async_suspend, dev);
+		return 0;
+	}
+
+	return __device_suspend(dev, pm_transition);
+}
+
 /**
  * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -683,10 +772,12 @@ static int dpm_suspend(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.prev);
 
 		get_device(dev);
+		INIT_COMPLETION(dev->power.completion);
 		mutex_unlock(&dpm_list_mtx);
 
 		error = device_suspend(dev, state);
@@ -697,13 +788,17 @@ static int dpm_suspend(pm_message_t stat
 			put_device(dev);
 			break;
 		}
-		dev->power.status = DPM_OFF;
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &list);
 		put_device(dev);
+		if (async_error)
+			break;
 	}
 	list_splice(&list, dpm_list.prev);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
+	if (!error)
+		error = async_error;
 	return error;
 }
 
@@ -762,6 +857,7 @@ static int dpm_prepare(pm_message_t stat
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	transition_started = true;
+	async_error = 0;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
Index: linux-2.6/include/linux/resume-trace.h
===================================================================
--- linux-2.6.orig/include/linux/resume-trace.h
+++ linux-2.6/include/linux/resume-trace.h
@@ -6,6 +6,11 @@
 
 extern int pm_trace_enabled;
 
+static inline int pm_trace_is_enabled(void)
+{
+       return pm_trace_enabled;
+}
+
 struct device;
 extern void set_trace_device(struct device *);
 extern void generate_resume_trace(const void *tracedata, unsigned int user);
@@ -17,6 +22,8 @@ extern void generate_resume_trace(const 
 
 #else
 
+static inline int pm_trace_is_enabled(void) { return 0; }
+
 #define TRACE_DEVICE(dev) do { } while (0)
 #define TRACE_RESUME(dev) do { } while (0)
 
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -472,6 +472,12 @@ static inline int device_is_registered(s
 	return dev->kobj.state_in_sysfs;
 }
 
+static inline void device_enable_async_suspend(struct device *dev, bool enable)
+{
+	if (dev->power.status == DPM_ON)
+		dev->power.async_suspend = enable;
+}
+
 void driver_init(void);
 
 /*

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912102214.40310.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <200912102214.40310.rjw@sisk.pl>
@ 2009-12-10 22:17 ` Alan Stern
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-10 22:17 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:

> > You should see how badly lockdep complains about the rwsems.  If it 
> > really doesn't like them then using completions makes sense.
> 
> It does complain about them, but when the nested _down operations are marked
> as nested, it stops complaining (that's in the version where there's no async
> in the _noirq phases).

Did you set the async_suspend flag for any devices during the test?  
And did you run more than one suspend/resume cycle?

> +extern int __dpm_wait(struct device *dev, void *ign);
> +
> +static inline void dpm_wait(struct device *dev)
> +{
> +	__dpm_wait(dev, NULL);
> +}

Sorry, I intended to mention this before but forgot.  This design is
inelegant.  You shouldn't have inlines calling functions with extra
unused arguments; they just waste code space.  Make dpm_wait() be a
real routine and add a shim to the device_for_each_child() loop.

> @@ -366,7 +388,7 @@ void dpm_resume_noirq(pm_message_t state
>  
>  	mutex_lock(&dpm_list_mtx);
>  	transition_started = false;
> -	list_for_each_entry(dev, &dpm_list, power.entry)
> +	list_for_each_entry(dev, &dpm_list, power.entry) {
>  		if (dev->power.status > DPM_OFF) {
>  			int error;
>  
> @@ -375,23 +397,27 @@ void dpm_resume_noirq(pm_message_t state
>  			if (error)
>  				pm_dev_err(dev, state, " early", error);
>  		}
> +		/* Needed by the subsequent dpm_resume(). */
> +		INIT_COMPLETION(dev->power.completion);

You're still doing it.  Don't initialize the completions in a totally
different phase!  Initialize them directly before they are used.  
Namely, at the start of device_resume() and device_suspend().

One more thing.  A logical time to check for errors is just after
waiting for the children in __device_suspend(), instead of beforehand 
in async_suspend().  After all, if an error occurs then it's likely to 
happen while we are waiting.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912101010090.2825-100000@iolanthe.rowland.org>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912101010090.2825-100000@iolanthe.rowland.org>
@ 2009-12-10 15:45 ` Linus Torvalds
  2009-12-10 21:14 ` Rafael J. Wysocki
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-10 15:45 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, LKML, pm list

On Thu, 10 Dec 2009, Alan Stern wrote:
> 
> In device_pm_remove():
> 
> 	mutex_lock(&dpm_list_mtx);
> 	if (dev == dpm_next)
> 		dpm_next = to_device(dpm_iterate_forward ?
> 			dev->power.entry.next : dev->power.entry.prev);
> 	list_del_init(&dev->power.entry);
> 	mutex_unlock(&dpm_list_mtx);

I'm really not seeing the point - it's much better to hardcode the 
ordering in the place you use it (where it is static and the compiler can 
generate bette code) than to do some dynamic choice that depends on some 
fake flag - especially a global one.

Also, quite frankly, error handling needs to be separated out of the whole 
async patch, and needs to be thought about a lot more. And I would 
seriously argue that if you have any async suspends, then those async 
suspends are _not_ allowed to fail. At least not initially 

Having async failures and trying to fix them up is just a disaster. Which 
ones actually failed, and which ones were aborted before they even really 
got to their suspend routines? Which ones do you try to resume?

IOW, it needs way more thought than what has clearly happened so far. And 
once more, I will refuse to merge anything that is complicated for no 
actual reason (where reason is "real life, and tested to make a big 
difference", not some hand-waving)

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912101010090.2825-100000@iolanthe.rowland.org>
  2009-12-10 15:45 ` Linus Torvalds
@ 2009-12-10 21:14 ` Rafael J. Wysocki
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-10 21:14 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Thursday 10 December 2009, Alan Stern wrote:
> On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:
> 
> > > How about CONFIG_PROVE_LOCKING?  If lockdep really does start 
> > > complaining then switching to completions would be a simple way to 
> > > appease it.
> > 
> > Ah, that one is not set.  I guess I'll try it later, although I've already
> > decided to use completions anyway.
> 
> You should see how badly lockdep complains about the rwsems.  If it 
> really doesn't like them then using completions makes sense.

It does complain about them, but when the nested _down operations are marked
as nested, it stops complaining (that's in the version where there's no async
in the _noirq phases).

> > Index: linux-2.6/drivers/base/power/main.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/base/power/main.c
> > +++ linux-2.6/drivers/base/power/main.c
> > @@ -56,6 +58,7 @@ static bool transition_started;
> >  void device_pm_init(struct device *dev)
> >  {
> >  	dev->power.status = DPM_ON;
> > +	init_completion(&dev->power.completion);
> >  	pm_runtime_init(dev);
> >  }
> 
> You need a matching complete_all() in device_pm_remove(), in case 
> someone else is waiting for the device when it gets unregistered.

Right, added.

> > +/**
> > + * dpm_synchronize - Wait for PM callbacks of all devices to complete.
> > + */
> > +static void dpm_synchronize(void)
> > +{
> > +	struct device *dev;
> > +
> > +	async_synchronize_full();
> > +
> > +	mutex_lock(&dpm_list_mtx);
> > +	list_for_each_entry(dev, &dpm_list, power.entry)
> > +		INIT_COMPLETION(dev->power.completion);
> > +	mutex_unlock(&dpm_list_mtx);
> > +}
> 
> I agree with Linus, initializing the completions here is weird.  You
> should initialize them just before using them.

I removed that completely and now the INIT_COMPLETION() is always done in the
preceding phase.

> > @@ -683,6 +786,7 @@ static int dpm_suspend(pm_message_t stat
> >  
> >  	INIT_LIST_HEAD(&list);
> >  	mutex_lock(&dpm_list_mtx);
> > +	pm_transition = state;
> >  	while (!list_empty(&dpm_list)) {
> >  		struct device *dev = to_device(dpm_list.prev);
> >  
> > @@ -697,13 +801,18 @@ static int dpm_suspend(pm_message_t stat
> >  			put_device(dev);
> >  			break;
> >  		}
> > -		dev->power.status = DPM_OFF;
> >  		if (!list_empty(&dev->power.entry))
> >  			list_move(&dev->power.entry, &list);
> >  		put_device(dev);
> > +		error = atomic_read(&async_error);
> > +		if (error)
> > +			break;
> >  	}
> >  	list_splice(&list, dpm_list.prev);
> 
> Here's something you might want to do in a later patch.  These awkward 
> list-pointer manipulations can be simplified as follows:

Well, I'm not sure if that's more straightforward.

Anyway, as you said, that's something for a different patch. :-)

Below is an updated version of the $subject one.  I don't use the atomic_t for
async_error any more and (apart from this fixed issue) I don't see any problems
in the suspend error path now.

Rafael


---
 drivers/base/power/main.c    |  113 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/device.h       |    6 ++
 include/linux/pm.h           |   12 ++++
 include/linux/resume-trace.h |    7 ++
 4 files changed, 131 insertions(+), 7 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -412,9 +413,11 @@ struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
 	unsigned int		should_wakeup:1;
+	unsigned		async_suspend:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
+	struct completion	completion;
 #endif
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list	suspend_timer;
@@ -508,6 +511,13 @@ extern void __suspend_report_result(cons
 		__suspend_report_result(__func__, fn, ret);		\
 	} while (0)
 
+extern int __dpm_wait(struct device *dev, void *ign);
+
+static inline void dpm_wait(struct device *dev)
+{
+	__dpm_wait(dev, NULL);
+}
+
 #else /* !CONFIG_PM_SLEEP */
 
 #define device_pm_lock() do {} while (0)
@@ -520,6 +530,8 @@ static inline int dpm_suspend_start(pm_m
 
 #define suspend_report_result(fn, ret)		do {} while (0)
 
+static inline void dpm_wait(struct device *dev) {}
+
 #endif /* !CONFIG_PM_SLEEP */
 
 /* How to reorder dpm_list after device_move() */
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
+#include <linux/async.h>
 
 #include "../base.h"
 #include "power.h"
@@ -42,6 +43,7 @@
 LIST_HEAD(dpm_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
+static pm_message_t pm_transition;
 
 /*
  * Set once the preparation of devices for a PM transition has started, reset
@@ -56,6 +58,7 @@ static bool transition_started;
 void device_pm_init(struct device *dev)
 {
 	dev->power.status = DPM_ON;
+	init_completion(&dev->power.completion);
 	pm_runtime_init(dev);
 }
 
@@ -111,6 +114,7 @@ void device_pm_remove(struct device *dev
 	pr_debug("PM: Removing info for %s:%s\n",
 		 dev->bus ? dev->bus->name : "No Bus",
 		 kobject_name(&dev->kobj));
+	complete_all(&dev->power.completion);
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
@@ -162,6 +166,24 @@ void device_pm_move_last(struct device *
 }
 
 /**
+ * __dpm_wait - Wait for a PM operation to complete.
+ * @dev: Device to wait for.
+ * @ign: This value is not used by the function.
+ */
+int __dpm_wait(struct device *dev, void *ign)
+{
+	if (dev)
+		wait_for_completion(&dev->power.completion);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__dpm_wait);
+
+static void dpm_wait_for_children(struct device *dev)
+{
+       device_for_each_child(dev, NULL, __dpm_wait);
+}
+
+/**
  * pm_op - Execute the PM operation appropriate for given PM event.
  * @dev: Device to handle.
  * @ops: PM operations to choose from.
@@ -366,7 +388,7 @@ void dpm_resume_noirq(pm_message_t state
 
 	mutex_lock(&dpm_list_mtx);
 	transition_started = false;
-	list_for_each_entry(dev, &dpm_list, power.entry)
+	list_for_each_entry(dev, &dpm_list, power.entry) {
 		if (dev->power.status > DPM_OFF) {
 			int error;
 
@@ -375,23 +397,27 @@ void dpm_resume_noirq(pm_message_t state
 			if (error)
 				pm_dev_err(dev, state, " early", error);
 		}
+		/* Needed by the subsequent dpm_resume(). */
+		INIT_COMPLETION(dev->power.completion);
+	}
 	mutex_unlock(&dpm_list_mtx);
 	resume_device_irqs();
 }
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);
 
 /**
- * device_resume - Execute "resume" callbacks for given device.
+ * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_resume(struct device *dev, pm_message_t state)
+static int __device_resume(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
+	dpm_wait(dev->parent);
 	down(&dev->sem);
 
 	if (dev->bus) {
@@ -426,11 +452,34 @@ static int device_resume(struct device *
 	}
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);
 
 	TRACE_RESUME(error);
 	return error;
 }
 
+static void async_resume(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async", error);
+	put_device(dev);
+}
+
+static int device_resume(struct device *dev)
+{
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume, dev);
+		return 0;
+	}
+
+	return __device_resume(dev, pm_transition);
+}
+
 /**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -444,6 +493,7 @@ static void dpm_resume(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -454,7 +504,7 @@ static void dpm_resume(pm_message_t stat
 			dev->power.status = DPM_RESUMING;
 			mutex_unlock(&dpm_list_mtx);
 
-			error = device_resume(dev, state);
+			error = device_resume(dev);
 
 			mutex_lock(&dpm_list_mtx);
 			if (error)
@@ -469,6 +519,7 @@ static void dpm_resume(pm_message_t stat
 	}
 	list_splice(&list, &dpm_list);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
 }
 
 /**
@@ -623,15 +674,18 @@ int dpm_suspend_noirq(pm_message_t state
 }
 EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
 
+static int async_error;
+
 /**
  * device_suspend - Execute "suspend" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_suspend(struct device *dev, pm_message_t state)
+static int __device_suspend(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
+	dpm_wait_for_children(dev);
 	down(&dev->sem);
 
 	if (dev->class) {
@@ -666,12 +720,48 @@ static int device_suspend(struct device 
 			suspend_report_result(dev->bus->suspend, error);
 		}
 	}
+
+	if (!error)
+		dev->power.status = DPM_OFF;
+
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);
 
 	return error;
 }
 
+static void async_suspend(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	if (async_error) {
+		complete_all(&dev->power.completion);
+		goto End;
+	}
+
+	error = __device_suspend(dev, pm_transition);
+	if (error) {
+		pm_dev_err(dev, pm_transition, " async", error);
+		async_error = error;
+	}
+
+ End:
+	put_device(dev);
+}
+
+static int device_suspend(struct device *dev, pm_message_t state)
+{
+	if (dev->power.async_suspend) {
+		get_device(dev);
+		async_schedule(async_suspend, dev);
+		return 0;
+	}
+
+	return __device_suspend(dev, pm_transition);
+}
+
 /**
  * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -683,6 +773,7 @@ static int dpm_suspend(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.prev);
 
@@ -697,13 +788,17 @@ static int dpm_suspend(pm_message_t stat
 			put_device(dev);
 			break;
 		}
-		dev->power.status = DPM_OFF;
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &list);
 		put_device(dev);
+		if (async_error)
+			break;
 	}
 	list_splice(&list, dpm_list.prev);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
+	if (!error)
+		error = async_error;
 	return error;
 }
 
@@ -762,6 +857,7 @@ static int dpm_prepare(pm_message_t stat
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	transition_started = true;
+	async_error = 0;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -793,8 +889,11 @@ static int dpm_prepare(pm_message_t stat
 			break;
 		}
 		dev->power.status = DPM_SUSPENDING;
-		if (!list_empty(&dev->power.entry))
+		if (!list_empty(&dev->power.entry)) {
 			list_move_tail(&dev->power.entry, &list);
+			/* Needed by the subsequent dpm_suspend(). */
+			INIT_COMPLETION(dev->power.completion);
+		}
 		put_device(dev);
 	}
 	list_splice(&list, &dpm_list);
Index: linux-2.6/include/linux/resume-trace.h
===================================================================
--- linux-2.6.orig/include/linux/resume-trace.h
+++ linux-2.6/include/linux/resume-trace.h
@@ -6,6 +6,11 @@
 
 extern int pm_trace_enabled;
 
+static inline int pm_trace_is_enabled(void)
+{
+       return pm_trace_enabled;
+}
+
 struct device;
 extern void set_trace_device(struct device *);
 extern void generate_resume_trace(const void *tracedata, unsigned int user);
@@ -17,6 +22,8 @@ extern void generate_resume_trace(const 
 
 #else
 
+static inline int pm_trace_is_enabled(void) { return 0; }
+
 #define TRACE_DEVICE(dev) do { } while (0)
 #define TRACE_RESUME(dev) do { } while (0)
 
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -472,6 +472,12 @@ static inline int device_is_registered(s
 	return dev->kobj.state_in_sysfs;
 }
 
+static inline void device_enable_async_suspend(struct device *dev, bool enable)
+{
+	if (dev->power.status == DPM_ON)
+		dev->power.async_suspend = enable;
+}
+
 void driver_init(void);
 
 /*

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912100739260.3560@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <alpine.LFD.2.00.0912100739260.3560@localhost.localdomain>
@ 2009-12-10 18:37 ` Alan Stern
  0 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-10 18:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Thu, 10 Dec 2009, Linus Torvalds wrote:

> 
> 
> On Thu, 10 Dec 2009, Alan Stern wrote:
> > 
> > In device_pm_remove():
> > 
> > 	mutex_lock(&dpm_list_mtx);
> > 	if (dev == dpm_next)
> > 		dpm_next = to_device(dpm_iterate_forward ?
> > 			dev->power.entry.next : dev->power.entry.prev);
> > 	list_del_init(&dev->power.entry);
> > 	mutex_unlock(&dpm_list_mtx);
> 
> I'm really not seeing the point - it's much better to hardcode the 
> ordering in the place you use it (where it is static and the compiler can 
> generate bette code) than to do some dynamic choice that depends on some 
> fake flag - especially a global one.

You probably didn't look closely at the original code in dpm_suspend()  
and dpm_resume().  It's very awkward; each device is removed from
dpm_list, operated on, and then added on to a new local list.  At the
end the new list is spliced back into dpm_list.

This approach is better because it doesn't involve changing any list 
pointers while the sleep transition is in progress.  At any rate, I 
don't recommend doing it in the same patch as the async stuff; it 
should be done separately.  Either before or after -- the two are 
independent.

> Also, quite frankly, error handling needs to be separated out of the whole 
> async patch, and needs to be thought about a lot more. And I would 
> seriously argue that if you have any async suspends, then those async 
> suspends are _not_ allowed to fail. At least not initially 
> 
> Having async failures and trying to fix them up is just a disaster. Which 
> ones actually failed, and which ones were aborted before they even really 
> got to their suspend routines? Which ones do you try to resume?

We record the status of each device; dev->power.status stores different
values depending on whether the device suspend succeeded or failed.  
The value will be correct and up-to-date after async_synchronize_full()
returns.  The value is used in dpm_resume() to decide which devices
need their resume methods called.  I don't see any problems there.

> IOW, it needs way more thought than what has clearly happened so far. And 
> once more, I will refuse to merge anything that is complicated for no 
> actual reason (where reason is "real life, and tested to make a big 
> difference", not some hand-waving)

I don't think the error handling requires more than minimal changes.

The whole atomic_t thing was overkill.  It probably stemmed from a
discussion some time back with Pavel Machek about concurrent writes to
a single variable.  I claimed that concurrent writes to a properly
aligned pointer, int, or long would never create a "mash-up"; that is,
readers would see either the original value or one of the new values
but never some weird combination of bits.

Alan Cox pointed out that while this was technically correct, there's
nothing to prevent the compiler from translating

	a = b + c;

into something like:

	load b, R1
	store R1, a
	load c, R1
	add R1, a

in which case readers might see the intermediate value.  (Okay, the
compiler would have to be pretty stupid to do this with such a simple
expression, but it could happen with more complicated expressions.)  
Pavel favored always using atomic types when there could be concurrent
writes, and apparently Rafael was following his advice.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <Pine.LNX.4.44L0.0912091729530.2672-100000@iolanthe.rowland.org>]

* Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] <Pine.LNX.4.44L0.0912091729530.2672-100000@iolanthe.rowland.org>
@ 2009-12-09 23:18 ` Rafael J. Wysocki
       [not found] ` <200912100018.19723.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-09 23:18 UTC (permalink / raw)
  To: Alan Stern; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Wednesday 09 December 2009, Alan Stern wrote:
> On Wed, 9 Dec 2009, Rafael J. Wysocki wrote:
> 
> > On Wednesday 09 December 2009, Alan Stern wrote:
> > > On Tue, 8 Dec 2009, Rafael J. Wysocki wrote:
> > > 
> > > > For completness, below is the full async suspend/resume patch with rwlocks,
> > > > that has been (very slightly) tested and doesn't seem to break things.
> > > > 
> > > > [Note to Alan: lockdep doesn't seem to complain about the not annotated nested
> > > > locks.]
> > > 
> > > I can't imagine why not.  And wouldn't lockdep get confused by the fact
> > > that in the async case, the rwsems are released by a different process
> > > from the one that acquired them?
> > 
> > /me looks at the .config
> > 
> > I have CONFIG_LOCKDEP_SUPPORT set, is there anything else I need to set
> > in .config?
> 
> How about CONFIG_PROVE_LOCKING?  If lockdep really does start 
> complaining then switching to completions would be a simple way to 
> appease it.

Ah, that one is not set.  I guess I'll try it later, although I've already
decided to use completions anyway.

...
> > > How about exporting a wait_for_device_to_resume() routine?  Drivers
> > > could call it for non-tree resume constraints:
> > > 
> > > 	void wait_for_device_to_resume(struct device *other)
> > > 	{
> > > 		down_read(&other->power.rwsem);
> > > 		up_read(&other->power.rwsem);
> > > 	}
> > > 
> > > Unfortunately there is no equivalent for non-tree suspend constraints.
> > 
> > If we use completions, it will be possible to just export something like
> > 
> > dpm_wait(dev)
> > {
> >         if (dev)
> >                 wait_for_completion(dev->power.completion);
> > }
> > 
> > I think.  It appears that will also work for suspend, unless I'm missing
> > something.
> 
> It will.

Completions it is, then.

Additionally, I've removed the async support from the _noirq parts and moved
the setting of power.status on suspend to __device_suspend().  The result is
appended.

Rafael


---
 drivers/base/power/main.c    |  124 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/device.h       |    6 ++
 include/linux/pm.h           |   12 ++++
 include/linux/resume-trace.h |    7 ++
 4 files changed, 143 insertions(+), 6 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -412,9 +413,11 @@ struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
 	unsigned int		should_wakeup:1;
+	unsigned		async_suspend:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
+	struct completion	completion;
 #endif
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list	suspend_timer;
@@ -508,6 +511,13 @@ extern void __suspend_report_result(cons
 		__suspend_report_result(__func__, fn, ret);		\
 	} while (0)
 
+extern int __dpm_wait(struct device *dev, void *ign);
+
+static inline void dpm_wait(struct device *dev)
+{
+	__dpm_wait(dev, NULL);
+}
+
 #else /* !CONFIG_PM_SLEEP */
 
 #define device_pm_lock() do {} while (0)
@@ -520,6 +530,8 @@ static inline int dpm_suspend_start(pm_m
 
 #define suspend_report_result(fn, ret)		do {} while (0)
 
+static inline void dpm_wait(struct device *dev) {}
+
 #endif /* !CONFIG_PM_SLEEP */
 
 /* How to reorder dpm_list after device_move() */
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
+#include <linux/async.h>
 
 #include "../base.h"
 #include "power.h"
@@ -42,6 +43,7 @@
 LIST_HEAD(dpm_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
+static pm_message_t pm_transition;
 
 /*
  * Set once the preparation of devices for a PM transition has started, reset
@@ -56,6 +58,7 @@ static bool transition_started;
 void device_pm_init(struct device *dev)
 {
 	dev->power.status = DPM_ON;
+	init_completion(&dev->power.completion);
 	pm_runtime_init(dev);
 }
 
@@ -162,6 +165,39 @@ void device_pm_move_last(struct device *
 }
 
 /**
+ * __dpm_wait - Wait for a PM operation to complete.
+ * @dev: Device to wait for.
+ * @ign: This value is not used by the function.
+ */
+int __dpm_wait(struct device *dev, void *ign)
+{
+	if (dev)
+		wait_for_completion(&dev->power.completion);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__dpm_wait);
+
+static void dpm_wait_for_children(struct device *dev)
+{
+       device_for_each_child(dev, NULL, __dpm_wait);
+}
+
+/**
+ * dpm_synchronize - Wait for PM callbacks of all devices to complete.
+ */
+static void dpm_synchronize(void)
+{
+	struct device *dev;
+
+	async_synchronize_full();
+
+	mutex_lock(&dpm_list_mtx);
+	list_for_each_entry(dev, &dpm_list, power.entry)
+		INIT_COMPLETION(dev->power.completion);
+	mutex_unlock(&dpm_list_mtx);
+}
+
+/**
  * pm_op - Execute the PM operation appropriate for given PM event.
  * @dev: Device to handle.
  * @ops: PM operations to choose from.
@@ -381,17 +417,18 @@ void dpm_resume_noirq(pm_message_t state
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);
 
 /**
- * device_resume - Execute "resume" callbacks for given device.
+ * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_resume(struct device *dev, pm_message_t state)
+static int __device_resume(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
+	dpm_wait(dev->parent);
 	down(&dev->sem);
 
 	if (dev->bus) {
@@ -426,11 +463,34 @@ static int device_resume(struct device *
 	}
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);
 
 	TRACE_RESUME(error);
 	return error;
 }
 
+static void async_resume(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async", error);
+	put_device(dev);
+}
+
+static int device_resume(struct device *dev)
+{
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume, dev);
+		return 0;
+	}
+
+	return __device_resume(dev, pm_transition);
+}
+
 /**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -444,6 +504,7 @@ static void dpm_resume(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -454,7 +515,7 @@ static void dpm_resume(pm_message_t stat
 			dev->power.status = DPM_RESUMING;
 			mutex_unlock(&dpm_list_mtx);
 
-			error = device_resume(dev, state);
+			error = device_resume(dev);
 
 			mutex_lock(&dpm_list_mtx);
 			if (error)
@@ -469,6 +530,7 @@ static void dpm_resume(pm_message_t stat
 	}
 	list_splice(&list, &dpm_list);
 	mutex_unlock(&dpm_list_mtx);
+	dpm_synchronize();
 }
 
 /**
@@ -533,6 +595,8 @@ static void dpm_complete(pm_message_t st
 	mutex_unlock(&dpm_list_mtx);
 }
 
+static atomic_t async_error;
+
 /**
  * dpm_resume_end - Execute "resume" callbacks and complete system transition.
  * @state: PM transition of the system being carried out.
@@ -628,10 +692,11 @@ EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_suspend(struct device *dev, pm_message_t state)
+static int __device_suspend(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
+	dpm_wait_for_children(dev);
 	down(&dev->sem);
 
 	if (dev->class) {
@@ -666,12 +731,50 @@ static int device_suspend(struct device 
 			suspend_report_result(dev->bus->suspend, error);
 		}
 	}
+
+	if (!error)
+		dev->power.status = DPM_OFF;
+
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);
 
 	return error;
 }
 
+static void async_suspend(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error = atomic_read(&async_error);
+
+	if (error) {
+		complete_all(&dev->power.completion);
+		goto End;
+	}
+
+	error = __device_suspend(dev, pm_transition);
+	if (error) {
+		pm_dev_err(dev, pm_transition, " async", error);
+		atomic_set(&async_error, error);
+	}
+
+ End:
+	put_device(dev);
+}
+
+static int device_suspend(struct device *dev, pm_message_t state)
+{
+	int error;
+
+	if (dev->power.async_suspend) {
+		get_device(dev);
+		async_schedule(async_suspend, dev);
+		return 0;
+	}
+
+	return __device_suspend(dev, pm_transition);
+}
+
 /**
  * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -683,6 +786,7 @@ static int dpm_suspend(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.prev);
 
@@ -697,13 +801,18 @@ static int dpm_suspend(pm_message_t stat
 			put_device(dev);
 			break;
 		}
-		dev->power.status = DPM_OFF;
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &list);
 		put_device(dev);
+		error = atomic_read(&async_error);
+		if (error)
+			break;
 	}
 	list_splice(&list, dpm_list.prev);
 	mutex_unlock(&dpm_list_mtx);
+	dpm_synchronize();
+	if (!error)
+		error = atomic_read(&async_error);
 	return error;
 }
 
@@ -762,6 +871,7 @@ static int dpm_prepare(pm_message_t stat
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	transition_started = true;
+	atomic_set(&async_error, 0);
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -793,8 +903,10 @@ static int dpm_prepare(pm_message_t stat
 			break;
 		}
 		dev->power.status = DPM_SUSPENDING;
-		if (!list_empty(&dev->power.entry))
+		if (!list_empty(&dev->power.entry)) {
 			list_move_tail(&dev->power.entry, &list);
+			INIT_COMPLETION(dev->power.completion);
+		}
 		put_device(dev);
 	}
 	list_splice(&list, &dpm_list);
Index: linux-2.6/include/linux/resume-trace.h
===================================================================
--- linux-2.6.orig/include/linux/resume-trace.h
+++ linux-2.6/include/linux/resume-trace.h
@@ -6,6 +6,11 @@
 
 extern int pm_trace_enabled;
 
+static inline int pm_trace_is_enabled(void)
+{
+       return pm_trace_enabled;
+}
+
 struct device;
 extern void set_trace_device(struct device *);
 extern void generate_resume_trace(const void *tracedata, unsigned int user);
@@ -17,6 +22,8 @@ extern void generate_resume_trace(const 
 
 #else
 
+static inline int pm_trace_is_enabled(void) { return 0; }
+
 #define TRACE_DEVICE(dev) do { } while (0)
 #define TRACE_RESUME(dev) do { } while (0)
 
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -472,6 +472,12 @@ static inline int device_is_registered(s
 	return dev->kobj.state_in_sysfs;
 }
 
+static inline void device_enable_async_suspend(struct device *dev, bool enable)
+{
+	if (dev->power.status == DPM_ON)
+		dev->power.async_suspend = enable;
+}
+
 void driver_init(void);
 
 /*

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912100018.19723.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] ` <200912100018.19723.rjw@sisk.pl>
@ 2009-12-10  2:51   ` Linus Torvalds
  2009-12-10 15:31   ` Alan Stern
       [not found]   ` <alpine.LFD.2.00.0912091835280.3560@localhost.localdomain>
  2 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-10  2:51 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list

On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:
> 
> Completions it is, then.

What was so hard with the "Try the simple one first" to understand? You 
had a simpler working patch, why are you making this more complex one 
without ever having had any problems with the simpler one?

Btw, your 'atomic_set()' with errors is pure voodoo programming. That's 
not how atomics work. They do SMP-atomic addition etc, the 'atomic_set()' 
and 'atomic_read()' things are not in any way more atomic than any other 
access.

They are meant for racy reads (atomic_read()) and for initializations 
(atomic_set()), and the way you use them that 'atomic' part is entirely 
pointless, because it really isn't anything different from an 'int', 
except that it may be very very expensive on some architectures due to 
hashed spinlocks etc.

So stop this overdesign thing. Start simple. If you _ever_ see real 
problems, that's when you add stuff. As it is, any time you add 
complexity, you just add bugs.

> +/**
> + * dpm_synchronize - Wait for PM callbacks of all devices to complete.
> + */
> +static void dpm_synchronize(void)
> +{
> +	struct device *dev;
> +
> +	async_synchronize_full();
> +
> +	mutex_lock(&dpm_list_mtx);
> +	list_for_each_entry(dev, &dpm_list, power.entry)
> +		INIT_COMPLETION(dev->power.completion);
> +	mutex_unlock(&dpm_list_mtx);
> +}

And this, for example, is pretty disgusting. Not only is that 
INIT_COMPLETION purely brought on by the whole problem with completions 
(they are fundamentally one-shot, but you want to use them over and over 
so you need to re-initialize them: a nice lock wouldn't have that problem 
to begin with), but the comment isn't even accurate. Sure, it waits for 
any async jobs, but that's the _least_ of what the function actually does, 
so the comment is actively misleading, isn't it?

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found] ` <200912100018.19723.rjw@sisk.pl>
  2009-12-10  2:51   ` Linus Torvalds
@ 2009-12-10 15:31   ` Alan Stern
       [not found]   ` <alpine.LFD.2.00.0912091835280.3560@localhost.localdomain>
  2 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-10 15:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:

> > How about CONFIG_PROVE_LOCKING?  If lockdep really does start 
> > complaining then switching to completions would be a simple way to 
> > appease it.
> 
> Ah, that one is not set.  I guess I'll try it later, although I've already
> decided to use completions anyway.

You should see how badly lockdep complains about the rwsems.  If it 
really doesn't like them then using completions makes sense.

> Index: linux-2.6/drivers/base/power/main.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/main.c
> +++ linux-2.6/drivers/base/power/main.c
> @@ -56,6 +58,7 @@ static bool transition_started;
>  void device_pm_init(struct device *dev)
>  {
>  	dev->power.status = DPM_ON;
> +	init_completion(&dev->power.completion);
>  	pm_runtime_init(dev);
>  }

You need a matching complete_all() in device_pm_remove(), in case 
someone else is waiting for the device when it gets unregistered.

> +/**
> + * dpm_synchronize - Wait for PM callbacks of all devices to complete.
> + */
> +static void dpm_synchronize(void)
> +{
> +	struct device *dev;
> +
> +	async_synchronize_full();
> +
> +	mutex_lock(&dpm_list_mtx);
> +	list_for_each_entry(dev, &dpm_list, power.entry)
> +		INIT_COMPLETION(dev->power.completion);
> +	mutex_unlock(&dpm_list_mtx);
> +}

I agree with Linus, initializing the completions here is weird.  You
should initialize them just before using them.

> @@ -683,6 +786,7 @@ static int dpm_suspend(pm_message_t stat
>  
>  	INIT_LIST_HEAD(&list);
>  	mutex_lock(&dpm_list_mtx);
> +	pm_transition = state;
>  	while (!list_empty(&dpm_list)) {
>  		struct device *dev = to_device(dpm_list.prev);
>  
> @@ -697,13 +801,18 @@ static int dpm_suspend(pm_message_t stat
>  			put_device(dev);
>  			break;
>  		}
> -		dev->power.status = DPM_OFF;
>  		if (!list_empty(&dev->power.entry))
>  			list_move(&dev->power.entry, &list);
>  		put_device(dev);
> +		error = atomic_read(&async_error);
> +		if (error)
> +			break;
>  	}
>  	list_splice(&list, dpm_list.prev);

Here's something you might want to do in a later patch.  These awkward 
list-pointer manipulations can be simplified as follows:

static bool dpm_iterate_forward;
static struct device *dpm_next;

In device_pm_remove():

	mutex_lock(&dpm_list_mtx);
	if (dev == dpm_next)
		dpm_next = to_device(dpm_iterate_forward ?
			dev->power.entry.next : dev->power.entry.prev);
	list_del_init(&dev->power.entry);
	mutex_unlock(&dpm_list_mtx);

In dpm_resume():

	dpm_iterate_forward = true;
	list_for_each_entry_safe(dev, dpm_next, dpm_list, power.entry) {
		...

In dpm_suspend():

	dpm_iterate_forward = false;
	list_for_each_entry_safe_reverse(dev, dpm_next, dpm_list, 
			power.entry) {
		...

Whether this really is better is a matter of opinion; I like it.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912091835280.3560@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]   ` <alpine.LFD.2.00.0912091835280.3560@localhost.localdomain>
@ 2009-12-10 19:40     ` Rafael J. Wysocki
       [not found]     ` <200912102040.11063.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-10 19:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Thursday 10 December 2009, Linus Torvalds wrote:
> 
> On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > Completions it is, then.
> 
> What was so hard with the "Try the simple one first" to understand? You 
> had a simpler working patch, why are you making this more complex one 
> without ever having had any problems with the simpler one?

OK, why don't you just say you won't merge anything that doesn't use rwsems
(although you said before that completions would be fine with you)?  That would
make things clear, but also it would mean we gave up handling the off-tree
dependencies in general.

> Btw, your 'atomic_set()' with errors is pure voodoo programming. That's 
> not how atomics work. They do SMP-atomic addition etc, the 'atomic_set()' 
> and 'atomic_read()' things are not in any way more atomic than any other 
> access.
>
> They are meant for racy reads (atomic_read()) and for initializations 
> (atomic_set()), and the way you use them that 'atomic' part is entirely 
> pointless, because it really isn't anything different from an 'int', 
> except that it may be very very expensive on some architectures due to 
> hashed spinlocks etc.
> 
> So stop this overdesign thing. Start simple. If you _ever_ see real 
> problems, that's when you add stuff. As it is, any time you add 
> complexity, you just add bugs.

OK, so that need not be atomic.
 
> > +/**
> > + * dpm_synchronize - Wait for PM callbacks of all devices to complete.
> > + */
> > +static void dpm_synchronize(void)
> > +{
> > +	struct device *dev;
> > +
> > +	async_synchronize_full();
> > +
> > +	mutex_lock(&dpm_list_mtx);
> > +	list_for_each_entry(dev, &dpm_list, power.entry)
> > +		INIT_COMPLETION(dev->power.completion);
> > +	mutex_unlock(&dpm_list_mtx);
> > +}
> 
> And this, for example, is pretty disgusting. Not only is that 
> INIT_COMPLETION purely brought on by the whole problem with completions 
> (they are fundamentally one-shot, but you want to use them over and over

Actually, twice.  However, since I don't want to do any async handling in the
_noirq phases any more, I can get rid of this whole function.  Thanks for
pointing that out to me.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912102040.11063.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]     ` <200912102040.11063.rjw@sisk.pl>
@ 2009-12-10 23:30       ` Linus Torvalds
       [not found]       ` <alpine.LFD.2.00.0912101507550.3560@localhost.localdomain>
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-10 23:30 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list

On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:
> 
> OK, why don't you just say you won't merge anything that doesn't use rwsems

I did! Here's a quote (and it's pretty much the whole email, so it's not 
like it was hidden):

 - alpine.LFD.2.00.0912081309370.3560@localhost.localdomain:

   "Let me put this simply: I've told you guys how to do it simply, with 
    _zero_ crap. No "iterating over children". No games. No data structures. 
    No new infrastructure. Just a single new rwlock per device, and _trivial_ 
    code.

    So here's the challenge: try it my simple way first. I've quoted the code 
    about five million times already. If you _actually_ see some problems, 
    explain them. Don't make up stupid "iterate over each child" things. Don't 
    claim totally made-up "leads to difficulties". Don't make it any more 
    complicated than it needs to be.

    Keep it simple. And once you have tried that simple approach, and you 
    really can show why it doesn't work, THEN you can try something else.

    But before you try the simple approach and explain why it wouldn't work, I 
    simply will not pull anything more complex. Understood and agreed?"

And then later about completions:

 - alpine.LFD.2.00.0912081416470.3560@localhost.localdomain:

   "So I think completions should work, if done right. That whole "make the 
    parent wait for all the children to complete" is fine in that sense. And 
    I'll happily take such an approach if my rwlock thing doesn't work."

IOW, I'll happily take the completions version, but dammit, I refuse to 
take it when there is a simpler approach that does NOT need to iterate, 
and does NOT need to re-initialize the data structures each round etc.

That's what I've been arguing against the whole time. It started as 
arguing against complex and unnecessary infrastructure, and trying to show 
that it _can_ be done so much simpler using existing basic locking.

And I get annoyed when you guys continually seem to want to make it more 
complex than it needs to be. 

> > And this, for example, is pretty disgusting. Not only is that 
> > INIT_COMPLETION purely brought on by the whole problem with completions 
> > (they are fundamentally one-shot, but you want to use them over and over
> 
> Actually, twice.  However, since I don't want to do any async handling in the
> _noirq phases any more, I can get rid of this whole function.  Thanks for
> pointing that out to me.

Well, my point was that you'll need to do that

	INIT_COMPLETION(dev->power.completion);

thing each suspend and each resume. Exactly because completions are 
designed to be "onw-way" things, so you end up having to reset them each 
cycle (you just reset them even _more_ than you needed).

Again, my point was that using locks is actually a very _natural_ thing to 
do. I really don't understand what problems you and Alan have with just 
using locks - we have way more locks in the kernel than we have 
completions, so they are the "default" thing to do, and they really are 
very natural to use.

[ Ok, so admittedly the actual use of 'struct rw_semaphore' is pretty 
  unusual, but my point is that people are used to locking semantics in 
  general, more so than the semantics of completions ]

Completions were literally designed to be used for one-off things - one of 
the most common uses is that the 'struct completion' is on the _stack_. It 
doesn't get much more one-off than that - and the completions are really 
very explicitly designed so that you can do a 'complete()' on something 
that will literally disappear from under you as you do it (because the 
struct completion might be on the stack of the thing that is waiting for 
it, and gets de-allocated when the waiter goes ahead).

That is why 'wait_for_completion()' always has to take the spinlock, for 
example - there is no fastpath for completion, because the races for the 
waiter releasing things too early are too nasty.

So completions are actually very subtle things - and you don't need any of 
that subtlety. I realize that from a user perspective, completions look 
very simple, but in many ways they actually have subtler semantics than a 
regular lock has.

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912101507550.3560@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]       ` <alpine.LFD.2.00.0912101507550.3560@localhost.localdomain>
@ 2009-12-11  1:02         ` Rafael J. Wysocki
       [not found]         ` <200912110202.28536.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-11  1:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Friday 11 December 2009, Linus Torvalds wrote:
> 
> On Thu, 10 Dec 2009, Rafael J. Wysocki wrote:
...
> 
> IOW, I'll happily take the completions version, but dammit, I refuse to 
> take it when there is a simpler approach that does NOT need to iterate, 
> and does NOT need to re-initialize the data structures each round etc.

I don't think it really is that simple.  For example, the fact that the outer
lock has to be taken by one thread and released by another is not exactly
straightforward.  [One might ask what's the critical section in this case.]

Besides, suppose a device driver wants some off-tree constraints to be
satisfied.  What's the driver writer supposed to do?  He only can lock the
other device, but that will cause lockdep to complain, because this lock
is going to be nested.  Moreover, it's already too late, because his async
thread has started and there's no guarantee that the other device hasn't
acquired its rwsem yet.

With completions, the driver doesn't have to take any action to prevent another
one from suspending too early.  Instead, the other one has to wait for its
suspend to complete, and for me personally this is a much more natural thing
to do.  IOW, if I were a driver writed, I'd probably prefer to wait on a
completion than to use a lock in a tricky manner.

> That's what I've been arguing against the whole time. It started as 
> arguing against complex and unnecessary infrastructure, and trying to show 
> that it _can_ be done so much simpler using existing basic locking.
> 
> And I get annoyed when you guys continually seem to want to make it more 
> complex than it needs to be. 
> 
> > > And this, for example, is pretty disgusting. Not only is that 
> > > INIT_COMPLETION purely brought on by the whole problem with completions 
> > > (they are fundamentally one-shot, but you want to use them over and over
> > 
> > Actually, twice.  However, since I don't want to do any async handling in the
> > _noirq phases any more, I can get rid of this whole function.  Thanks for
> > pointing that out to me.
> 
> Well, my point was that you'll need to do that
> 
> 	INIT_COMPLETION(dev->power.completion);
> 
> thing each suspend and each resume. Exactly because completions are 
> designed to be "onw-way" things, so you end up having to reset them each 
> cycle (you just reset them even _more_ than you needed).

Well, why actually do we need to preserve the state of the data structure from
one cycle to another?  There's no need whatsoever.

> Again, my point was that using locks is actually a very _natural_ thing to 
> do. I really don't understand what problems you and Alan have with just 
> using locks - we have way more locks in the kernel than we have 
> completions, so they are the "default" thing to do, and they really are 
> very natural to use.
> 
> [ Ok, so admittedly the actual use of 'struct rw_semaphore' is pretty 
>   unusual, but my point is that people are used to locking semantics in 
>   general, more so than the semantics of completions ]

I still don't think there are many places where locks are used in a way you're
suggesting.  I would even say it's quite unusual to use locks this way.

> Completions were literally designed to be used for one-off things - one of 
> the most common uses is that the 'struct completion' is on the _stack_. It 
> doesn't get much more one-off than that - and the completions are really 
> very explicitly designed so that you can do a 'complete()' on something 
> that will literally disappear from under you as you do it (because the 
> struct completion might be on the stack of the thing that is waiting for 
> it, and gets de-allocated when the waiter goes ahead).

We could literally throw away a completion after all of the potentially waiting
threads have finished their operations and then allocate it back again when
necessary.  We only need the synchronization in this particular phase of
suspend or resume and it doesn't need to extend to the other phases or other
cycles, because all of the concurrent threads we need to synchronize will
only live during this one particular phase of suspend or resume.  They will
all exit when it's finished anyway.

> That is why 'wait_for_completion()' always has to take the spinlock, for 
> example - there is no fastpath for completion, because the races for the 
> waiter releasing things too early are too nasty.
> 
> So completions are actually very subtle things - and you don't need any of 
> that subtlety. I realize that from a user perspective, completions look 
> very simple, but in many ways they actually have subtler semantics than a 
> regular lock has.

Well, I guess your point is that the implementation of completions is much
more complicated that we really need, but I'm not sure if that really hurts.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912110202.28536.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]         ` <200912110202.28536.rjw@sisk.pl>
@ 2009-12-11  1:25           ` Linus Torvalds
       [not found]           ` <alpine.LFD.2.00.0912101713440.3560@localhost.localdomain>
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-11  1:25 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list

On Fri, 11 Dec 2009, Rafael J. Wysocki wrote:
> 
> I don't think it really is that simple.  For example, the fact that the outer
> lock has to be taken by one thread and released by another is not exactly
> straightforward.  [One might ask what's the critical section in this case.]

Why is that any different from initializing the completion in one thread, 
and completing it in another?

It's exactly equivalent.

Completions really are "locks that were initialized to locked". That is, 
in fact, how completions came to be: we literally used to use semaphores 
for them, and the reason for completions is literally the magic lifetime 
rules they have.

So when you do

	INIT_COMPLETION(dev->power.completion);

that really is historically, logically, and conceptually exactly the same 
thing as initializing a lock to the locked state. We literally used to do 
it with the equivalent of

	init_MUTEX_LOCKED()

way back when (well, except we didn't have mutexes back then, we had only 
counting semaphores) and instead of "complete()", we had "up()" on the 
semaphore to complete it.

> Besides, suppose a device driver wants some off-tree constraints to be
> satisfied.

.. and I've told you several times that we should simply not do such 
devices asynchronously. At least not unless there is some _overriding_ 
reason to. And so far, nobody has suggested anything even remotely 
likely for that.

Again - KISS: Keep It Simple, Stupid!

Don't try to make up problems. The _only_ subsystem we know wants this is 
USB, and we know USB is purely a tree.

> > 	INIT_COMPLETION(dev->power.completion);
> > 
> > thing each suspend and each resume. Exactly because completions are 
> > designed to be "onw-way" things, so you end up having to reset them each 
> > cycle (you just reset them even _more_ than you needed).
> 
> Well, why actually do we need to preserve the state of the data structure from
> one cycle to another?  There's no need whatsoever.

My point is, with locks, none of that is necessary. Because they 
automatically do the right thing.

By picking the right concept, you don't have any of those "oh, we need to 
re-initialize things" issues. They just work.

> I still don't think there are many places where locks are used in a way you're
> suggesting.  I would even say it's quite unusual to use locks this way.

See above. It's what completions _are_.

> Well, I guess your point is that the implementation of completions is much
> more complicated that we really need, but I'm not sure if that really hurts.

No. The implementation of completions is actually pretty simple, exactly 
because they have that spinlock that is required to protect them. 

That wasn't the point. The point was that locks are actually the "normal" 
thing to use. 

You are arguing as if completions are somehow the simpler model. That's 
simply not true. Completions are just a _special_case_of_locking_.

So why not just use regular locks instead, when it's actually the natural 
way to do it, and results in simpler code?

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912101713440.3560@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]           ` <alpine.LFD.2.00.0912101713440.3560@localhost.localdomain>
@ 2009-12-11  3:42             ` Alan Stern
  2009-12-11 22:11             ` Rafael J. Wysocki
       [not found]             ` <200912112311.08548.rjw@sisk.pl>
  2 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-11  3:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

Up front: This is my personal view of the matter.  Which probably isn't
of much interest to anybody, so I won't bother to defend these views or
comment any further on them.  The decision about what version to use is
up to the two of you.  The fact is, either implementation would get the 
job done.

On Thu, 10 Dec 2009, Linus Torvalds wrote:

> Completions really are "locks that were initialized to locked". That is, 
> in fact, how completions came to be: we literally used to use semaphores 
> for them, and the reason for completions is literally the magic lifetime 
> rules they have.
> 
> So when you do
> 
> 	INIT_COMPLETION(dev->power.completion);
> 
> that really is historically, logically, and conceptually exactly the same 
> thing as initializing a lock to the locked state. We literally used to do 
> it with the equivalent of
> 
> 	init_MUTEX_LOCKED()
> 
> way back when (well, except we didn't have mutexes back then, we had only 
> counting semaphores) and instead of "complete()", we had "up()" on the 
> semaphore to complete it.

You think of it that way because you have been closely involved in the
development of the various kinds of locks.  Speaking as an outsider who
has relatively little interest in the internal details, completions
appear simpler than rwsems.  Mostly because they have a smaller API:  
complete() (or complete_all()) and wait_for_completion() as opposed to
down_read(), up_read(), down_write(), and up_write().

> > Besides, suppose a device driver wants some off-tree constraints to be
> > satisfied.
> 
> .. and I've told you several times that we should simply not do such 
> devices asynchronously. At least not unless there is some _overriding_ 
> reason to. And so far, nobody has suggested anything even remotely 
> likely for that.

Agreed.  The fact that async non-tree suspend constraints are difficult 
with rwsems isn't a drawback if nobody needs to use them.

> > Well, why actually do we need to preserve the state of the data structure from
> > one cycle to another?  There's no need whatsoever.
> 
> My point is, with locks, none of that is necessary. Because they 
> automatically do the right thing.
> 
> By picking the right concept, you don't have any of those "oh, we need to 
> re-initialize things" issues. They just work.

That's true, but it's not entirely clear.  There are subtle questions
about what happens if you stop in the middle or a device gets
unregistered or registered in the middle.  They require careful thought
in both approaches.

Having to reinitialize a completion each time doesn't bother me.  It's 
merely an indication that each suspend & resume is independent of all 
the others.

> > I still don't think there are many places where locks are used in a way you're
> > suggesting.  I would even say it's quite unusual to use locks this way.
> 
> See above. It's what completions _are_.

This is almost a philosophical issue.  If each A_i must wait for some
B_j's, is the onus on each A_i to test the B_j's it's interested in?  
Or is the onus on each B_j to tell the A_i's waiting for it that they
may proceed?  As Humpty-Dumpty said, "The question is which is to be
master -- that's all".

> > Well, I guess your point is that the implementation of completions is much
> > more complicated that we really need, but I'm not sure if that really hurts.
> 
> No. The implementation of completions is actually pretty simple, exactly 
> because they have that spinlock that is required to protect them. 
> 
> That wasn't the point. The point was that locks are actually the "normal" 
> thing to use. 
> 
> You are arguing as if completions are somehow the simpler model. That's 
> simply not true. Completions are just a _special_case_of_locking_.

Doesn't that make them simpler by definition?  Special cases always 
have less to worry about than the general case.

> So why not just use regular locks instead, when it's actually the natural 
> way to do it, and results in simpler code?

Simpler but also more subtle, IMO.  If you didn't already know how the
algorithm worked, figuring it out from the code would be harder with
rwsems than with completions.  Partly because of the way readers and
writers exchange roles in suspend vs. resume, and partly because
sometimes devices lock themselves and sometimes they lock other
devices.  With completions each device has its own, and each device
waits for other devices' completions -- easier to keep track of 
mentally.

(I still think this whole readers vs. writers thing is a red herring.  
The essential property is that there are two opposing classes of lock
holders.  The fact that multiple writers can't hold the lock at the
same time whereas multiple readers can is of no importance; the
algorithm would work just as well if multiple writers _could_ hold the
lock simultaneously.)

Balancing the additional conceptual complexity of the rwsem approach is 
the conceptual simplicity afforded by not needing to check all the 
children.  To me this makes it pretty much a toss-up.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]           ` <alpine.LFD.2.00.0912101713440.3560@localhost.localdomain>
  2009-12-11  3:42             ` Alan Stern
@ 2009-12-11 22:11             ` Rafael J. Wysocki
       [not found]             ` <200912112311.08548.rjw@sisk.pl>
  2 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-11 22:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Friday 11 December 2009, Linus Torvalds wrote:
> 
> On Fri, 11 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > I don't think it really is that simple.  For example, the fact that the outer
> > lock has to be taken by one thread and released by another is not exactly
> > straightforward.  [One might ask what's the critical section in this case.]
> 
> Why is that any different from initializing the completion in one thread, 
> and completing it in another?
> 
> It's exactly equivalent.
> 
> Completions really are "locks that were initialized to locked". That is, 
> in fact, how completions came to be: we literally used to use semaphores 
> for them, and the reason for completions is literally the magic lifetime 
> rules they have.

I don't know how they emerged historically and that's why I look a them in a
different way than you do, probably.

But fine, say we use the approach based on rwsems and consider suspend and
the inner lock.  We acquire it using down_write(), because we want to wait for
multiple other dirvers.  Now, in fact we could do literally

down_write(dev->power.rwsem);
up_write(dev->power.rwsem);

because the lock doesn't really protect anything from anyone.  What it does is
to prevent _us_ from doing something too early.  To me, personally, it's not a
usual use of locks.

Moreover, if you think completions should be treated like locks, the up_write()
above plays the role of the INIT_COMPLETION() in my last patch (or vice versa),
so we reinitialize the data structure to the previous state in this case too,
only earlier (and we could do that later just as well).

The only real drawback of using completions I can see is that we have to
iterate over the children during suspend, but if async suspend is going to save
us any time at all, we can easily afford it (resume with completions is
actually simpler than with rwsems, because we only have to wait for one device
each time).

> > Besides, suppose a device driver wants some off-tree constraints to be
> > satisfied.
> 
> .. and I've told you several times that we should simply not do such 
> devices asynchronously. At least not unless there is some _overriding_ 
> reason to. And so far, nobody has suggested anything even remotely 
> likely for that.
> 
> Again - KISS: Keep It Simple, Stupid!
> 
> Don't try to make up problems. The _only_ subsystem we know wants this is 
> USB, and we know USB is purely a tree.

Not really.

I've already said it once, but let me repeat.  Some device objects have those
ACPI "shadow" device objects that represent the ACPI view of given "physical"
device and have their own suspend and resume routines.  It turns out that
these ACPI "shadow" devices have to be suspended after their "physical"
counterparts and resumed before them, or else things beak really badly.
I don't know the reason for that, I only verified it experimentally (I also
don't like that design, but I didn't invent it and I have to live with it at
least for now).  So if we don't enforce these constraints doing async
suspend and resume, we won't be able to handle _any_ devices with those
ACPI "shadow" things asynchronously.  Ever.  [That includes the majority
PCI devices, at least the "planar" ones (which is unfortunate, but that's how
it goes).]

If we had a clean way of representing off-tree constraints during asynchronous
suspend and resume, we'd be able to handle this issue at the bus type level.

And even if we don't anticipate it right now, I think the iteration over
children during suspend is a fair price for a clean interface that bus types or
drivers can use in future.  YMMV.

> > Well, I guess your point is that the implementation of completions is much
> > more complicated that we really need, but I'm not sure if that really hurts.
> 
> No. The implementation of completions is actually pretty simple, exactly 
> because they have that spinlock that is required to protect them. 
> 
> That wasn't the point. The point was that locks are actually the "normal" 
> thing to use. 
> 
> You are arguing as if completions are somehow the simpler model.

That's because I think so.

> That's simply not true. Completions are just a _special_case_of_locking_.

Which doesn't necessarily prevent them from being conceptually simpler
that the locking scheme based on rwsems.

> So why not just use regular locks instead, when it's actually the natural 
> way to do it, and results in simpler code?

Well, to me, it's way not natural and, quite frankly, in my not so humble
opinion, it's a matter of personal preference.

But, since your personal preference is what matters in this case, I'm not
going to argue any more, because that just plain doesn't make sense.

So, if you're not fine with the last patch I sent
(http://patchwork.kernel.org/patch/66375/), I'll send one using rwsems instead
of completions just to make _you_ happy, not because I think that's what we
should do objectively.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912112311.08548.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]             ` <200912112311.08548.rjw@sisk.pl>
@ 2009-12-11 22:31               ` Linus Torvalds
       [not found]               ` <alpine.LFD.2.00.0912111415160.3922@localhost.localdomain>
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-11 22:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list

On Fri, 11 Dec 2009, Rafael J. Wysocki wrote:
> 
> But fine, say we use the approach based on rwsems and consider suspend and
> the inner lock.  We acquire it using down_write(), because we want to wait for
> multiple other dirvers.  Now, in fact we could do literally
> 
> down_write(dev->power.rwsem);
> up_write(dev->power.rwsem);
> 
> because the lock doesn't really protect anything from anyone.  What it does is
> to prevent _us_ from doing something too early.  To me, personally, it's not a
> usual use of locks.

I agree that it's fairly unusual, but on the other hand, it's unusual only 
because you contrieved it to be.

If you instead do

	down_write(dev->power.rwsem);
	.. do the actual suspend ..
	up_write(dev->power.rwsem);

it doesn't look odd any more, does it? And while you don't _need_ to hold 
the power lock over the suspend call, it actually does make sense, and 
gives you some nicer guarantees.

For an example of the kinds of guarantees it would give you - I think that 
you might actually be able to do a partial suspend and then a resume 
without any other locks, and you'd know that just the per-device locking 
would already guarantee that no device is ever tried to resume before it 
has finished its asynchronous suspend.

Think about it.

In the completion model, the "async_synchronize_full()" will synchronize 
all async work, and as a result you think that you don't need that level 
of robustness from the locking itself.

But think about it this way: if you could abort a failed suspend, and 
start resuming devices immediately, without doing that 
"async_synchronize_full()" in between - simply because you know that the 
node locking itself will just "do the right thing".

To me, that's a sign of a _good_ design. Using a rwsem is simply just more 
robust and natural for the problem in question. Exactly because it's a 
real lock.

> > Don't try to make up problems. The _only_ subsystem we know wants this is 
> > USB, and we know USB is purely a tree.
> 
> Not really.
> 
> I've already said it once, but let me repeat.  Some device objects have those
> ACPI "shadow" device objects that represent the ACPI view of given "physical"
> device and have their own suspend and resume routines.  It turns out that
> these ACPI "shadow" devices have to be suspended after their "physical"
> counterparts and resumed before them, or else things beak really badly.
> I don't know the reason for that, I only verified it experimentally (I also
> don't like that design, but I didn't invent it and I have to live with it at
> least for now).  So if we don't enforce these constraints doing async
> suspend and resume, we won't be able to handle _any_ devices with those
> ACPI "shadow" things asynchronously.  Ever.  [That includes the majority
> PCI devices, at least the "planar" ones (which is unfortunate, but that's how
> it goes).]

So?

First off, you're wrong. It's not "ever". I'm happy to add complexity 
later, I just don't want to start out with a complex model. Adding 
complexity too early "just because we migth need it" is the wrong thing to 
do.

Secondly, I repeat: we don't want to do those PCI devices asynchronously 
anyway. You're again digging yourself deeper by just continually bringing 
up this total non-issue. I realize you did it for testing, but I'm serious 
when I say that we should limit these things as much as possible, rather 
than see it as an opportunity to do crazy things.

Solve the problem at hand _first_. Solve it as simply as you can. And hope 
that you never ever will need anything more complex.

				Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912111415160.3922@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]               ` <alpine.LFD.2.00.0912111415160.3922@localhost.localdomain>
@ 2009-12-11 23:48                 ` Rafael J. Wysocki
       [not found]                 ` <200912120048.46180.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-11 23:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Friday 11 December 2009, Linus Torvalds wrote:
> 
> On Fri, 11 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > But fine, say we use the approach based on rwsems and consider suspend and
> > the inner lock.  We acquire it using down_write(), because we want to wait for
> > multiple other dirvers.  Now, in fact we could do literally
> > 
> > down_write(dev->power.rwsem);
> > up_write(dev->power.rwsem);
> > 
> > because the lock doesn't really protect anything from anyone.  What it does is
> > to prevent _us_ from doing something too early.  To me, personally, it's not a
> > usual use of locks.
> 
> I agree that it's fairly unusual, but on the other hand, it's unusual only 
> because you contrieved it to be.

Whatever.  The very fact that you can freely move the up_write() (as long as
it's after the down_write()) is fairly unusual.

> But think about it this way: if you could abort a failed suspend, and 
> start resuming devices immediately, without doing that 
> "async_synchronize_full()" in between - simply because you know that the 
> node locking itself will just "do the right thing".

I'd rather not. :-)

> To me, that's a sign of a _good_ design. Using a rwsem is simply just more 
> robust and natural for the problem in question. Exactly because it's a 
> real lock.
...
> Solve the problem at hand _first_. Solve it as simply as you can. And hope 
> that you never ever will need anything more complex.

Below is a patch I've just tested, but there's a lockdep problem in it I don't
know how to solve.  Namely, lockdep is apparently unhappy with us not releasing
the lock taken in device_suspend() and it complains we take it twice in a row
(which we do, but for another device).  I need to use down_read_non_owner()
to make it shut up and then I also need to use up_read_non_owner() in
__device_suspend(), although there's the comment in include/linux/rwsem.h
saying exatly this about that:

/*
 * Take/release a lock when not the owner will release it.
 *
 * [ This API should be avoided as much as possible - the
 *   proper abstraction for this case is completions. ]
 */

(I'd like to know your opinion about that).  Yet, that's not all, because next
it complains during resume that __device_resume() releases a lock it didn't
acquire, which it clearly does, but that is intentional.  Unfortunately,
there's no up_write_non_owner() ...

So, what am I supposed to do about that?

Rafael


---
 drivers/base/power/main.c    |  107 +++++++++++++++++++++++++++++++++++++++----
 include/linux/device.h       |    6 ++
 include/linux/pm.h           |    3 +
 include/linux/resume-trace.h |    7 ++
 4 files changed, 114 insertions(+), 9 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/rwsem.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -412,9 +413,11 @@ struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
 	unsigned int		should_wakeup:1;
+	unsigned		async_suspend:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
+	struct rw_semaphore	rwsem;
 #endif
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list	suspend_timer;
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
+#include <linux/async.h>
 
 #include "../base.h"
 #include "power.h"
@@ -42,6 +43,7 @@
 LIST_HEAD(dpm_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
+static pm_message_t pm_transition;
 
 /*
  * Set once the preparation of devices for a PM transition has started, reset
@@ -56,6 +58,7 @@ static bool transition_started;
 void device_pm_init(struct device *dev)
 {
 	dev->power.status = DPM_ON;
+	init_rwsem(&dev->power.rwsem);
 	pm_runtime_init(dev);
 }
 
@@ -381,17 +384,22 @@ void dpm_resume_noirq(pm_message_t state
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);
 
 /**
- * device_resume - Execute "resume" callbacks for given device.
+ * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_resume(struct device *dev, pm_message_t state)
+static int __device_resume(struct device *dev, pm_message_t state)
 {
+	struct device *parent = dev->parent;
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
+	/* Wait for the parent's resume to complete, if necessary. */
+	if (parent)
+		down_read_nested(&parent->power.rwsem, SINGLE_DEPTH_NESTING);
+
 	down(&dev->sem);
 
 	if (dev->bus) {
@@ -426,11 +434,41 @@ static int device_resume(struct device *
 	}
  End:
 	up(&dev->sem);
+	if (parent)
+		up_read(&parent->power.rwsem);
+
+	/* Allow the children to resume now. */
+	up_write(&dev->power.rwsem);
 
 	TRACE_RESUME(error);
 	return error;
 }
 
+static void async_resume(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume(dev, pm_transition);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async", error);
+	put_device(dev);
+}
+
+static int device_resume(struct device *dev)
+{
+	/* Prevent the children from resuming before us. */
+	down_write(&dev->power.rwsem);
+
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume, dev);
+		return 0;
+	}
+
+	return __device_resume(dev, pm_transition);
+}
+
 /**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -444,6 +482,7 @@ static void dpm_resume(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -454,7 +493,7 @@ static void dpm_resume(pm_message_t stat
 			dev->power.status = DPM_RESUMING;
 			mutex_unlock(&dpm_list_mtx);
 
-			error = device_resume(dev, state);
+			error = device_resume(dev);
 
 			mutex_lock(&dpm_list_mtx);
 			if (error)
@@ -469,6 +508,7 @@ static void dpm_resume(pm_message_t stat
 	}
 	list_splice(&list, &dpm_list);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
 }
 
 /**
@@ -584,13 +624,11 @@ static int device_suspend_noirq(struct d
 {
 	int error = 0;
 
-	if (!dev->bus)
-		return 0;
-
-	if (dev->bus->pm) {
+	if (dev->bus && dev->bus->pm) {
 		pm_dev_dbg(dev, state, "LATE ");
 		error = pm_noirq_op(dev, dev->bus->pm, state);
 	}
+
 	return error;
 }
 
@@ -623,17 +661,24 @@ int dpm_suspend_noirq(pm_message_t state
 }
 EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
 
+static int async_error;
+
 /**
  * device_suspend - Execute "suspend" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
  */
-static int device_suspend(struct device *dev, pm_message_t state)
+static int __device_suspend(struct device *dev, pm_message_t state)
 {
 	int error = 0;
 
+	/* Wait for the suspends of the children to complete, if necessary. */
+	down_write_nested(&dev->power.rwsem, SINGLE_DEPTH_NESTING);
 	down(&dev->sem);
 
+	if (async_error)
+		goto End;
+
 	if (dev->class) {
 		if (dev->class->pm) {
 			pm_dev_dbg(dev, state, "class ");
@@ -666,12 +711,50 @@ static int device_suspend(struct device 
 			suspend_report_result(dev->bus->suspend, error);
 		}
 	}
+
+	if (!error)
+		dev->power.status = DPM_OFF;
+
  End:
 	up(&dev->sem);
+	up_write(&dev->power.rwsem);
+
+	/* Allow the parent to suspend now. */
+	if (dev->parent)
+		up_read_non_owner(&dev->parent->power.rwsem);
 
 	return error;
 }
 
+static void async_suspend(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_suspend(dev, pm_transition);
+	if (error) {
+		pm_dev_err(dev, pm_transition, " async", error);
+		async_error = error;
+	}
+
+	put_device(dev);
+}
+
+static int device_suspend(struct device *dev, pm_message_t state)
+{
+	/* Prevent the parent from suspending before us. */
+	if (dev->parent)
+		down_read_non_owner(&dev->parent->power.rwsem);
+
+	if (dev->power.async_suspend) {
+		get_device(dev);
+		async_schedule(async_suspend, dev);
+		return 0;
+	}
+
+	return __device_suspend(dev, pm_transition);
+}
+
 /**
  * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -683,6 +766,7 @@ static int dpm_suspend(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.prev);
 
@@ -697,13 +781,17 @@ static int dpm_suspend(pm_message_t stat
 			put_device(dev);
 			break;
 		}
-		dev->power.status = DPM_OFF;
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &list);
 		put_device(dev);
+		if (async_error)
+			break;
 	}
 	list_splice(&list, dpm_list.prev);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
+	if (!error)
+		error = async_error;
 	return error;
 }
 
@@ -762,6 +850,7 @@ static int dpm_prepare(pm_message_t stat
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	transition_started = true;
+	async_error = 0;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
Index: linux-2.6/include/linux/resume-trace.h
===================================================================
--- linux-2.6.orig/include/linux/resume-trace.h
+++ linux-2.6/include/linux/resume-trace.h
@@ -6,6 +6,11 @@
 
 extern int pm_trace_enabled;
 
+static inline int pm_trace_is_enabled(void)
+{
+       return pm_trace_enabled;
+}
+
 struct device;
 extern void set_trace_device(struct device *);
 extern void generate_resume_trace(const void *tracedata, unsigned int user);
@@ -17,6 +22,8 @@ extern void generate_resume_trace(const 
 
 #else
 
+static inline int pm_trace_is_enabled(void) { return 0; }
+
 #define TRACE_DEVICE(dev) do { } while (0)
 #define TRACE_RESUME(dev) do { } while (0)
 
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -472,6 +472,12 @@ static inline int device_is_registered(s
 	return dev->kobj.state_in_sysfs;
 }
 
+static inline void device_enable_async_suspend(struct device *dev, bool enable)
+{
+	if (dev->power.status == DPM_ON)
+		dev->power.async_suspend = enable;
+}
+
 void driver_init(void);
 
 /*

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912120048.46180.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                 ` <200912120048.46180.rjw@sisk.pl>
@ 2009-12-11 23:53                   ` Linus Torvalds
  2009-12-12  0:43                   ` Alan Stern
       [not found]                   ` <alpine.LFD.2.00.0912111552330.3526@localhost.localdomain>
  2 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-11 23:53 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list



On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> 
> Below is a patch I've just tested, but there's a lockdep problem in it I don't
> know how to solve.  Namely, lockdep is apparently unhappy with us not releasing
> the lock taken in device_suspend() and it complains we take it twice in a row
> (which we do, but for another device).  I need to use down_read_non_owner()
> to make it shut up and then I also need to use up_read_non_owner() in
> __device_suspend(),

Ok, that I admit is actually a problem.

Ok, ok, I'll accept that completion() version, even though I think it's 
inferior.

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                 ` <200912120048.46180.rjw@sisk.pl>
  2009-12-11 23:53                   ` Linus Torvalds
@ 2009-12-12  0:43                   ` Alan Stern
       [not found]                   ` <alpine.LFD.2.00.0912111552330.3526@localhost.localdomain>
  2 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-12  0:43 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:

> Below is a patch I've just tested, but there's a lockdep problem in it I don't
> know how to solve.  Namely, lockdep is apparently unhappy with us not releasing
> the lock taken in device_suspend() and it complains we take it twice in a row
> (which we do, but for another device).  I need to use down_read_non_owner()
> to make it shut up and then I also need to use up_read_non_owner() in
> __device_suspend(), although there's the comment in include/linux/rwsem.h
> saying exatly this about that:
> 
> /*
>  * Take/release a lock when not the owner will release it.
>  *
>  * [ This API should be avoided as much as possible - the
>  *   proper abstraction for this case is completions. ]
>  */
> 
> (I'd like to know your opinion about that).  Yet, that's not all, because next
> it complains during resume that __device_resume() releases a lock it didn't
> acquire, which it clearly does, but that is intentional.  Unfortunately,
> there's no up_write_non_owner() ...

Hah!  I knew it!

How come lockdep didn't complain earlier?  What's different about this 
patch?  Only the nesting annotations?  Why should adding annotations 
make lockdep less happy?

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912111552330.3526@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                   ` <alpine.LFD.2.00.0912111552330.3526@localhost.localdomain>
@ 2009-12-12 17:48                     ` Rafael J. Wysocki
  2009-12-12 18:54                       ` Linus Torvalds
  0 siblings, 1 reply; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-12 17:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Saturday 12 December 2009, Linus Torvalds wrote:
> 
> On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > Below is a patch I've just tested, but there's a lockdep problem in it I don't
> > know how to solve.  Namely, lockdep is apparently unhappy with us not releasing
> > the lock taken in device_suspend() and it complains we take it twice in a row
> > (which we do, but for another device).  I need to use down_read_non_owner()
> > to make it shut up and then I also need to use up_read_non_owner() in
> > __device_suspend(),
> 
> Ok, that I admit is actually a problem.
> 
> Ok, ok, I'll accept that completion() version, even though I think it's 
> inferior.

Great! :-)

I slightly changed it in the meantime to avoid calling wait_for_completion()
when both the parent and the child are "synchronous", which prevents the code
from choking on some situations when the ordering of dpm_list is wrong (this
happens as a result of bugs, but not necessarily fatal, for example if one of
the drivers' suspend and resume callbacks are NULL and the bus type doesn't
access the hardware directly, so we shouldn't make things worse than they
already are IMO).

I'd like to put it into my tree in this form, if you don't mind.

[Note for Alan: dpm_wait() is not exported for now, we'll export it when there
are any users.]

Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM: Asynchronous suspend and resume of devices

Theoretically, the total time of system sleep transitions (suspend
to RAM, hibernation) can be reduced by running suspend and resume
callbacks of device drivers in parallel with each other.  However,
there are dependencies between devices such that we're not allowed
to suspend the parent of a device before suspending the device
itself.  Analogously, we're not allowed to resume a device before
resuming its parent.

Thus, to make it possible to execute device drivers' suspend and
resume callbacks in parallel with each other, introduce (at the PM
core level) a synchronization mechanism preventing the dependencies
between devices from being violated.

First, device drivers that want their suspend and resume callbacks
to be run asynchronously need to set the power.async_suspend flags
of their devices using device_enable_async_suspend().

Second, for each device with the power.async_suspend flag set the PM
core will start async threads to execute its suspend and resume
callbacks.

The async threads started for different devices are synchronized with
each other and with the main suspend (or resume) thread with the help
of completions, in the following way:
(1) There is a completion, power.completion, for each device object.
(2) Each device's completion is reset before starting the async
    suspend (or resume) thread for the device or, in the case of
    devices whose power.async_suspend flags are not set, before
    executing the device's suspend and resume callbacks.
(3) During suspend, right before running the bus type, device type
    and device class suspend callbacks for the device, the PM core
    waits for the completions of all the device's children to be
    completed.
(4) During resume, right before running the bus type, device type and
    device class resume callbacks for the device, the PM core waits
    for the completion of the device's parent to be completed.
(5) The PM core completes power.completion for each device right
    after the bus type, device type and device class suspend (or
    resume) callbacks executed for the device have returned.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/power/main.c    |  115 ++++++++++++++++++++++++++++++++++++++++---
 include/linux/device.h       |    6 ++
 include/linux/pm.h           |    3 +
 include/linux/resume-trace.h |    7 ++
 4 files changed, 125 insertions(+), 6 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -26,6 +26,7 @@
 #include <linux/spinlock.h>
 #include <linux/wait.h>
 #include <linux/timer.h>
+#include <linux/completion.h>
 
 /*
  * Callbacks for platform drivers to implement.
@@ -412,9 +413,11 @@ struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
 	unsigned int		should_wakeup:1;
+	unsigned		async_suspend:1;
 	enum dpm_state		status;		/* Owned by the PM core */
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
+	struct completion	completion;
 #endif
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list	suspend_timer;
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
 #include <linux/resume-trace.h>
 #include <linux/rwsem.h>
 #include <linux/interrupt.h>
+#include <linux/async.h>
 
 #include "../base.h"
 #include "power.h"
@@ -42,6 +43,7 @@
 LIST_HEAD(dpm_list);
 
 static DEFINE_MUTEX(dpm_list_mtx);
+static pm_message_t pm_transition;
 
 /*
  * Set once the preparation of devices for a PM transition has started, reset
@@ -56,6 +58,7 @@ static bool transition_started;
 void device_pm_init(struct device *dev)
 {
 	dev->power.status = DPM_ON;
+	init_completion(&dev->power.completion);
 	pm_runtime_init(dev);
 }
 
@@ -111,6 +114,7 @@ void device_pm_remove(struct device *dev
 	pr_debug("PM: Removing info for %s:%s\n",
 		 dev->bus ? dev->bus->name : "No Bus",
 		 kobject_name(&dev->kobj));
+	complete_all(&dev->power.completion);
 	mutex_lock(&dpm_list_mtx);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
@@ -162,6 +166,31 @@ void device_pm_move_last(struct device *
 }
 
 /**
+ * dpm_wait - Wait for a PM operation to complete.
+ * @dev: Device to wait for.
+ * @async: If unset, wait only if the device's power.async_suspend flag is set.
+ */
+static void dpm_wait(struct device *dev, bool async)
+{
+	if (!dev)
+		return;
+
+	if (async || dev->power.async_suspend)
+		wait_for_completion(&dev->power.completion);
+}
+
+static int dpm_wait_fn(struct device *dev, void *async_ptr)
+{
+	dpm_wait(dev, *((bool *)async_ptr));
+	return 0;
+}
+
+static void dpm_wait_for_children(struct device *dev, bool async)
+{
+       device_for_each_child(dev, &async, dpm_wait_fn);
+}
+
+/**
  * pm_op - Execute the PM operation appropriate for given PM event.
  * @dev: Device to handle.
  * @ops: PM operations to choose from.
@@ -381,17 +410,19 @@ void dpm_resume_noirq(pm_message_t state
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);
 
 /**
- * device_resume - Execute "resume" callbacks for given device.
+ * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
+ * @async: If true, the device is being resumed asynchronously.
  */
-static int device_resume(struct device *dev, pm_message_t state)
+static int __device_resume(struct device *dev, pm_message_t state, bool async)
 {
 	int error = 0;
 
 	TRACE_DEVICE(dev);
 	TRACE_RESUME(0);
 
+	dpm_wait(dev->parent, async);
 	down(&dev->sem);
 
 	if (dev->bus) {
@@ -426,11 +457,36 @@ static int device_resume(struct device *
 	}
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);
 
 	TRACE_RESUME(error);
 	return error;
 }
 
+static void async_resume(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_resume(dev, pm_transition, true);
+	if (error)
+		pm_dev_err(dev, pm_transition, " async", error);
+	put_device(dev);
+}
+
+static int device_resume(struct device *dev)
+{
+	INIT_COMPLETION(dev->power.completion);
+
+	if (dev->power.async_suspend && !pm_trace_is_enabled()) {
+		get_device(dev);
+		async_schedule(async_resume, dev);
+		return 0;
+	}
+
+	return __device_resume(dev, pm_transition, false);
+}
+
 /**
  * dpm_resume - Execute "resume" callbacks for non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -444,6 +500,7 @@ static void dpm_resume(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.next);
 
@@ -454,7 +511,7 @@ static void dpm_resume(pm_message_t stat
 			dev->power.status = DPM_RESUMING;
 			mutex_unlock(&dpm_list_mtx);
 
-			error = device_resume(dev, state);
+			error = device_resume(dev);
 
 			mutex_lock(&dpm_list_mtx);
 			if (error)
@@ -469,6 +526,7 @@ static void dpm_resume(pm_message_t stat
 	}
 	list_splice(&list, &dpm_list);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
 }
 
 /**
@@ -623,17 +681,24 @@ int dpm_suspend_noirq(pm_message_t state
 }
 EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
 
+static int async_error;
+
 /**
  * device_suspend - Execute "suspend" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
+ * @async: If true, the device is being suspended asynchronously.
  */
-static int device_suspend(struct device *dev, pm_message_t state)
+static int __device_suspend(struct device *dev, pm_message_t state, bool async)
 {
 	int error = 0;
 
+	dpm_wait_for_children(dev, async);
 	down(&dev->sem);
 
+	if (async_error)
+		goto End;
+
 	if (dev->class) {
 		if (dev->class->pm) {
 			pm_dev_dbg(dev, state, "class ");
@@ -666,12 +731,44 @@ static int device_suspend(struct device 
 			suspend_report_result(dev->bus->suspend, error);
 		}
 	}
+
+	if (!error)
+		dev->power.status = DPM_OFF;
+
  End:
 	up(&dev->sem);
+	complete_all(&dev->power.completion);
 
 	return error;
 }
 
+static void async_suspend(void *data, async_cookie_t cookie)
+{
+	struct device *dev = (struct device *)data;
+	int error;
+
+	error = __device_suspend(dev, pm_transition, true);
+	if (error) {
+		pm_dev_err(dev, pm_transition, " async", error);
+		async_error = error;
+	}
+
+	put_device(dev);
+}
+
+static int device_suspend(struct device *dev)
+{
+	INIT_COMPLETION(dev->power.completion);
+
+	if (dev->power.async_suspend) {
+		get_device(dev);
+		async_schedule(async_suspend, dev);
+		return 0;
+	}
+
+	return __device_suspend(dev, pm_transition, false);
+}
+
 /**
  * dpm_suspend - Execute "suspend" callbacks for all non-sysdev devices.
  * @state: PM transition of the system being carried out.
@@ -683,13 +780,15 @@ static int dpm_suspend(pm_message_t stat
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
+	pm_transition = state;
+	async_error = 0;
 	while (!list_empty(&dpm_list)) {
 		struct device *dev = to_device(dpm_list.prev);
 
 		get_device(dev);
 		mutex_unlock(&dpm_list_mtx);
 
-		error = device_suspend(dev, state);
+		error = device_suspend(dev);
 
 		mutex_lock(&dpm_list_mtx);
 		if (error) {
@@ -697,13 +796,17 @@ static int dpm_suspend(pm_message_t stat
 			put_device(dev);
 			break;
 		}
-		dev->power.status = DPM_OFF;
 		if (!list_empty(&dev->power.entry))
 			list_move(&dev->power.entry, &list);
 		put_device(dev);
+		if (async_error)
+			break;
 	}
 	list_splice(&list, dpm_list.prev);
 	mutex_unlock(&dpm_list_mtx);
+	async_synchronize_full();
+	if (!error)
+		error = async_error;
 	return error;
 }
 
Index: linux-2.6/include/linux/resume-trace.h
===================================================================
--- linux-2.6.orig/include/linux/resume-trace.h
+++ linux-2.6/include/linux/resume-trace.h
@@ -6,6 +6,11 @@
 
 extern int pm_trace_enabled;
 
+static inline int pm_trace_is_enabled(void)
+{
+       return pm_trace_enabled;
+}
+
 struct device;
 extern void set_trace_device(struct device *);
 extern void generate_resume_trace(const void *tracedata, unsigned int user);
@@ -17,6 +22,8 @@ extern void generate_resume_trace(const 
 
 #else
 
+static inline int pm_trace_is_enabled(void) { return 0; }
+
 #define TRACE_DEVICE(dev) do { } while (0)
 #define TRACE_RESUME(dev) do { } while (0)
 
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -472,6 +472,12 @@ static inline int device_is_registered(s
 	return dev->kobj.state_in_sysfs;
 }
 
+static inline void device_enable_async_suspend(struct device *dev, bool enable)
+{
+	if (dev->power.status == DPM_ON)
+		dev->power.async_suspend = enable;
+}
+
 void driver_init(void);
 
 /*

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-12 17:48                     ` Rafael J. Wysocki
@ 2009-12-12 18:54                       ` Linus Torvalds
  2009-12-12 22:34                         ` Rafael J. Wysocki
                                           ` (2 more replies)
  0 siblings, 3 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-12 18:54 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list

On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> 
> I'd like to put it into my tree in this form, if you don't mind.

This version still has a major problem, which is not related to 
completions vs rwsems, but simply to the fact that you wanted to do this 
at the generic device layer level rather than do it at the actual 
low-level suspend/resume level.

Namely that there's no apparent sane way to say "don't wait for children".

PCI bridges that don't suspend at all - or any other device that only 
suspends in the 'suspend_late()' thing, for that matter - don't have any 
reason what-so-ever to wait for children, since they aren't actually 
suspending in the first place. But you make them wait regardless, which 
then serializes things unnecessarily (for example, two unrelated USB 
controllers).

And no, making _everything_ be async is _not_ the answer.

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-12 18:54                       ` Linus Torvalds
@ 2009-12-12 22:34                         ` Rafael J. Wysocki
  2009-12-12 22:40                           ` Rafael J. Wysocki
                                             ` (2 more replies)
  2009-12-13 13:08                         ` Rafael J. Wysocki
  2009-12-13 17:30                         ` Alan Stern
  2 siblings, 3 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-12 22:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Saturday 12 December 2009, Linus Torvalds wrote:
> 
> On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > I'd like to put it into my tree in this form, if you don't mind.
> 
> This version still has a major problem, which is not related to 
> completions vs rwsems, but simply to the fact that you wanted to do this 
> at the generic device layer level rather than do it at the actual 
> low-level suspend/resume level.
> 
> Namely that there's no apparent sane way to say "don't wait for children".
> 
> PCI bridges that don't suspend at all - or any other device that only 
> suspends in the 'suspend_late()' thing, for that matter - don't have any 
> reason what-so-ever to wait for children, since they aren't actually 
> suspending in the first place. But you make them wait regardless, which 
> then serializes things unnecessarily (for example, two unrelated USB 
> controllers).

This is a problem that needs to be solved.

One solution that we have discussed on linux-pm is to start a bunch of async
threads searching for async devices that can be suspended and suspending
them (assuming suspend is considered) out of order with respect to dpm_list.
For example, leaf async devices can always be suspended at the same time
regardless of their positions in dpm_list.  This way we could get almost the
entire gain resulting from suspending or resuming devices in parallel without
bothering drivers with the problem of dependencies that need to be honoured.

That's something we can add on top of this patch, though, not to complicate
things from the start and it surely requires more discussion.

> And no, making _everything_ be async is _not_ the answer.

I'm not sure what you mean, really.

Speaking of PCI bridges, even though they don't "suspend" in the sense of
being put into low power states or something, we still need to save their
registers on suspend and restore them on resume, and that restore has to
be done before we start to access devices below the bridge.

There are devices with totally null suspend and resume routines that even
the bus type doesn't really handle, but those can be marked as "async" from
the start and they won't really get in the way any more (this creates another
issue to solve, namely that we shouldn't really start a new async thread for
each of them; we have considered that too).

Even if we move that all to drivers, the constraints won't go away and someone
will have to take care of them.  Now, since _we_ have problems with reaching
an agreement about how to do it, the driver writers will be even less likely to
figure that out.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-12 22:34                         ` Rafael J. Wysocki
@ 2009-12-12 22:40                           ` Rafael J. Wysocki
  2009-12-14 18:21                           ` Linus Torvalds
       [not found]                           ` <alpine.LFD.2.00.0912141015240.26135@localhost.localdomain>
  2 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-12 22:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Saturday 12 December 2009, Rafael J. Wysocki wrote:
> On Saturday 12 December 2009, Linus Torvalds wrote:
> > 
> > On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> > > 
...
> 
> > And no, making _everything_ be async is _not_ the answer.
> 
> I'm not sure what you mean, really.
> 
> Speaking of PCI bridges, even though they don't "suspend" in the sense of
> being put into low power states or something, we still need to save their
> registers on suspend and restore them on resume, and that restore has to
> be done before we start to access devices below the bridge.

Of course we restore them at the early stage now so the above remark does't
apply to the patch in question, sorry.

But the one below does.

> Even if we move that all to drivers, the constraints won't go away and someone
> will have to take care of them.  Now, since _we_ have problems with reaching
> an agreement about how to do it, the driver writers will be even less likely to
> figure that out.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-12 22:34                         ` Rafael J. Wysocki
  2009-12-12 22:40                           ` Rafael J. Wysocki
@ 2009-12-14 18:21                           ` Linus Torvalds
       [not found]                           ` <alpine.LFD.2.00.0912141015240.26135@localhost.localdomain>
  2 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-14 18:21 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list

On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> 
> One solution that we have discussed on linux-pm is to start a bunch of async
> threads searching for async devices that can be suspended and suspending
> them (assuming suspend is considered) out of order with respect to dpm_list.

Ok, guys, stop the crazy.

That's another of those "ok, that's just ttoally stupid and clearly too 
complex" ideas that I would never pull.

I should seriously suggest that people just stop discussing architectural 
details on the pm list if they all end up being this level of crazy.

The sane thing to do is to just totally ignore the async layer on PCI 
bridges and other things that only have a late-suspend/early-resume thing. 
No need for the above kind of obviously idiotic crap.

However, my point was really that we wouldn't even have _needed_ that kind 
of special case if we had just decided to let the subsystems do it. But 
whatever. At worst, the PCI layer can even just mark such devices with 
just late/early suspend/resume as being asynchronous, even though that 
ends up resulting in some totally pointless async work that doesn't do 
anything.

But please guys - reign in the crazy ideas on the pm list. It's not like 
our suspend/resume has gotten so stable as to be boring, and we want it to 
become unreliable again.

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912141015240.26135@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                           ` <alpine.LFD.2.00.0912141015240.26135@localhost.localdomain>
@ 2009-12-14 22:11                             ` Rafael J. Wysocki
       [not found]                             ` <200912142311.31658.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-14 22:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Monday 14 December 2009, Linus Torvalds wrote:
> 
> On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > One solution that we have discussed on linux-pm is to start a bunch of async
> > threads searching for async devices that can be suspended and suspending
> > them (assuming suspend is considered) out of order with respect to dpm_list.
> 
> Ok, guys, stop the crazy.
> 
> That's another of those "ok, that's just ttoally stupid and clearly too 
> complex" ideas that I would never pull.
> 
> I should seriously suggest that people just stop discussing architectural 
> details on the pm list if they all end up being this level of crazy.
> 
> The sane thing to do is to just totally ignore the async layer on PCI 
> bridges and other things that only have a late-suspend/early-resume thing. 
> No need for the above kind of obviously idiotic crap.
> 
> However, my point was really that we wouldn't even have _needed_ that kind 
> of special case if we had just decided to let the subsystems do it. But 
> whatever. At worst, the PCI layer can even just mark such devices with 
> just late/early suspend/resume as being asynchronous, even though that 
> ends up resulting in some totally pointless async work that doesn't do 
> anything.
> 
> But please guys - reign in the crazy ideas on the pm list. It's not like 
> our suspend/resume has gotten so stable as to be boring, and we want it to 
> become unreliable again.

Indeed.

OK, what about a two-pass approach in which the first pass only inits the
completions and starts async threads for leaf "async" devices?  I think leaf
devices are most likely to take much time to suspend, so this will give us
a chance to save quite some time.

A more aggressive version of this might start the async threads for all async
devices in the first pass and then only handle the sychronous ones in the
second pass - as long as there are only a few async devices that should be
quite efficient.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912142311.31658.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                             ` <200912142311.31658.rjw@sisk.pl>
@ 2009-12-14 22:41                               ` Linus Torvalds
       [not found]                               ` <alpine.LFD.2.00.0912141416040.26135@localhost.localdomain>
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-14 22:41 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list



On Mon, 14 Dec 2009, Rafael J. Wysocki wrote:
>
> OK, what about a two-pass approach in which the first pass only inits the
> completions and starts async threads for leaf "async" devices?  I think leaf
> devices are most likely to take much time to suspend, so this will give us
> a chance to save quite some time.

Why?

Really.

Again, stop making it harder than it needs to be.

Why do you make up these crazy schemes that are way more complex than they 
need to be?

Here's an untested one-liner that has a 10-line comment.

I agree it is ugly, but it is ugly exactly because the generic device 
layer _forces_ us to wait for children even when we don't want to. With 
this, that unnecessary wait is now done asynchronously.

I'd rather do it some other way - perhaps having an explicit flag that 
says "don't wait for children because I'm not going to suspend myself 
until 'suspend_late' _anyway_". But at least this is _simple_.

		Linus

---
 drivers/pci/probe.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 98ffb2d..4e0ad7b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -437,6 +437,17 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
 	}
 	bridge->subordinate = child;
 
+	/*
+	 * We don't really suspend PCI buses asyncronously.
+	 *
+	 * However, since we don't actually suspend them at all until
+	 * the late phase, we might as well lie to the device layer
+	 * and it to do our no-op not-suspend asynchronously, so that
+	 * we end up not synchronizing with any of our child devices
+	 * that might want to be asynchronous.
+	 */
+	bridge->dev.power.async_suspend = 1;
+
 	return child;
 }
 

^ permalink raw reply related	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912141416040.26135@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                               ` <alpine.LFD.2.00.0912141416040.26135@localhost.localdomain>
@ 2009-12-14 22:43                                 ` Linus Torvalds
  2009-12-14 23:18                                 ` Rafael J. Wysocki
       [not found]                                 ` <200912150018.11837.rjw@sisk.pl>
  2 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-14 22:43 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list



On Mon, 14 Dec 2009, Linus Torvalds wrote:
> 
> Here's an untested one-liner that has a 10-line comment.

Btw, when I say "untested", in this case I mean that it isn't even 
compile-tested. I haven't merged your other patches yet, so in my tree 
that 'async_suspend' flag doesn't even exist, and the patch I sent out 
definitely doesn't compile.

But it _might_ compile (and perhaps even work) in your tree.

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                               ` <alpine.LFD.2.00.0912141416040.26135@localhost.localdomain>
  2009-12-14 22:43                                 ` Linus Torvalds
@ 2009-12-14 23:18                                 ` Rafael J. Wysocki
       [not found]                                 ` <200912150018.11837.rjw@sisk.pl>
  2 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-14 23:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Monday 14 December 2009, Linus Torvalds wrote:
> 
> On Mon, 14 Dec 2009, Rafael J. Wysocki wrote:
> >
> > OK, what about a two-pass approach in which the first pass only inits the
> > completions and starts async threads for leaf "async" devices?  I think leaf
> > devices are most likely to take much time to suspend, so this will give us
> > a chance to save quite some time.
> 
> Why?
> 
> Really.

Because the PCI bridges are not the only case where it matters (I'd say they
are really a corner case).  Basically, any two async devices separeted by a
series of sync ones are likely not to be suspended (or resumed) in parallel
with each other, because the parent is usually next to its children in dpm_list.
So, if the first device suspends, its "synchronous" parent waits for it and the
suspend of the second async device won't be started until the first one's
suspend has returned.  And it doesn't matter at what level we do the async
thing, because dpm_list is there anyway.

As Alan said, the real problem is that we generally can't change the ordering
of dpm_list arbitrarily, because we don't know what's going to happen as a
result.  The async_suspend flag tells us, basically, what devices can be safely
moved to different positions in dpm_list without breaking things, as long as
they are not moved behind their parents or in front of their children.

Starting the async suspends upfront would effectively work in the same way as
moving those devices to the beginning of dpm_list without breaking the
parent-child chains, which in turn is likely to allow us to save some extra
time.

That's not only about the PCI bridges, it's more general.  As far as your
one-liner is concerned, I'm going to test it, because I think we could use it
anyway.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912150018.11837.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                 ` <200912150018.11837.rjw@sisk.pl>
@ 2009-12-15  0:10                                   ` Linus Torvalds
       [not found]                                   ` <alpine.LFD.2.00.0912141609020.14385@localhost.localdomain>
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-15  0:10 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list



On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> 
> Because the PCI bridges are not the only case where it matters (I'd say they
> are really a corner case).  Basically, any two async devices separeted by a
> series of sync ones are likely not to be suspended (or resumed) in parallel
> with each other, because the parent is usually next to its children in dpm_list.

Give a real example that matters.

Really. 

How hard can it be to understand: KISS. Keep It Simple, Stupid.

I get really tired of this whole stupid async discussion, because you're 
overdesigning it.

To a first approximation, THE ONLY THING THAT MATTERS IS USB.

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912141609020.14385@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                   ` <alpine.LFD.2.00.0912141609020.14385@localhost.localdomain>
@ 2009-12-15  0:11                                     ` Linus Torvalds
  2009-12-15 11:03                                     ` Rafael J. Wysocki
                                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-15  0:11 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list



On Mon, 14 Dec 2009, Linus Torvalds wrote:
> 
> I get really tired of this whole stupid async discussion, because you're 
> overdesigning it.

Btw, this is important. I'm not going to pull even your _current_ async 
stuff if you can't show that you fundamentally UNDERSTAND this fact.

Stop making up idiotic complex interfaces. Look at my one-liner patch, and 
realize that it gets you 99% there - the 99% that matters.

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                   ` <alpine.LFD.2.00.0912141609020.14385@localhost.localdomain>
  2009-12-15  0:11                                     ` Linus Torvalds
@ 2009-12-15 11:03                                     ` Rafael J. Wysocki
       [not found]                                     ` <alpine.LFD.2.00.0912141610460.14385@localhost.localdomain>
       [not found]                                     ` <200912151203.22916.rjw@sisk.pl>
  3 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-15 11:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Tuesday 15 December 2009, Linus Torvalds wrote:
> 
> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > Because the PCI bridges are not the only case where it matters (I'd say they
> > are really a corner case).  Basically, any two async devices separeted by a
> > series of sync ones are likely not to be suspended (or resumed) in parallel
> > with each other, because the parent is usually next to its children in dpm_list.
> 
> Give a real example that matters.

I'll try.  Let -> denote child-parent relationships and assume dpm_list looks
like this:

..., A->B->C, D, E->F->G, ...

where A, B, E, F are all async and C, D, G are sync (E, F, G may be USB and
A, B, C may be serio input devices and D is a device that just happens to be in
dpm_list between them).  Say A and C take the majority of the total suspend
time and assume we traverse the dpm_list from left to right.

Now, during suspend, C waits for B that waits for A and G waits for F that
waits for E.  Moreover, since C is sync, the PM core won't start the suspend
of D until the suspend of C has returned.  In turn, since D is sync, the
suspend of E won't be started until the suspend of D has returned.  So in
this situation the gain from the async suspends of A, B, E, F is zero.

However, it won't be zero if we start the async suspends of A, B, E, F
upfront.

I'm not sure if this is sufficiently "real life" for you, but this is how
dpm_list looks on one of my test boxes, more or less.

> Really. 
> 
> How hard can it be to understand: KISS. Keep It Simple, Stupid.
> 
> I get really tired of this whole stupid async discussion, because you're 
> overdesigning it.
> 
> To a first approximation, THE ONLY THING THAT MATTERS IS USB.

If this applies to _resume_ only, then I agree, but the Arjan's data clearly
show that serio devices take much more time to suspend than USB.

But if we only talk about resume, the PCI bridges don't really matter,
because they are resumed before all devices that depend on them, so they don't
really need to wait for anyone anyway.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912141610460.14385@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                     ` <alpine.LFD.2.00.0912141610460.14385@localhost.localdomain>
@ 2009-12-15 11:14                                       ` Rafael J. Wysocki
       [not found]                                       ` <200912151214.10980.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-15 11:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Tuesday 15 December 2009, Linus Torvalds wrote:
> 
> On Mon, 14 Dec 2009, Linus Torvalds wrote:
> > 
> > I get really tired of this whole stupid async discussion, because you're 
> > overdesigning it.
> 
> Btw, this is important. I'm not going to pull even your _current_ async 
> stuff if you can't show that you fundamentally UNDERSTAND this fact.

What fact?  The only thing that matters is USB?  For resume, it is.  For
suspend, it clearly isn't.

> Stop making up idiotic complex interfaces. Look at my one-liner patch, and 
> realize that it gets you 99% there - the 99% that matters.

I said I was going to use it, but I don't think that's going to be sufficient.

[BTW, I'm not sure what you want to achieve by insulting me.  Either you may
want to scare me, but I'm not scared, or you may want to try to make me so
disgusted that I'll just give up and back off, but this is not going to happen
either.]

Insults aside, I'm going to make some measurements to see how much time we can
save.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912151214.10980.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                       ` <200912151214.10980.rjw@sisk.pl>
@ 2009-12-15 15:31                                         ` Linus Torvalds
  0 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-15 15:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list

On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> 
> What fact?  The only thing that matters is USB?  For resume, it is.  For
> suspend, it clearly isn't.

For suspend, the only other case we've seen has been the keyboard and 
mouse controller, which has exactly the same "we can special case it with 
a single 'let's do _this_ device asynchronously'". Again, it may not be 
pretty, but it sure is simple.

Much simpler than talking about some generic infrastructure changes and 
about doing "let's do leaves of the tree separately" schemes.

And that's why I'm _soo_ unhappy with you, and am insulting you. Because 
you keep on making the same mistake over and over - overdesigning.

Overdesigning is a SIN. It's the archetypal example of what I call "bad 
taste". I get really upset when a subsystem maintainer starts 
overdesigning things.

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912151203.22916.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                     ` <200912151203.22916.rjw@sisk.pl>
@ 2009-12-15 15:26                                       ` Linus Torvalds
       [not found]                                       ` <alpine.LFD.2.00.0912150722310.14385@localhost.localdomain>
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-15 15:26 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list

On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > Give a real example that matters.
> 
> I'll try.  Let -> denote child-parent relationships and assume dpm_list looks
> like this:

No. 

I mean something real - something like

 - if you run on a non-PC with two USB buses behind non-PCI controllers.

 - device xyz.

> If this applies to _resume_ only, then I agree, but the Arjan's data clearly
> show that serio devices take much more time to suspend than USB.

I mean in general - something where you actually have hard data that some 
device really needs anythign more than my one-liner, and really _needs_ 
some complex infrastructure.

Not "let's imagine a case like xyz".

> But if we only talk about resume, the PCI bridges don't really matter,
> because they are resumed before all devices that depend on them, so they don't
> really need to wait for anyone anyway.

But that's my _point_. That's the whole point of the one-liner patch. Read 
the comment above that one-liner.

My whole point was that by doing the whole "wait for children" in generic 
code, you also made devices - such as PCI bridges - have to wait for 
children, even though they don't need to, and don't want to.

So I suggested an admittedly ugly hack to take care of it - rather than 
some complex infrastructure.

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912150722310.14385@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                       ` <alpine.LFD.2.00.0912150722310.14385@localhost.localdomain>
@ 2009-12-15 15:55                                         ` Alan Stern
  2009-12-16  2:11                                         ` Rafael J. Wysocki
       [not found]                                         ` <200912160311.05915.rjw@sisk.pl>
  2 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-15 15:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Tue, 15 Dec 2009, Linus Torvalds wrote:

> My whole point was that by doing the whole "wait for children" in generic 
> code, you also made devices - such as PCI bridges - have to wait for 
> children, even though they don't need to, and don't want to.
> 
> So I suggested an admittedly ugly hack to take care of it - rather than 
> some complex infrastructure.

It doesn't feel like an ugly hack to me.  It seems like exactly the 
Right Thing To Do: Make as many devices as possible use async 
suspend/resume.

The only reason we don't make every device async is because we don't
know whether it's safe.  In the case of PCI bridges we _do_ know --
because they don't have any work to do outside of
late_suspend/early_resume -- and so they _should_ be async.

The same goes for devices that don't have suspend or resume methods.

There remains a separate question: Should async devices also be forced
to wait for their children?  I don't see why not.  For PCI bridges it
won't make any significant difference.  As long as the async code
doesn't have to do anything, who cares when it runs?

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                       ` <alpine.LFD.2.00.0912150722310.14385@localhost.localdomain>
  2009-12-15 15:55                                         ` Alan Stern
@ 2009-12-16  2:11                                         ` Rafael J. Wysocki
       [not found]                                         ` <200912160311.05915.rjw@sisk.pl>
  2 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-16  2:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Tuesday 15 December 2009, Linus Torvalds wrote:
> 
> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> > > 
> > > Give a real example that matters.
> > 
> > I'll try.  Let -> denote child-parent relationships and assume dpm_list looks
> > like this:
> 
> No. 
> 
> I mean something real - something like
> 
>  - if you run on a non-PC with two USB buses behind non-PCI controllers.
> 
>  - device xyz.
> 
> > If this applies to _resume_ only, then I agree, but the Arjan's data clearly
> > show that serio devices take much more time to suspend than USB.
> 
> I mean in general - something where you actually have hard data that some 
> device really needs anythign more than my one-liner, and really _needs_ 
> some complex infrastructure.
> 
> Not "let's imagine a case like xyz".

As I said I would, I made some measurements.

I measured the total time of suspending and resuming devices as shown by the
code added by this patch:
http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite
different and the HP was running 64-bit kernel and user space).

I took four cases into consideration:
(1) synchronous suspend and resume (/sys/power/pm_async = 0)
(2) asynchronous suspend and resume as introduced by the async branch at:
    http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
(3) asynchronous suspend and resume like in (2), but with your one-liner setting
    the power.async_suspend flag for PCI bridges on top
(4) asynchronous suspend and resume like in (2), but with an extra patch that
    is appended on top

For those tests I set power.async_suspend for all USB devices, all serio input
devices, the ACPI battery and the USB PCI controllers (to see the impact of the
one-liner, if any).

I carried out 5 consecutive suspend-resume cycles (started from under X) on
each box in each case, and the raw data are here (all times in milliseconds):
http://www.sisk.pl/kernel/data/async-suspend.pdf

The summarized data are below (the "big" numbers are averages and the +/-
numbers are standard deviations, all in milliseconds):

			HP nx6325		MSI Wind U100

sync suspend		1482 (+/- 40)	1180 (+/- 24)
sync resume		2955 (+/- 2)	3597 (+/- 25)

async suspend		1553 (+/- 49)	1177 (+/- 32)
async resume		2692 (+/- 326)	3556  (+/- 33)

async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)

async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
async+extra resume	1859 (+/- 114)	1923 (+/- 35)

So, in my opinion, with the above set of "async" devices, it doesn't
make sense to do async suspend at all, because the sync suspend is actually
the fastest on both machines.

However, it surely is worth doing async _resume_ with the extra patch appended
below, because that allows us to save 1 second or more on both machines with
respect to the sync case.  The other variants of async resume also produce some
time savings, but (on the nx6325) at the expense of huge fluctuations from one
cycle to another (so they can actually be slower than the sync resume).  Only
the async resume with the extra patch is consistently better than the sync one.
The impact of the one-liner is either negligible or slightly negative.

Now, what does the extra patch do?  Exactly the thing I was talking about, it
starts all async suspends and resumes upfront.

So, it looks like we both were wrong.  I was wrong, because I thought the
extra patch would help suspend, but not resume, while in fact it appears to
help resume big time.  You were wrong, because you thought that the one-liner
would have positive impact, while in fact it doesn't.

Concluding, at this point I'd opt for implementing asynchronous resume alone,
_without_ asynchronous suspend, which is more complicated and doesn't really
give us any time savings.  At the same time, I'd implement the asynchronous
resume in such a way that all of the async resume threads would be started
before the synchronous suspend thread, because that would give us the best
results.

Rafael

---
 drivers/base/power/main.c |   48 +++++++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 17 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -523,14 +523,9 @@ static void async_resume(void *data, asy
 
 static int device_resume(struct device *dev)
 {
-	INIT_COMPLETION(dev->power.completion);
-
-	if (pm_async_enabled && dev->power.async_suspend
-	    && !pm_trace_is_enabled()) {
-		get_device(dev);
-		async_schedule(async_resume, dev);
+	if (dev->power.async_suspend && pm_async_enabled
+	    && !pm_trace_is_enabled())
 		return 0;
-	}
 
 	return __device_resume(dev, pm_transition, false);
 }
@@ -545,14 +540,28 @@ static int device_resume(struct device *
 static void dpm_resume(pm_message_t state)
 {
 	struct list_head list;
+	struct device *dev;
 	ktime_t starttime = ktime_get();
 
 	INIT_LIST_HEAD(&list);
 	mutex_lock(&dpm_list_mtx);
 	pm_transition = state;
-	while (!list_empty(&dpm_list)) {
-		struct device *dev = to_device(dpm_list.next);
 
+	list_for_each_entry(dev, &dpm_list, power.entry) {
+		if (dev->power.status < DPM_OFF)
+			continue;
+
+		INIT_COMPLETION(dev->power.completion);
+
+		if (dev->power.async_suspend && pm_async_enabled
+		    && !pm_trace_is_enabled()) {
+			get_device(dev);
+			async_schedule(async_resume, dev);
+		}
+	}
+
+	while (!list_empty(&dpm_list)) {
+		dev = to_device(dpm_list.next);
 		get_device(dev);
 		if (dev->power.status >= DPM_OFF) {
 			int error;
@@ -809,13 +818,8 @@ static void async_suspend(void *data, as
 
 static int device_suspend(struct device *dev)
 {
-	INIT_COMPLETION(dev->power.completion);
-
-	if (pm_async_enabled && dev->power.async_suspend) {
-		get_device(dev);
-		async_schedule(async_suspend, dev);
+	if (pm_async_enabled && dev->power.async_suspend)
 		return 0;
-	}
 
 	return __device_suspend(dev, pm_transition, false);
 }
@@ -827,6 +831,7 @@ static int device_suspend(struct device 
 static int dpm_suspend(pm_message_t state)
 {
 	struct list_head list;
+	struct device *dev;
 	ktime_t starttime = ktime_get();
 	int error = 0;
 
@@ -834,9 +839,18 @@ static int dpm_suspend(pm_message_t stat
 	mutex_lock(&dpm_list_mtx);
 	pm_transition = state;
 	async_error = 0;
-	while (!list_empty(&dpm_list)) {
-		struct device *dev = to_device(dpm_list.prev);
 
+	list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
+		INIT_COMPLETION(dev->power.completion);
+
+		if (pm_async_enabled && dev->power.async_suspend) {
+			get_device(dev);
+			async_schedule(async_suspend, dev);
+		}
+	}
+
+	while (!list_empty(&dpm_list)) {
+		dev = to_device(dpm_list.prev);
 		get_device(dev);
 		mutex_unlock(&dpm_list_mtx);
 

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912160311.05915.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                         ` <200912160311.05915.rjw@sisk.pl>
@ 2009-12-16  6:40                                           ` Dmitry Torokhov
  2009-12-16 15:22                                           ` Alan Stern
                                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 98+ messages in thread
From: Dmitry Torokhov @ 2009-12-16  6:40 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
> On Tuesday 15 December 2009, Linus Torvalds wrote:
> > 
> > On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> > > > 
> > > > Give a real example that matters.
> > > 
> > > I'll try.  Let -> denote child-parent relationships and assume dpm_list looks
> > > like this:
> > 
> > No. 
> > 
> > I mean something real - something like
> > 
> >  - if you run on a non-PC with two USB buses behind non-PCI controllers.
> > 
> >  - device xyz.
> > 
> > > If this applies to _resume_ only, then I agree, but the Arjan's data clearly
> > > show that serio devices take much more time to suspend than USB.
> > 
> > I mean in general - something where you actually have hard data that some 
> > device really needs anythign more than my one-liner, and really _needs_ 
> > some complex infrastructure.
> > 
> > Not "let's imagine a case like xyz".
> 
> As I said I would, I made some measurements.
> 
> I measured the total time of suspending and resuming devices as shown by the
> code added by this patch:
> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite
> different and the HP was running 64-bit kernel and user space).
> 
> I took four cases into consideration:
> (1) synchronous suspend and resume (/sys/power/pm_async = 0)
> (2) asynchronous suspend and resume as introduced by the async branch at:
>     http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
> (3) asynchronous suspend and resume like in (2), but with your one-liner setting
>     the power.async_suspend flag for PCI bridges on top
> (4) asynchronous suspend and resume like in (2), but with an extra patch that
>     is appended on top
> 
> For those tests I set power.async_suspend for all USB devices, all serio input
> devices, the ACPI battery and the USB PCI controllers (to see the impact of the
> one-liner, if any).
> 
> I carried out 5 consecutive suspend-resume cycles (started from under X) on
> each box in each case, and the raw data are here (all times in milliseconds):
> http://www.sisk.pl/kernel/data/async-suspend.pdf
> 
> The summarized data are below (the "big" numbers are averages and the +/-
> numbers are standard deviations, all in milliseconds):
> 
> 			HP nx6325		MSI Wind U100
> 
> sync suspend		1482 (+/- 40)	1180 (+/- 24)
> sync resume		2955 (+/- 2)	3597 (+/- 25)
> 
> async suspend		1553 (+/- 49)	1177 (+/- 32)
> async resume		2692 (+/- 326)	3556  (+/- 33)
> 
> async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
> async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)
> 
> async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
> async+extra resume	1859 (+/- 114)	1923 (+/- 35)
> 
> So, in my opinion, with the above set of "async" devices, it doesn't
> make sense to do async suspend at all, because the sync suspend is actually
> the fastest on both machines.

I think the async suspend is not asynchronous enough then - what kind of
time do you get if you simply comment out call to psmouse_reset() in
drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for testing
purposes only, I don't think we want to do that by default.)

-- 
Dmitry

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                         ` <200912160311.05915.rjw@sisk.pl>
  2009-12-16  6:40                                           ` Dmitry Torokhov
@ 2009-12-16 15:22                                           ` Alan Stern
  2009-12-16 15:47                                           ` Linus Torvalds
       [not found]                                           ` <20091216064025.GB2699@core.coreip.homeip.net>
  3 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-16 15:22 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:

> I measured the total time of suspending and resuming devices as shown by the
> code added by this patch:
> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite
> different and the HP was running 64-bit kernel and user space).

> I carried out 5 consecutive suspend-resume cycles (started from under X) on
> each box in each case, and the raw data are here (all times in milliseconds):
> http://www.sisk.pl/kernel/data/async-suspend.pdf

I'd like to see much more detailed data.  For each device, let's get 
the device name, the parent's name, and the start time, end time, and 
duration for suspend or resume.  The start time should be measured when 
you have finished waiting for the children.  The end time should be 
measured just before the complete_all().

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                         ` <200912160311.05915.rjw@sisk.pl>
  2009-12-16  6:40                                           ` Dmitry Torokhov
  2009-12-16 15:22                                           ` Alan Stern
@ 2009-12-16 15:47                                           ` Linus Torvalds
  2009-12-16 19:27                                             ` Rafael J. Wysocki
       [not found]                                             ` <200912162027.16574.rjw@sisk.pl>
       [not found]                                           ` <20091216064025.GB2699@core.coreip.homeip.net>
  3 siblings, 2 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-16 15:47 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list



On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> 
> The summarized data are below (the "big" numbers are averages and the +/-
> numbers are standard deviations, all in milliseconds):
> 
> 			HP nx6325		MSI Wind U100
> 
> sync suspend		1482 (+/- 40)	1180 (+/- 24)
> sync resume		2955 (+/- 2)	3597 (+/- 25)
> 
> async suspend		1553 (+/- 49)	1177 (+/- 32)
> async resume		2692 (+/- 326)	3556  (+/- 33)
> 
> async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
> async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)
> 
> async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
> async+extra resume	1859 (+/- 114)	1923 (+/- 35)
> 
> So, in my opinion, with the above set of "async" devices, it doesn't
> make sense to do async suspend at all, because the sync suspend is actually
> the fastest on both machines.

Hmm. I certainly agree - your numbers do not seem to support any async at 
all.

However, I do note that for the "extra patch" makes a big difference at 
resume time. That implies that the resume serializes on some slow device 
that wasn't marked async - and starting the async ones early avoids that. 

But without the per-device timings, it's hard to even guess what device 
that was.

But even that doesn't really help the suspend cases, only resume.

Do you have any sample timing output with devices listed?

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-16 15:47                                           ` Linus Torvalds
@ 2009-12-16 19:27                                             ` Rafael J. Wysocki
       [not found]                                             ` <200912162027.16574.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-16 19:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Wednesday 16 December 2009, Linus Torvalds wrote:
> 
> On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > The summarized data are below (the "big" numbers are averages and the +/-
> > numbers are standard deviations, all in milliseconds):
> > 
> > 			HP nx6325		MSI Wind U100
> > 
> > sync suspend		1482 (+/- 40)	1180 (+/- 24)
> > sync resume		2955 (+/- 2)	3597 (+/- 25)
> > 
> > async suspend		1553 (+/- 49)	1177 (+/- 32)
> > async resume		2692 (+/- 326)	3556  (+/- 33)
> > 
> > async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
> > async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)
> > 
> > async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
> > async+extra resume	1859 (+/- 114)	1923 (+/- 35)
> > 
> > So, in my opinion, with the above set of "async" devices, it doesn't
> > make sense to do async suspend at all, because the sync suspend is actually
> > the fastest on both machines.
> 
> Hmm. I certainly agree - your numbers do not seem to support any async at 
> all.
> 
> However, I do note that for the "extra patch" makes a big difference at 
> resume time. That implies that the resume serializes on some slow device 
> that wasn't marked async - and starting the async ones early avoids that. 
> 
> But without the per-device timings, it's hard to even guess what device 
> that was.
> 
> But even that doesn't really help the suspend cases, only resume.
> 
> Do you have any sample timing output with devices listed?

I'm going to generate one shortly.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912162027.16574.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                             ` <200912162027.16574.rjw@sisk.pl>
@ 2009-12-16 20:59                                               ` Linus Torvalds
       [not found]                                               ` <alpine.LFD.2.00.0912161255080.3556@localhost.localdomain>
  1 sibling, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-16 20:59 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list



On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > Do you have any sample timing output with devices listed?
> 
> I'm going to generate one shortly.

>From my bootup timings, I have this memory of SATA link bringup being 
noticeable. I wonder if that is the case on resume too...

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912161255080.3556@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                               ` <alpine.LFD.2.00.0912161255080.3556@localhost.localdomain>
@ 2009-12-16 21:57                                                 ` Rafael J. Wysocki
       [not found]                                                 ` <200912162257.00771.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-16 21:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Wednesday 16 December 2009, Linus Torvalds wrote:
> 
> On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> > > 
> > > Do you have any sample timing output with devices listed?
> > 
> > I'm going to generate one shortly.

I've just put the first set of data, for the HP nx6325 at:
http://www.sisk.pl/kernel/data/nx6325/

The *-dmesg.log files contain full dmesg outputs starting from a cold boot and
including one suspend-resume cycle in each case, with debug_initcall enabled.

The *-suspend.log files are excerpts from the *-dmesg.log files containing
the suspend messages only, and analogously for *-resume.log.

The *-times.txt files contain suspend/resume time for every device sorted
in the decreasing order.

> From my bootup timings, I have this memory of SATA link bringup being 
> noticeable. I wonder if that is the case on resume too...

There's no SATA in the nx6325, only IDE, so we'd need to wait for the Wind data
(in the works).

The slowest suspending device in the nx6325 is the audio chip (surprise,
surprise), it takes ~220 ms alone.  Then - serio, but since i8042 was not
async, the async suspend of serio didn't really help (another ~140 ms).
Then network, FireWire, MMC, USB, SD host (~15 ms each).  [I think we can
help suspend a bit by making i8042 async, although I'm not sure that's going
to be safe.]

The slowest resuming are USB (by far) and then CardBus, audio, USB controllers,
FireWire, network and IDE (but that only takes about 7 ms).

But the main problem with async resume is that the USB devices are at the
beginning of dpm_list, so the resume of them is not even started until _all_ of
the slow devices behind them are woken up.  That's why the extra patch helps so
much IMO.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912162257.00771.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                 ` <200912162257.00771.rjw@sisk.pl>
@ 2009-12-16 22:11                                                   ` Linus Torvalds
       [not found]                                                   ` <alpine.LFD.2.00.0912161410120.3556@localhost.localdomain>
                                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-16 22:11 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, LKML, pm list



Btw, what are the timings if you just force everything async? I think that 
worked on yur laptops, no?

It would be interestign to know - if only to see what the asymptotic upper 
bound is for all of this is..

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912161410120.3556@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                   ` <alpine.LFD.2.00.0912161410120.3556@localhost.localdomain>
@ 2009-12-16 22:33                                                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-16 22:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Wednesday 16 December 2009, Linus Torvalds wrote:
> 
> Btw, what are the timings if you just force everything async? I think that 
> worked on yur laptops, no?

No, it didn't.  I could make all PCI async, provided that the ACPI subtree was
resumed before any PCI devices.  [Theoretically I can make that happen by
moving ACPI resume to the _noirq phase (just for testing of course).  So I can
try to make PCI async in addition to serio and USB, plus i8042 perhaps, which
should be sfficient for the nx6325 I think.]

Making all async always hanged the boxes on resume.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                 ` <200912162257.00771.rjw@sisk.pl>
  2009-12-16 22:11                                                   ` Linus Torvalds
       [not found]                                                   ` <alpine.LFD.2.00.0912161410120.3556@localhost.localdomain>
@ 2009-12-16 23:04                                                   ` Alan Stern
  2009-12-17  1:49                                                   ` Rafael J. Wysocki
  3 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-16 23:04 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:

> I've just put the first set of data, for the HP nx6325 at:
> http://www.sisk.pl/kernel/data/nx6325/
>  
> The *-dmesg.log files contain full dmesg outputs starting from a cold boot and
> including one suspend-resume cycle in each case, with debug_initcall enabled.
> 
> The *-suspend.log files are excerpts from the *-dmesg.log files containing
> the suspend messages only, and analogously for *-resume.log.

I've just started looking at the sync-suspend.log file.  What are all 
the '+' characters and " @ 3368" strings after the device names?

You didn't print out the parent name for each device, so the tree 
structure has been lost.

Why do those "sd 0:0:0:0 [sda]" messages appear in between two 
callbacks?  The cache-synchronization and the spin-down commands are
not executed asynchronously.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                 ` <200912162257.00771.rjw@sisk.pl>
                                                                     ` (2 preceding siblings ...)
  2009-12-16 23:04                                                   ` Alan Stern
@ 2009-12-17  1:49                                                   ` Rafael J. Wysocki
  2009-12-17 20:06                                                     ` Alan Stern
                                                                       ` (2 more replies)
  3 siblings, 3 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-17  1:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Wednesday 16 December 2009, Rafael J. Wysocki wrote:
> On Wednesday 16 December 2009, Linus Torvalds wrote:
> > 
> > On Wed, 16 Dec 2009, Rafael J. Wysocki wrote:
> > > > 
> > > > Do you have any sample timing output with devices listed?
> > > 
> > > I'm going to generate one shortly.
> 
> I've just put the first set of data, for the HP nx6325 at:
> http://www.sisk.pl/kernel/data/nx6325/

As I said in a message to Alan, the data were incomplete, because the original
Arjan's patch only covers bus types and device classes converted to
dev_pm_ops, which I only noticed earlier today.  So I added the appended patch
on top of the async tree and I applied a one-liner adding the name of the
parent to each device line during (regular) suspend and resume.

The new data sets are at:

http://www.sisk.pl/kernel/data/nx6325/
http://www.sisk.pl/kernel/data/wind/

and the format is the same as described below.

> The *-dmesg.log files contain full dmesg outputs starting from a cold boot and
> including one suspend-resume cycle in each case, with debug_initcall enabled.
> 
> The *-suspend.log files are excerpts from the *-dmesg.log files containing
> the suspend messages only, and analogously for *-resume.log.
> 
> The *-times.txt files contain suspend/resume time for every device sorted
> in the decreasing order.
> 
> > From my bootup timings, I have this memory of SATA link bringup being 
> > noticeable. I wonder if that is the case on resume too...

That actually is correct.  On the nx6325 suspend is totally dominated by disk
spindown, almost everything else is negligible compared to it (well, except for
the audio), so we can't go down below 1 s during suspend on this box.

On the Wind, disk spindown time is comparable with serio suspend time,
so at least in principle we should be able to get .5 s suspend on this box - 
if the disk spindown in async.

In turn, the resume on the Wind is dominated by disk spinup, so we can't
go below 1.5 s on this box during resume (notice that the "async+extra"
approach brings us close to this limit, although we could save .5 s more in
principle by making more devices async).

Resume on the nx6325 is a different story, though, as it is dominated by USB
and PCI devices, so marking those as async would probably bring us close to
the limit.

[Surprisingly enough to me some ACPI devices appear to take quite noticeable
amounts of time to resume on both boxes.]

Tomorrow I'll try to mark as many devices as reasonably possible as async
and see how the total suspend-resume times change.

Rafael

---
 drivers/base/power/main.c |   97 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 77 insertions(+), 20 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -165,6 +165,32 @@ void device_pm_move_last(struct device *
 	list_move_tail(&dev->power.entry, &dpm_list);
 }
 
+static ktime_t initcall_debug_start(struct device *dev)
+{
+	ktime_t calltime = ktime_set(0, 0);
+
+	if (initcall_debug) {
+		pr_info("calling  %s_i+ @ %i\n",
+				dev_name(dev), task_pid_nr(current));
+		calltime = ktime_get();
+	}
+
+	return calltime;
+}
+
+static void initcall_debug_report(struct device *dev, ktime_t calltime,
+				  int error)
+{
+	ktime_t delta, rettime;
+
+	if (initcall_debug) {
+		rettime = ktime_get();
+		delta = ktime_sub(rettime, calltime);
+		pr_info("call %s+ returned %d after %Ld usecs\n", dev_name(dev),
+			error, (unsigned long long)ktime_to_ns(delta) >> 10);
+	}
+}
+
 /**
  * dpm_wait - Wait for a PM operation to complete.
  * @dev: Device to wait for.
@@ -201,13 +227,9 @@ static int pm_op(struct device *dev,
 		 pm_message_t state)
 {
 	int error = 0;
-	ktime_t calltime, delta, rettime;
+	ktime_t calltime;
 
-	if (initcall_debug) {
-		pr_info("calling  %s+ @ %i\n",
-				dev_name(dev), task_pid_nr(current));
-		calltime = ktime_get();
-	}
+	calltime = initcall_debug_start(dev);
 
 	switch (state.event) {
 #ifdef CONFIG_SUSPEND
@@ -256,12 +278,7 @@ static int pm_op(struct device *dev,
 		error = -EINVAL;
 	}
 
-	if (initcall_debug) {
-		rettime = ktime_get();
-		delta = ktime_sub(rettime, calltime);
-		pr_info("call %s+ returned %d after %Ld usecs\n", dev_name(dev),
-			error, (unsigned long long)ktime_to_ns(delta) >> 10);
-	}
+	initcall_debug_report(dev, calltime, error);
 
 	return error;
 }
@@ -338,8 +355,9 @@ static int pm_noirq_op(struct device *de
 	if (initcall_debug) {
 		rettime = ktime_get();
 		delta = ktime_sub(rettime, calltime);
-		printk("initcall %s_i+ returned %d after %Ld usecs\n", dev_name(dev),
-			error, (unsigned long long)ktime_to_ns(delta) >> 10);
+		printk("initcall %s_i+ returned %d after %Ld usecs\n",
+			dev_name(dev), error,
+			(unsigned long long)ktime_to_ns(delta) >> 10);
 	}
 
 	return error;
@@ -456,6 +474,26 @@ void dpm_resume_noirq(pm_message_t state
 EXPORT_SYMBOL_GPL(dpm_resume_noirq);
 
 /**
+ * legacy_resume - Execute a legacy (bus or class) resume callback for device.
+ * dev: Device to resume.
+ * cb: Resume callback to execute.
+ */
+static int legacy_resume(struct device *dev, int (*cb)(struct device *dev))
+{
+	int error;
+	ktime_t calltime;
+
+	calltime = initcall_debug_start(dev);
+
+	error = cb(dev);
+	suspend_report_result(cb, error);
+
+	initcall_debug_report(dev, calltime, error);
+
+	return error;
+}
+
+/**
  * __device_resume - Execute "resume" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
@@ -477,7 +515,7 @@ static int __device_resume(struct device
 			error = pm_op(dev, dev->bus->pm, state);
 		} else if (dev->bus->resume) {
 			pm_dev_dbg(dev, state, "legacy ");
-			error = dev->bus->resume(dev);
+			error = legacy_resume(dev, dev->bus->resume);
 		}
 		if (error)
 			goto End;
@@ -498,7 +536,7 @@ static int __device_resume(struct device
 			error = pm_op(dev, dev->class->pm, state);
 		} else if (dev->class->resume) {
 			pm_dev_dbg(dev, state, "legacy class ");
-			error = dev->class->resume(dev);
+			error = legacy_resume(dev, dev->class->resume);
 		}
 	}
  End:
@@ -734,6 +772,27 @@ EXPORT_SYMBOL_GPL(dpm_suspend_noirq);
 static int async_error;
 
 /**
+ * legacy_suspend - Execute a legacy (bus or class) suspend callback for device.
+ * dev: Device to suspend.
+ * cb: Suspend callback to execute.
+ */
+static int legacy_suspend(struct device *dev, pm_message_t state,
+			  int (*cb)(struct device *dev, pm_message_t state))
+{
+	int error;
+	ktime_t calltime;
+
+	calltime = initcall_debug_start(dev);
+
+	error = cb(dev, state);
+	suspend_report_result(cb, error);
+
+	initcall_debug_report(dev, calltime, error);
+
+	return error;
+}
+
+/**
  * device_suspend - Execute "suspend" callbacks for given device.
  * @dev: Device to handle.
  * @state: PM transition of the system being carried out.
@@ -755,8 +814,7 @@ static int __device_suspend(struct devic
 			error = pm_op(dev, dev->class->pm, state);
 		} else if (dev->class->suspend) {
 			pm_dev_dbg(dev, state, "legacy class ");
-			error = dev->class->suspend(dev, state);
-			suspend_report_result(dev->class->suspend, error);
+			error = legacy_suspend(dev, state, dev->class->suspend);
 		}
 		if (error)
 			goto End;
@@ -777,8 +835,7 @@ static int __device_suspend(struct devic
 			error = pm_op(dev, dev->bus->pm, state);
 		} else if (dev->bus->suspend) {
 			pm_dev_dbg(dev, state, "legacy ");
-			error = dev->bus->suspend(dev, state);
-			suspend_report_result(dev->bus->suspend, error);
+			error = legacy_suspend(dev, state, dev->bus->suspend);
 		}
 	}
 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-17  1:49                                                   ` Rafael J. Wysocki
@ 2009-12-17 20:06                                                     ` Alan Stern
  2009-12-18  1:51                                                     ` Rafael J. Wysocki
       [not found]                                                     ` <200912180251.22655.rjw@sisk.pl>
  2 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-17 20:06 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Thu, 17 Dec 2009, Rafael J. Wysocki wrote:

> That actually is correct.  On the nx6325 suspend is totally dominated by disk
> spindown, almost everything else is negligible compared to it (well, except for
> the audio), so we can't go down below 1 s during suspend on this box.
> 
> On the Wind, disk spindown time is comparable with serio suspend time,
> so at least in principle we should be able to get .5 s suspend on this box - 
> if the disk spindown in async.
> 
> In turn, the resume on the Wind is dominated by disk spinup, so we can't
> go below 1.5 s on this box during resume (notice that the "async+extra"
> approach brings us close to this limit, although we could save .5 s more in
> principle by making more devices async).
> 
> Resume on the nx6325 is a different story, though, as it is dominated by USB
> and PCI devices, so marking those as async would probably bring us close to
> the limit.

The implications seem pretty clear.  If the following sorts of devices
were async:

	USB (devices and interfaces), PCI, serio, SCSI (hosts, targets,
	devices)

then we would reap close to the maximum benefit -- providing:

	async threads are started in a first pass without waiting
	for synchronous devices, and

It's not clear that making all these types of devices async will really 
work, but it's worth testing.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-17  1:49                                                   ` Rafael J. Wysocki
  2009-12-17 20:06                                                     ` Alan Stern
@ 2009-12-18  1:51                                                     ` Rafael J. Wysocki
       [not found]                                                     ` <200912180251.22655.rjw@sisk.pl>
  2 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-18  1:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Thursday 17 December 2009, Rafael J. Wysocki wrote:
...
> Tomorrow I'll try to mark as many devices as reasonably possible as async
> and see how the total suspend-resume times change.

I didn't manage to do that, but I was able to mark sd and i8042 as async and
see the impact of this.

The raw data are in the usual place:

http://www.sisk.pl/kernel/data/async-suspend-resume.pdf

and the individual device timings and logs are in:

http://www.sisk.pl/kernel/data/nx6325/
http://www.sisk.pl/kernel/data/wind/

This is the summary (previous results are inculded for easier reference):

			HP nx6325	MSI Wind U100

sync suspend		1482 (+/- 40)	1180 (+/- 24)
sync resume		2955 (+/- 2)	3597 (+/- 25)

async suspend		1553 (+/- 49)	1177 (+/- 32)
async resume		2692 (+/- 326)	3556 (+/- 33)

async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)

async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
async+extra resume	1859 (+/- 114)	1923 (+/- 35)

with "async" i8042 and sd:

async suspend		1319 (+/- 51)	1045 (+/- 41)
async resume		2929 (+/- 3)	3546 (+/- 27)

async+extra suspend	1327 (+/- 36)	(didn't work)
async+extra resume	1742 (+/- 164)	1896 (+/- 28)

(the summary is also available at: http://www.sisk.pl/kernel/data/results.txt).

So, it actually makes the case for async suspend!  Although it's not very
strong, with these two additional devices marked as "async" we get noticeable
suspend time improvement.

Still, the "extra" patch doesn't help on suspend at all and on the Wind the
suspend part of it didn't even work (I'm yet to figure out which of the two
devices crashed the suspend).  Nevertheless the resume part of the "extra"
patch worked in both cases and worked better than without the two additional
"async" devices.

To me, this means that the suspend part of the "extra" patch is not really
useful.  However, the resume part of it is _very_ useful, so I'd like to add
that part only to the async patchset.  The explanation why it helps so much
is also straightforward to me.  Namely, if slow async devices are last to
resume, then without the "extra" patch they need to wait for all of the
preceding sync devices and the speedup from executing their resume routines
asynchronously is very limited.  Now, with the "extra" patch their resume
routines start as soon as their parents complete resuming and that may be
early enough for the speedup to be significant.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912180251.22655.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                     ` <200912180251.22655.rjw@sisk.pl>
@ 2009-12-18 17:26                                                       ` Alan Stern
  2009-12-18 23:42                                                       ` Rafael J. Wysocki
  1 sibling, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-18 17:26 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: ACPI Devel Maling List, Linus Torvalds, LKML, pm list

On Fri, 18 Dec 2009, Rafael J. Wysocki wrote:

> I didn't manage to do that, but I was able to mark sd and i8042 as async and
> see the impact of this.

Apparently this didn't do what you wanted.  In the nx6325
sd+i8042+async+extra log, the 0:0:0:0 device (which is a SCSI disk) was
suspended by the main thread instead of an async thread.

There's an important point I neglected to mention before.  Your logs 
don't show anything for devices with no suspend callbacks at all.  
Nevertheless, these devices sit on the device list and prevent other
devices from suspending or resuming as soon as they could.

For example, the fingerprint sensor (3-1) took the most time to resume.  
But other devices were delayed until after it finished because it had
children with no callbacks, and they delayed the devices following
them in the list.

What would happen if you completed these devices immediately, as part 
of the first pass?

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                     ` <200912180251.22655.rjw@sisk.pl>
  2009-12-18 17:26                                                       ` Alan Stern
@ 2009-12-18 23:42                                                       ` Rafael J. Wysocki
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-18 23:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Friday 18 December 2009, Rafael J. Wysocki wrote:
> On Thursday 17 December 2009, Rafael J. Wysocki wrote:
> ...
> > Tomorrow I'll try to mark as many devices as reasonably possible as async
> > and see how the total suspend-resume times change.
> 
> I didn't manage to do that, but I was able to mark sd and i8042 as async and
> see the impact of this.
> 
> The raw data are in the usual place:
> 
> http://www.sisk.pl/kernel/data/async-suspend-resume.pdf
> 
> and the individual device timings and logs are in:
> 
> http://www.sisk.pl/kernel/data/nx6325/
> http://www.sisk.pl/kernel/data/wind/
> 
> This is the summary (previous results are inculded for easier reference):
> 
> 			HP nx6325	MSI Wind U100
> 
> sync suspend		1482 (+/- 40)	1180 (+/- 24)
> sync resume		2955 (+/- 2)	3597 (+/- 25)
> 
> async suspend		1553 (+/- 49)	1177 (+/- 32)
> async resume		2692 (+/- 326)	3556 (+/- 33)
> 
> async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
> async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)
> 
> async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
> async+extra resume	1859 (+/- 114)	1923 (+/- 35)
> 
> with "async" i8042 and sd:
> 
> async suspend		1319 (+/- 51)	1045 (+/- 41)
> async resume		2929 (+/- 3)	3546 (+/- 27)
> 
> async+extra suspend	1327 (+/- 36)	(didn't work)
> async+extra resume	1742 (+/- 164)	1896 (+/- 28)
> 
> (the summary is also available at: http://www.sisk.pl/kernel/data/results.txt).
> 
> So, it actually makes the case for async suspend!  Although it's not very
> strong, with these two additional devices marked as "async" we get noticeable
> suspend time improvement.
> 
> Still, the "extra" patch doesn't help on suspend at all and on the Wind the
> suspend part of it didn't even work (I'm yet to figure out which of the two
> devices crashed the suspend).

Small update.  I've just verified that sd was the failing device, although I'm
not sure about the reason.

Apart from this, I ran some tests on the Wind with i8042 marked as "async"
and sd marked as "sync".  In that case all of the tests succeeded and I got
the following numbers:

suspend (i8042 async, full extra patch applied): 1070 (+/- 40)
resume (i8042 async, full extra patch applied): 1915,84 (+/- 27)
suspend (i8042 async, resume part of extra patch applied): 1050 (+/- 34)

First, It looks like the suspend speedup was related to marking i8042 as
"async".  Since the serio devices, which are the i8042's children, were also
"async" (just like in all of the tests before), this means that the speedup
resulted from removing a suspend stall caused by a sync parent of async
children (i8042 and serio, respectively, in this case).

However, the suspend part of the extra patch doesn't help really.  In fact it
even makes things worse.

So, I still think the resume part of the extra patch is definitely useful, but
the suspend part of it is not.  IOW, it's worth running async resumes upfront,
but it's not worth running async suspends upfront.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <20091216064025.GB2699@core.coreip.homeip.net>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                           ` <20091216064025.GB2699@core.coreip.homeip.net>
@ 2009-12-18 22:43                                             ` Rafael J. Wysocki
  2009-12-19 19:59                                               ` Dmitry Torokhov
       [not found]                                               ` <20091219195935.GB4073@core.coreip.homeip.net>
  0 siblings, 2 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-18 22:43 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Wednesday 16 December 2009, Dmitry Torokhov wrote:
> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
> > On Tuesday 15 December 2009, Linus Torvalds wrote:
> > > 
> > > On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> > > > > 
> > > > > Give a real example that matters.
> > > > 
> > > > I'll try.  Let -> denote child-parent relationships and assume dpm_list looks
> > > > like this:
> > > 
> > > No. 
> > > 
> > > I mean something real - something like
> > > 
> > >  - if you run on a non-PC with two USB buses behind non-PCI controllers.
> > > 
> > >  - device xyz.
> > > 
> > > > If this applies to _resume_ only, then I agree, but the Arjan's data clearly
> > > > show that serio devices take much more time to suspend than USB.
> > > 
> > > I mean in general - something where you actually have hard data that some 
> > > device really needs anythign more than my one-liner, and really _needs_ 
> > > some complex infrastructure.
> > > 
> > > Not "let's imagine a case like xyz".
> > 
> > As I said I would, I made some measurements.
> > 
> > I measured the total time of suspending and resuming devices as shown by the
> > code added by this patch:
> > http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
> > on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite
> > different and the HP was running 64-bit kernel and user space).
> > 
> > I took four cases into consideration:
> > (1) synchronous suspend and resume (/sys/power/pm_async = 0)
> > (2) asynchronous suspend and resume as introduced by the async branch at:
> >     http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
> > (3) asynchronous suspend and resume like in (2), but with your one-liner setting
> >     the power.async_suspend flag for PCI bridges on top
> > (4) asynchronous suspend and resume like in (2), but with an extra patch that
> >     is appended on top
> > 
> > For those tests I set power.async_suspend for all USB devices, all serio input
> > devices, the ACPI battery and the USB PCI controllers (to see the impact of the
> > one-liner, if any).
> > 
> > I carried out 5 consecutive suspend-resume cycles (started from under X) on
> > each box in each case, and the raw data are here (all times in milliseconds):
> > http://www.sisk.pl/kernel/data/async-suspend.pdf
> > 
> > The summarized data are below (the "big" numbers are averages and the +/-
> > numbers are standard deviations, all in milliseconds):
> > 
> > 			HP nx6325		MSI Wind U100
> > 
> > sync suspend		1482 (+/- 40)	1180 (+/- 24)
> > sync resume		2955 (+/- 2)	3597 (+/- 25)
> > 
> > async suspend		1553 (+/- 49)	1177 (+/- 32)
> > async resume		2692 (+/- 326)	3556  (+/- 33)
> > 
> > async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
> > async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)
> > 
> > async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
> > async+extra resume	1859 (+/- 114)	1923 (+/- 35)
> > 
> > So, in my opinion, with the above set of "async" devices, it doesn't
> > make sense to do async suspend at all, because the sync suspend is actually
> > the fastest on both machines.
> 
> I think the async suspend is not asynchronous enough then - what kind of
> time do you get if you simply comment out call to psmouse_reset() in
> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for testing
> purposes only, I don't think we want to do that by default.)

The problem apparently is that the i8042 suspend/resume is synchronous.

Do you think it's safe to mark it as asynchronous?

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-18 22:43                                             ` Rafael J. Wysocki
@ 2009-12-19 19:59                                               ` Dmitry Torokhov
       [not found]                                               ` <20091219195935.GB4073@core.coreip.homeip.net>
  1 sibling, 0 replies; 98+ messages in thread
From: Dmitry Torokhov @ 2009-12-19 19:59 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
> On Wednesday 16 December 2009, Dmitry Torokhov wrote:
> > On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
> > > On Tuesday 15 December 2009, Linus Torvalds wrote:
> > > > 
> > > > On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> > > > > > 
> > > > > > Give a real example that matters.
> > > > > 
> > > > > I'll try.  Let -> denote child-parent relationships and assume dpm_list looks
> > > > > like this:
> > > > 
> > > > No. 
> > > > 
> > > > I mean something real - something like
> > > > 
> > > >  - if you run on a non-PC with two USB buses behind non-PCI controllers.
> > > > 
> > > >  - device xyz.
> > > > 
> > > > > If this applies to _resume_ only, then I agree, but the Arjan's data clearly
> > > > > show that serio devices take much more time to suspend than USB.
> > > > 
> > > > I mean in general - something where you actually have hard data that some 
> > > > device really needs anythign more than my one-liner, and really _needs_ 
> > > > some complex infrastructure.
> > > > 
> > > > Not "let's imagine a case like xyz".
> > > 
> > > As I said I would, I made some measurements.
> > > 
> > > I measured the total time of suspending and resuming devices as shown by the
> > > code added by this patch:
> > > http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
> > > on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite
> > > different and the HP was running 64-bit kernel and user space).
> > > 
> > > I took four cases into consideration:
> > > (1) synchronous suspend and resume (/sys/power/pm_async = 0)
> > > (2) asynchronous suspend and resume as introduced by the async branch at:
> > >     http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
> > > (3) asynchronous suspend and resume like in (2), but with your one-liner setting
> > >     the power.async_suspend flag for PCI bridges on top
> > > (4) asynchronous suspend and resume like in (2), but with an extra patch that
> > >     is appended on top
> > > 
> > > For those tests I set power.async_suspend for all USB devices, all serio input
> > > devices, the ACPI battery and the USB PCI controllers (to see the impact of the
> > > one-liner, if any).
> > > 
> > > I carried out 5 consecutive suspend-resume cycles (started from under X) on
> > > each box in each case, and the raw data are here (all times in milliseconds):
> > > http://www.sisk.pl/kernel/data/async-suspend.pdf
> > > 
> > > The summarized data are below (the "big" numbers are averages and the +/-
> > > numbers are standard deviations, all in milliseconds):
> > > 
> > > 			HP nx6325		MSI Wind U100
> > > 
> > > sync suspend		1482 (+/- 40)	1180 (+/- 24)
> > > sync resume		2955 (+/- 2)	3597 (+/- 25)
> > > 
> > > async suspend		1553 (+/- 49)	1177 (+/- 32)
> > > async resume		2692 (+/- 326)	3556  (+/- 33)
> > > 
> > > async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
> > > async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)
> > > 
> > > async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
> > > async+extra resume	1859 (+/- 114)	1923 (+/- 35)
> > > 
> > > So, in my opinion, with the above set of "async" devices, it doesn't
> > > make sense to do async suspend at all, because the sync suspend is actually
> > > the fastest on both machines.
> > 
> > I think the async suspend is not asynchronous enough then - what kind of
> > time do you get if you simply comment out call to psmouse_reset() in
> > drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for testing
> > purposes only, I don't think we want to do that by default.)
> 
> The problem apparently is that the i8042 suspend/resume is synchronous.
> 
> Do you think it's safe to mark it as asynchronous?
> 

Umm.. there lie dragons. There is an implicit relationship between i8042
and PNP/ACPI devices representing keyboard and mouse ports, and I am not
sure how happy i8042 (and most importantly the BIOS) will be if they get
shut down before i8042. Also there is EC which is in theory independent
but in practice not so much.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <20091219195935.GB4073@core.coreip.homeip.net>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                               ` <20091219195935.GB4073@core.coreip.homeip.net>
@ 2009-12-19 21:33                                                 ` Rafael J. Wysocki
       [not found]                                                 ` <200912192233.44575.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 21:33 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Saturday 19 December 2009, Dmitry Torokhov wrote:
> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
> > On Wednesday 16 December 2009, Dmitry Torokhov wrote:
> > > On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
> > > > On Tuesday 15 December 2009, Linus Torvalds wrote:
> > > > > 
> > > > > On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> > > > > > > 
> > > > > > > Give a real example that matters.
> > > > > > 
> > > > > > I'll try.  Let -> denote child-parent relationships and assume dpm_list looks
> > > > > > like this:
> > > > > 
> > > > > No. 
> > > > > 
> > > > > I mean something real - something like
> > > > > 
> > > > >  - if you run on a non-PC with two USB buses behind non-PCI controllers.
> > > > > 
> > > > >  - device xyz.
> > > > > 
> > > > > > If this applies to _resume_ only, then I agree, but the Arjan's data clearly
> > > > > > show that serio devices take much more time to suspend than USB.
> > > > > 
> > > > > I mean in general - something where you actually have hard data that some 
> > > > > device really needs anythign more than my one-liner, and really _needs_ 
> > > > > some complex infrastructure.
> > > > > 
> > > > > Not "let's imagine a case like xyz".
> > > > 
> > > > As I said I would, I made some measurements.
> > > > 
> > > > I measured the total time of suspending and resuming devices as shown by the
> > > > code added by this patch:
> > > > http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
> > > > on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite
> > > > different and the HP was running 64-bit kernel and user space).
> > > > 
> > > > I took four cases into consideration:
> > > > (1) synchronous suspend and resume (/sys/power/pm_async = 0)
> > > > (2) asynchronous suspend and resume as introduced by the async branch at:
> > > >     http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
> > > > (3) asynchronous suspend and resume like in (2), but with your one-liner setting
> > > >     the power.async_suspend flag for PCI bridges on top
> > > > (4) asynchronous suspend and resume like in (2), but with an extra patch that
> > > >     is appended on top
> > > > 
> > > > For those tests I set power.async_suspend for all USB devices, all serio input
> > > > devices, the ACPI battery and the USB PCI controllers (to see the impact of the
> > > > one-liner, if any).
> > > > 
> > > > I carried out 5 consecutive suspend-resume cycles (started from under X) on
> > > > each box in each case, and the raw data are here (all times in milliseconds):
> > > > http://www.sisk.pl/kernel/data/async-suspend.pdf
> > > > 
> > > > The summarized data are below (the "big" numbers are averages and the +/-
> > > > numbers are standard deviations, all in milliseconds):
> > > > 
> > > > 			HP nx6325		MSI Wind U100
> > > > 
> > > > sync suspend		1482 (+/- 40)	1180 (+/- 24)
> > > > sync resume		2955 (+/- 2)	3597 (+/- 25)
> > > > 
> > > > async suspend		1553 (+/- 49)	1177 (+/- 32)
> > > > async resume		2692 (+/- 326)	3556  (+/- 33)
> > > > 
> > > > async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
> > > > async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)
> > > > 
> > > > async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
> > > > async+extra resume	1859 (+/- 114)	1923 (+/- 35)
> > > > 
> > > > So, in my opinion, with the above set of "async" devices, it doesn't
> > > > make sense to do async suspend at all, because the sync suspend is actually
> > > > the fastest on both machines.
> > > 
> > > I think the async suspend is not asynchronous enough then - what kind of
> > > time do you get if you simply comment out call to psmouse_reset() in
> > > drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for testing
> > > purposes only, I don't think we want to do that by default.)
> > 
> > The problem apparently is that the i8042 suspend/resume is synchronous.
> > 
> > Do you think it's safe to mark it as asynchronous?
> > 
> 
> Umm.. there lie dragons. There is an implicit relationship between i8042
> and PNP/ACPI devices representing keyboard and mouse ports, and I am not
> sure how happy i8042 (and most importantly the BIOS) will be if they get
> shut down before i8042. Also there is EC which is in theory independent
> but in practice not so much.

I see.

Is this possible to identify ACPI devices that should wait for the i8042
suspend and that should be waited for by it on resume?

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912192233.44575.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                 ` <200912192233.44575.rjw@sisk.pl>
@ 2009-12-19 22:29                                                   ` Rafael J. Wysocki
       [not found]                                                   ` <200912192329.03251.rjw@sisk.pl>
                                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 22:29 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Saturday 19 December 2009, Rafael J. Wysocki wrote:
> On Saturday 19 December 2009, Dmitry Torokhov wrote:
> > On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
> > > On Wednesday 16 December 2009, Dmitry Torokhov wrote:
> > > > On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
> > > > > On Tuesday 15 December 2009, Linus Torvalds wrote:
> > > > > > 
> > > > > > On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> > > > > > > > 
> > > > > > > > Give a real example that matters.
> > > > > > > 
> > > > > > > I'll try.  Let -> denote child-parent relationships and assume dpm_list looks
> > > > > > > like this:
> > > > > > 
> > > > > > No. 
> > > > > > 
> > > > > > I mean something real - something like
> > > > > > 
> > > > > >  - if you run on a non-PC with two USB buses behind non-PCI controllers.
> > > > > > 
> > > > > >  - device xyz.
> > > > > > 
> > > > > > > If this applies to _resume_ only, then I agree, but the Arjan's data clearly
> > > > > > > show that serio devices take much more time to suspend than USB.
> > > > > > 
> > > > > > I mean in general - something where you actually have hard data that some 
> > > > > > device really needs anythign more than my one-liner, and really _needs_ 
> > > > > > some complex infrastructure.
> > > > > > 
> > > > > > Not "let's imagine a case like xyz".
> > > > > 
> > > > > As I said I would, I made some measurements.
> > > > > 
> > > > > I measured the total time of suspending and resuming devices as shown by the
> > > > > code added by this patch:
> > > > > http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
> > > > > on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they are quite
> > > > > different and the HP was running 64-bit kernel and user space).
> > > > > 
> > > > > I took four cases into consideration:
> > > > > (1) synchronous suspend and resume (/sys/power/pm_async = 0)
> > > > > (2) asynchronous suspend and resume as introduced by the async branch at:
> > > > >     http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
> > > > > (3) asynchronous suspend and resume like in (2), but with your one-liner setting
> > > > >     the power.async_suspend flag for PCI bridges on top
> > > > > (4) asynchronous suspend and resume like in (2), but with an extra patch that
> > > > >     is appended on top
> > > > > 
> > > > > For those tests I set power.async_suspend for all USB devices, all serio input
> > > > > devices, the ACPI battery and the USB PCI controllers (to see the impact of the
> > > > > one-liner, if any).
> > > > > 
> > > > > I carried out 5 consecutive suspend-resume cycles (started from under X) on
> > > > > each box in each case, and the raw data are here (all times in milliseconds):
> > > > > http://www.sisk.pl/kernel/data/async-suspend.pdf
> > > > > 
> > > > > The summarized data are below (the "big" numbers are averages and the +/-
> > > > > numbers are standard deviations, all in milliseconds):
> > > > > 
> > > > > 			HP nx6325		MSI Wind U100
> > > > > 
> > > > > sync suspend		1482 (+/- 40)	1180 (+/- 24)
> > > > > sync resume		2955 (+/- 2)	3597 (+/- 25)
> > > > > 
> > > > > async suspend		1553 (+/- 49)	1177 (+/- 32)
> > > > > async resume		2692 (+/- 326)	3556  (+/- 33)
> > > > > 
> > > > > async+one-liner suspend	1600 (+/- 39)	1212 (+/- 41)
> > > > > async+one-liner resume	2692 (+/- 324)	3579 (+/- 24)
> > > > > 
> > > > > async+extra suspend	1496 (+/- 37)	1217 (+/- 38)
> > > > > async+extra resume	1859 (+/- 114)	1923 (+/- 35)
> > > > > 
> > > > > So, in my opinion, with the above set of "async" devices, it doesn't
> > > > > make sense to do async suspend at all, because the sync suspend is actually
> > > > > the fastest on both machines.
> > > > 
> > > > I think the async suspend is not asynchronous enough then - what kind of
> > > > time do you get if you simply comment out call to psmouse_reset() in
> > > > drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for testing
> > > > purposes only, I don't think we want to do that by default.)
> > > 
> > > The problem apparently is that the i8042 suspend/resume is synchronous.
> > > 
> > > Do you think it's safe to mark it as asynchronous?
> > > 
> > 
> > Umm.. there lie dragons. There is an implicit relationship between i8042
> > and PNP/ACPI devices representing keyboard and mouse ports, and I am not
> > sure how happy i8042 (and most importantly the BIOS) will be if they get
> > shut down before i8042. Also there is EC which is in theory independent
> > but in practice not so much.
> 
> I see.
> 
> Is this possible to identify ACPI devices that should wait for the i8042
> suspend and that should be waited for by it on resume?

Wait, if you look at the logs at

http://www.sisk.pl/kernel/data/nx6325/
http://www.sisk.pl/kernel/data/wind/

you'll see that the i8042 suspend is called before any ACPI devices are
suspended anyway.  In fact, it is suspended right after its serio children
which is very early in the suspend sequence.

So, it seems, if there were any problems with i8042 vs ACPI, we'd experience
them anyway.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912192329.03251.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                   ` <200912192329.03251.rjw@sisk.pl>
@ 2009-12-19 22:43                                                     ` Dmitry Torokhov
  0 siblings, 0 replies; 98+ messages in thread
From: Dmitry Torokhov @ 2009-12-19 22:43 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Dec 19, 2009, at 2:29 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Saturday 19 December 2009, Rafael J. Wysocki wrote:
>> On Saturday 19 December 2009, Dmitry Torokhov wrote:
>>> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
>>>> On Wednesday 16 December 2009, Dmitry Torokhov wrote:
>>>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
>>>>>> On Tuesday 15 December 2009, Linus Torvalds wrote:
>>>>>>>
>>>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
>>>>>>>>>
>>>>>>>>> Give a real example that matters.
>>>>>>>>
>>>>>>>> I'll try.  Let -> denote child-parent relationships and  
>>>>>>>> assume dpm_list looks
>>>>>>>> like this:
>>>>>>>
>>>>>>> No.
>>>>>>>
>>>>>>> I mean something real - something like
>>>>>>>
>>>>>>> - if you run on a non-PC with two USB buses behind non-PCI  
>>>>>>> controllers.
>>>>>>>
>>>>>>> - device xyz.
>>>>>>>
>>>>>>>> If this applies to _resume_ only, then I agree, but the  
>>>>>>>> Arjan's data clearly
>>>>>>>> show that serio devices take much more time to suspend than  
>>>>>>>> USB.
>>>>>>>
>>>>>>> I mean in general - something where you actually have hard  
>>>>>>> data that some
>>>>>>> device really needs anythign more than my one-liner, and  
>>>>>>> really _needs_
>>>>>>> some complex infrastructure.
>>>>>>>
>>>>>>> Not "let's imagine a case like xyz".
>>>>>>
>>>>>> As I said I would, I made some measurements.
>>>>>>
>>>>>> I measured the total time of suspending and resuming devices as  
>>>>>> shown by the
>>>>>> code added by this patch:
>>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
>>>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they  
>>>>>> are quite
>>>>>> different and the HP was running 64-bit kernel and user space).
>>>>>>
>>>>>> I took four cases into consideration:
>>>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0)
>>>>>> (2) asynchronous suspend and resume as introduced by the async  
>>>>>> branch at:
>>>>>>    http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
>>>>>> (3) asynchronous suspend and resume like in (2), but with your  
>>>>>> one-liner setting
>>>>>>    the power.async_suspend flag for PCI bridges on top
>>>>>> (4) asynchronous suspend and resume like in (2), but with an  
>>>>>> extra patch that
>>>>>>    is appended on top
>>>>>>
>>>>>> For those tests I set power.async_suspend for all USB devices,  
>>>>>> all serio input
>>>>>> devices, the ACPI battery and the USB PCI controllers (to see  
>>>>>> the impact of the
>>>>>> one-liner, if any).
>>>>>>
>>>>>> I carried out 5 consecutive suspend-resume cycles (started from  
>>>>>> under X) on
>>>>>> each box in each case, and the raw data are here (all times in  
>>>>>> milliseconds):
>>>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf
>>>>>>
>>>>>> The summarized data are below (the "big" numbers are averages  
>>>>>> and the +/-
>>>>>> numbers are standard deviations, all in milliseconds):
>>>>>>
>>>>>>            HP nx6325        MSI Wind U100
>>>>>>
>>>>>> sync suspend        1482 (+/- 40)    1180 (+/- 24)
>>>>>> sync resume        2955 (+/- 2)    3597 (+/- 25)
>>>>>>
>>>>>> async suspend        1553 (+/- 49)    1177 (+/- 32)
>>>>>> async resume        2692 (+/- 326)    3556  (+/- 33)
>>>>>>
>>>>>> async+one-liner suspend    1600 (+/- 39)    1212 (+/- 41)
>>>>>> async+one-liner resume    2692 (+/- 324)    3579 (+/- 24)
>>>>>>
>>>>>> async+extra suspend    1496 (+/- 37)    1217 (+/- 38)
>>>>>> async+extra resume    1859 (+/- 114)    1923 (+/- 35)
>>>>>>
>>>>>> So, in my opinion, with the above set of "async" devices, it  
>>>>>> doesn't
>>>>>> make sense to do async suspend at all, because the sync suspend  
>>>>>> is actually
>>>>>> the fastest on both machines.
>>>>>
>>>>> I think the async suspend is not asynchronous enough then - what  
>>>>> kind of
>>>>> time do you get if you simply comment out call to psmouse_reset 
>>>>> () in
>>>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for  
>>>>> testing
>>>>> purposes only, I don't think we want to do that by default.)
>>>>
>>>> The problem apparently is that the i8042 suspend/resume is  
>>>> synchronous.
>>>>
>>>> Do you think it's safe to mark it as asynchronous?
>>>>
>>>
>>> Umm.. there lie dragons. There is an implicit relationship between  
>>> i8042
>>> and PNP/ACPI devices representing keyboard and mouse ports, and I  
>>> am not
>>> sure how happy i8042 (and most importantly the BIOS) will be if  
>>> they get
>>> shut down before i8042. Also there is EC which is in theory  
>>> independent
>>> but in practice not so much.
>>
>> I see.
>>
>> Is this possible to identify ACPI devices that should wait for the  
>> i8042
>> suspend and that should be waited for by it on resume?
>
> Wait, if you look at the logs at
>
> http://www.sisk.pl/kernel/data/nx6325/
> http://www.sisk.pl/kernel/data/wind/
>
> you'll see that the i8042 suspend is called before any ACPI devices  
> are
> suspended anyway.  In fact, it is suspended right after its serio  
> children
> which is very early in the suspend sequence.

Right, and we do want to "suspend" i8042 (well, reset to the initial  
state we found it at bootup) before touching ACPI.

If i8042 is async, given the fact that psmouse reset takes a long  
time, it is possible that we start suspending PNP before we are done  
with i8042.

-- 
>

Dmitry

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                 ` <200912192233.44575.rjw@sisk.pl>
  2009-12-19 22:29                                                   ` Rafael J. Wysocki
       [not found]                                                   ` <200912192329.03251.rjw@sisk.pl>
@ 2009-12-19 22:47                                                   ` Dmitry Torokhov
       [not found]                                                   ` <A37A0A6F-3662-40C9-BE1F-B9F6A38CD80B@gmail.com>
  3 siblings, 0 replies; 98+ messages in thread
From: Dmitry Torokhov @ 2009-12-19 22:47 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Dec 19, 2009, at 1:33 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Saturday 19 December 2009, Dmitry Torokhov wrote:
>> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
>>> On Wednesday 16 December 2009, Dmitry Torokhov wrote:
>>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
>>>>> On Tuesday 15 December 2009, Linus Torvalds wrote:
>>>>>>
>>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
>>>>>>>>
>>>>>>>> Give a real example that matters.
>>>>>>>
>>>>>>> I'll try.  Let -> denote child-parent relationships and assume  
>>>>>>> dpm_list looks
>>>>>>> like this:
>>>>>>
>>>>>> No.
>>>>>>
>>>>>> I mean something real - something like
>>>>>>
>>>>>> - if you run on a non-PC with two USB buses behind non-PCI  
>>>>>> controllers.
>>>>>>
>>>>>> - device xyz.
>>>>>>
>>>>>>> If this applies to _resume_ only, then I agree, but the  
>>>>>>> Arjan's data clearly
>>>>>>> show that serio devices take much more time to suspend than USB.
>>>>>>
>>>>>> I mean in general - something where you actually have hard data  
>>>>>> that some
>>>>>> device really needs anythign more than my one-liner, and really  
>>>>>> _needs_
>>>>>> some complex infrastructure.
>>>>>>
>>>>>> Not "let's imagine a case like xyz".
>>>>>
>>>>> As I said I would, I made some measurements.
>>>>>
>>>>> I measured the total time of suspending and resuming devices as  
>>>>> shown by the
>>>>> code added by this patch:
>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
>>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they  
>>>>> are quite
>>>>> different and the HP was running 64-bit kernel and user space).
>>>>>
>>>>> I took four cases into consideration:
>>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0)
>>>>> (2) asynchronous suspend and resume as introduced by the async  
>>>>> branch at:
>>>>>    http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
>>>>> (3) asynchronous suspend and resume like in (2), but with your  
>>>>> one-liner setting
>>>>>    the power.async_suspend flag for PCI bridges on top
>>>>> (4) asynchronous suspend and resume like in (2), but with an  
>>>>> extra patch that
>>>>>    is appended on top
>>>>>
>>>>> For those tests I set power.async_suspend for all USB devices,  
>>>>> all serio input
>>>>> devices, the ACPI battery and the USB PCI controllers (to see  
>>>>> the impact of the
>>>>> one-liner, if any).
>>>>>
>>>>> I carried out 5 consecutive suspend-resume cycles (started from  
>>>>> under X) on
>>>>> each box in each case, and the raw data are here (all times in  
>>>>> milliseconds):
>>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf
>>>>>
>>>>> The summarized data are below (the "big" numbers are averages  
>>>>> and the +/-
>>>>> numbers are standard deviations, all in milliseconds):
>>>>>
>>>>>            HP nx6325        MSI Wind U100
>>>>>
>>>>> sync suspend        1482 (+/- 40)    1180 (+/- 24)
>>>>> sync resume        2955 (+/- 2)    3597 (+/- 25)
>>>>>
>>>>> async suspend        1553 (+/- 49)    1177 (+/- 32)
>>>>> async resume        2692 (+/- 326)    3556  (+/- 33)
>>>>>
>>>>> async+one-liner suspend    1600 (+/- 39)    1212 (+/- 41)
>>>>> async+one-liner resume    2692 (+/- 324)    3579 (+/- 24)
>>>>>
>>>>> async+extra suspend    1496 (+/- 37)    1217 (+/- 38)
>>>>> async+extra resume    1859 (+/- 114)    1923 (+/- 35)
>>>>>
>>>>> So, in my opinion, with the above set of "async" devices, it  
>>>>> doesn't
>>>>> make sense to do async suspend at all, because the sync suspend  
>>>>> is actually
>>>>> the fastest on both machines.
>>>>
>>>> I think the async suspend is not asynchronous enough then - what  
>>>> kind of
>>>> time do you get if you simply comment out call to psmouse_reset()  
>>>> in
>>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for  
>>>> testing
>>>> purposes only, I don't think we want to do that by default.)
>>>
>>> The problem apparently is that the i8042 suspend/resume is  
>>> synchronous.
>>>
>>> Do you think it's safe to mark it as asynchronous?
>>>
>>
>> Umm.. there lie dragons. There is an implicit relationship between  
>> i8042
>> and PNP/ACPI devices representing keyboard and mouse ports, and I  
>> am not
>> sure how happy i8042 (and most importantly the BIOS) will be if  
>> they get
>> shut down before i8042. Also there is EC which is in theory  
>> independent
>> but in practice not so much.
>
> I see.
>
> Is this possible to identify ACPI devices that should wait for the  
> i8042
> suspend and that should be waited for by it on resume?

We could try to add some dependencies while discovering PNP to get KBC  
addresses in i8042 but we need tomake sure we do it even in presence  
of i8042.nopnp.

-- 
Dmitry
  

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <A37A0A6F-3662-40C9-BE1F-B9F6A38CD80B@gmail.com>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                   ` <A37A0A6F-3662-40C9-BE1F-B9F6A38CD80B@gmail.com>
@ 2009-12-19 23:10                                                     ` Rafael J. Wysocki
       [not found]                                                     ` <200912200010.19899.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 23:10 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Saturday 19 December 2009, Dmitry Torokhov wrote:
> On Dec 19, 2009, at 1:33 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Saturday 19 December 2009, Dmitry Torokhov wrote:
> >> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
> >>> On Wednesday 16 December 2009, Dmitry Torokhov wrote:
> >>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki wrote:
> >>>>> On Tuesday 15 December 2009, Linus Torvalds wrote:
> >>>>>>
> >>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> >>>>>>>>
> >>>>>>>> Give a real example that matters.
> >>>>>>>
> >>>>>>> I'll try.  Let -> denote child-parent relationships and assume  
> >>>>>>> dpm_list looks
> >>>>>>> like this:
> >>>>>>
> >>>>>> No.
> >>>>>>
> >>>>>> I mean something real - something like
> >>>>>>
> >>>>>> - if you run on a non-PC with two USB buses behind non-PCI  
> >>>>>> controllers.
> >>>>>>
> >>>>>> - device xyz.
> >>>>>>
> >>>>>>> If this applies to _resume_ only, then I agree, but the  
> >>>>>>> Arjan's data clearly
> >>>>>>> show that serio devices take much more time to suspend than USB.
> >>>>>>
> >>>>>> I mean in general - something where you actually have hard data  
> >>>>>> that some
> >>>>>> device really needs anythign more than my one-liner, and really  
> >>>>>> _needs_
> >>>>>> some complex infrastructure.
> >>>>>>
> >>>>>> Not "let's imagine a case like xyz".
> >>>>>
> >>>>> As I said I would, I made some measurements.
> >>>>>
> >>>>> I measured the total time of suspending and resuming devices as  
> >>>>> shown by the
> >>>>> code added by this patch:
> >>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
> >>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they  
> >>>>> are quite
> >>>>> different and the HP was running 64-bit kernel and user space).
> >>>>>
> >>>>> I took four cases into consideration:
> >>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0)
> >>>>> (2) asynchronous suspend and resume as introduced by the async  
> >>>>> branch at:
> >>>>>    http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
> >>>>> (3) asynchronous suspend and resume like in (2), but with your  
> >>>>> one-liner setting
> >>>>>    the power.async_suspend flag for PCI bridges on top
> >>>>> (4) asynchronous suspend and resume like in (2), but with an  
> >>>>> extra patch that
> >>>>>    is appended on top
> >>>>>
> >>>>> For those tests I set power.async_suspend for all USB devices,  
> >>>>> all serio input
> >>>>> devices, the ACPI battery and the USB PCI controllers (to see  
> >>>>> the impact of the
> >>>>> one-liner, if any).
> >>>>>
> >>>>> I carried out 5 consecutive suspend-resume cycles (started from  
> >>>>> under X) on
> >>>>> each box in each case, and the raw data are here (all times in  
> >>>>> milliseconds):
> >>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf
> >>>>>
> >>>>> The summarized data are below (the "big" numbers are averages  
> >>>>> and the +/-
> >>>>> numbers are standard deviations, all in milliseconds):
> >>>>>
> >>>>>            HP nx6325        MSI Wind U100
> >>>>>
> >>>>> sync suspend        1482 (+/- 40)    1180 (+/- 24)
> >>>>> sync resume        2955 (+/- 2)    3597 (+/- 25)
> >>>>>
> >>>>> async suspend        1553 (+/- 49)    1177 (+/- 32)
> >>>>> async resume        2692 (+/- 326)    3556  (+/- 33)
> >>>>>
> >>>>> async+one-liner suspend    1600 (+/- 39)    1212 (+/- 41)
> >>>>> async+one-liner resume    2692 (+/- 324)    3579 (+/- 24)
> >>>>>
> >>>>> async+extra suspend    1496 (+/- 37)    1217 (+/- 38)
> >>>>> async+extra resume    1859 (+/- 114)    1923 (+/- 35)
> >>>>>
> >>>>> So, in my opinion, with the above set of "async" devices, it  
> >>>>> doesn't
> >>>>> make sense to do async suspend at all, because the sync suspend  
> >>>>> is actually
> >>>>> the fastest on both machines.
> >>>>
> >>>> I think the async suspend is not asynchronous enough then - what  
> >>>> kind of
> >>>> time do you get if you simply comment out call to psmouse_reset()  
> >>>> in
> >>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for  
> >>>> testing
> >>>> purposes only, I don't think we want to do that by default.)
> >>>
> >>> The problem apparently is that the i8042 suspend/resume is  
> >>> synchronous.
> >>>
> >>> Do you think it's safe to mark it as asynchronous?
> >>>
> >>
> >> Umm.. there lie dragons. There is an implicit relationship between  
> >> i8042
> >> and PNP/ACPI devices representing keyboard and mouse ports, and I  
> >> am not
> >> sure how happy i8042 (and most importantly the BIOS) will be if  
> >> they get
> >> shut down before i8042. Also there is EC which is in theory  
> >> independent
> >> but in practice not so much.
> >
> > I see.
> >
> > Is this possible to identify ACPI devices that should wait for the  
> > i8042
> > suspend and that should be waited for by it on resume?
> 
> We could try to add some dependencies while discovering PNP to get KBC  
> addresses in i8042 but we need tomake sure we do it even in presence  
> of i8042.nopnp.

Well, I guess this is the example of the off-tree dependencies that actually
matter Linus wanted. :-)

I guess there are quite a few devices that can depend on the i8042 in
principle, is this correct?

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912200010.19899.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                     ` <200912200010.19899.rjw@sisk.pl>
@ 2009-12-19 23:22                                                       ` Dmitry Torokhov
  2009-12-19 23:23                                                       ` Linus Torvalds
                                                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 98+ messages in thread
From: Dmitry Torokhov @ 2009-12-19 23:22 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Dec 19, 2009, at 3:10 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote:

> On Saturday 19 December 2009, Dmitry Torokhov wrote:
>> On Dec 19, 2009, at 1:33 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>>
>>> On Saturday 19 December 2009, Dmitry Torokhov wrote:
>>>> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
>>>>> On Wednesday 16 December 2009, Dmitry Torokhov wrote:
>>>>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki  
>>>>>> wrote:
>>>>>>> On Tuesday 15 December 2009, Linus Torvalds wrote:
>>>>>>>>
>>>>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
>>>>>>>>>>
>>>>>>>>>> Give a real example that matters.
>>>>>>>>>
>>>>>>>>> I'll try.  Let -> denote child-parent relationships and assume
>>>>>>>>> dpm_list looks
>>>>>>>>> like this:
>>>>>>>>
>>>>>>>> No.
>>>>>>>>
>>>>>>>> I mean something real - something like
>>>>>>>>
>>>>>>>> - if you run on a non-PC with two USB buses behind non-PCI
>>>>>>>> controllers.
>>>>>>>>
>>>>>>>> - device xyz.
>>>>>>>>
>>>>>>>>> If this applies to _resume_ only, then I agree, but the
>>>>>>>>> Arjan's data clearly
>>>>>>>>> show that serio devices take much more time to suspend than  
>>>>>>>>> USB.
>>>>>>>>
>>>>>>>> I mean in general - something where you actually have hard data
>>>>>>>> that some
>>>>>>>> device really needs anythign more than my one-liner, and really
>>>>>>>> _needs_
>>>>>>>> some complex infrastructure.
>>>>>>>>
>>>>>>>> Not "let's imagine a case like xyz".
>>>>>>>
>>>>>>> As I said I would, I made some measurements.
>>>>>>>
>>>>>>> I measured the total time of suspending and resuming devices as
>>>>>>> shown by the
>>>>>>> code added by this patch:
>>>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
>>>>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they
>>>>>>> are quite
>>>>>>> different and the HP was running 64-bit kernel and user space).
>>>>>>>
>>>>>>> I took four cases into consideration:
>>>>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0)
>>>>>>> (2) asynchronous suspend and resume as introduced by the async
>>>>>>> branch at:
>>>>>>>   http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
>>>>>>> (3) asynchronous suspend and resume like in (2), but with your
>>>>>>> one-liner setting
>>>>>>>   the power.async_suspend flag for PCI bridges on top
>>>>>>> (4) asynchronous suspend and resume like in (2), but with an
>>>>>>> extra patch that
>>>>>>>   is appended on top
>>>>>>>
>>>>>>> For those tests I set power.async_suspend for all USB devices,
>>>>>>> all serio input
>>>>>>> devices, the ACPI battery and the USB PCI controllers (to see
>>>>>>> the impact of the
>>>>>>> one-liner, if any).
>>>>>>>
>>>>>>> I carried out 5 consecutive suspend-resume cycles (started from
>>>>>>> under X) on
>>>>>>> each box in each case, and the raw data are here (all times in
>>>>>>> milliseconds):
>>>>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf
>>>>>>>
>>>>>>> The summarized data are below (the "big" numbers are averages
>>>>>>> and the +/-
>>>>>>> numbers are standard deviations, all in milliseconds):
>>>>>>>
>>>>>>>           HP nx6325        MSI Wind U100
>>>>>>>
>>>>>>> sync suspend        1482 (+/- 40)    1180 (+/- 24)
>>>>>>> sync resume        2955 (+/- 2)    3597 (+/- 25)
>>>>>>>
>>>>>>> async suspend        1553 (+/- 49)    1177 (+/- 32)
>>>>>>> async resume        2692 (+/- 326)    3556  (+/- 33)
>>>>>>>
>>>>>>> async+one-liner suspend    1600 (+/- 39)    1212 (+/- 41)
>>>>>>> async+one-liner resume    2692 (+/- 324)    3579 (+/- 24)
>>>>>>>
>>>>>>> async+extra suspend    1496 (+/- 37)    1217 (+/- 38)
>>>>>>> async+extra resume    1859 (+/- 114)    1923 (+/- 35)
>>>>>>>
>>>>>>> So, in my opinion, with the above set of "async" devices, it
>>>>>>> doesn't
>>>>>>> make sense to do async suspend at all, because the sync suspend
>>>>>>> is actually
>>>>>>> the fastest on both machines.
>>>>>>
>>>>>> I think the async suspend is not asynchronous enough then - what
>>>>>> kind of
>>>>>> time do you get if you simply comment out call to psmouse_reset()
>>>>>> in
>>>>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for
>>>>>> testing
>>>>>> purposes only, I don't think we want to do that by default.)
>>>>>
>>>>> The problem apparently is that the i8042 suspend/resume is
>>>>> synchronous.
>>>>>
>>>>> Do you think it's safe to mark it as asynchronous?
>>>>>
>>>>
>>>> Umm.. there lie dragons. There is an implicit relationship between
>>>> i8042
>>>> and PNP/ACPI devices representing keyboard and mouse ports, and I
>>>> am not
>>>> sure how happy i8042 (and most importantly the BIOS) will be if
>>>> they get
>>>> shut down before i8042. Also there is EC which is in theory
>>>> independent
>>>> but in practice not so much.
>>>
>>> I see.
>>>
>>> Is this possible to identify ACPI devices that should wait for the
>>> i8042
>>> suspend and that should be waited for by it on resume?
>>
>> We could try to add some dependencies while discovering PNP to get  
>> KBC
>> addresses in i8042 but we need tomake sure we do it even in presence
>> of i8042.nopnp.
>
> Well, I guess this is the example of the off-tree dependencies that  
> actually
> matter Linus wanted. :-)
>
> I guess there are quite a few devices that can depend on the i8042 in
> principle, is this correct?

The devices that depend on i8042 are serio ports that are it's  
children. I8042 itself may have indirect dependency on a couple of PNP  
devices.

>
I hope this answers your question...

-- 
Dmitry 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                     ` <200912200010.19899.rjw@sisk.pl>
  2009-12-19 23:22                                                       ` Dmitry Torokhov
@ 2009-12-19 23:23                                                       ` Linus Torvalds
       [not found]                                                       ` <43A402BB-6AB3-4127-A441-D53EDE09F22E@gmail.com>
       [not found]                                                       ` <alpine.LFD.2.00.0912191521180.3712@localhost.localdomain>
  3 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-19 23:23 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, pm list



On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:
> 
> Well, I guess this is the example of the off-tree dependencies that actually
> matter Linus wanted. :-)

It's also the kind of dependency where I say "if we get into these kinds 
of messes, then the whole async crap isn't worth it".

Really. Having to try to match things up with ACPI and PnP is a nightmare. 
Especially since I doubt Windows does anything like this, which means that 
there's no reason for BIOS vendors to do the tables so that we'd even 
know.

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <43A402BB-6AB3-4127-A441-D53EDE09F22E@gmail.com>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                       ` <43A402BB-6AB3-4127-A441-D53EDE09F22E@gmail.com>
@ 2009-12-19 23:33                                                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 23:33 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Sunday 20 December 2009, Dmitry Torokhov wrote:
> On Dec 19, 2009, at 3:10 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > On Saturday 19 December 2009, Dmitry Torokhov wrote:
> >> On Dec 19, 2009, at 1:33 PM, "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> >>
> >>> On Saturday 19 December 2009, Dmitry Torokhov wrote:
> >>>> On Fri, Dec 18, 2009 at 11:43:29PM +0100, Rafael J. Wysocki wrote:
> >>>>> On Wednesday 16 December 2009, Dmitry Torokhov wrote:
> >>>>>> On Wed, Dec 16, 2009 at 03:11:05AM +0100, Rafael J. Wysocki  
> >>>>>> wrote:
> >>>>>>> On Tuesday 15 December 2009, Linus Torvalds wrote:
> >>>>>>>>
> >>>>>>>> On Tue, 15 Dec 2009, Rafael J. Wysocki wrote:
> >>>>>>>>>>
> >>>>>>>>>> Give a real example that matters.
> >>>>>>>>>
> >>>>>>>>> I'll try.  Let -> denote child-parent relationships and assume
> >>>>>>>>> dpm_list looks
> >>>>>>>>> like this:
> >>>>>>>>
> >>>>>>>> No.
> >>>>>>>>
> >>>>>>>> I mean something real - something like
> >>>>>>>>
> >>>>>>>> - if you run on a non-PC with two USB buses behind non-PCI
> >>>>>>>> controllers.
> >>>>>>>>
> >>>>>>>> - device xyz.
> >>>>>>>>
> >>>>>>>>> If this applies to _resume_ only, then I agree, but the
> >>>>>>>>> Arjan's data clearly
> >>>>>>>>> show that serio devices take much more time to suspend than  
> >>>>>>>>> USB.
> >>>>>>>>
> >>>>>>>> I mean in general - something where you actually have hard data
> >>>>>>>> that some
> >>>>>>>> device really needs anythign more than my one-liner, and really
> >>>>>>>> _needs_
> >>>>>>>> some complex infrastructure.
> >>>>>>>>
> >>>>>>>> Not "let's imagine a case like xyz".
> >>>>>>>
> >>>>>>> As I said I would, I made some measurements.
> >>>>>>>
> >>>>>>> I measured the total time of suspending and resuming devices as
> >>>>>>> shown by the
> >>>>>>> code added by this patch:
> >>>>>>> http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=commitdiff_plain;h=c1b8fc0a8bff7707c10f31f3d26bfa88e18ccd94;hp=087dbf5f079f1b55cbd3964c9ce71268473d5b67
> >>>>>>> on two boxes, HP nx6325 and MSI Wind U100 (hardware-wise they
> >>>>>>> are quite
> >>>>>>> different and the HP was running 64-bit kernel and user space).
> >>>>>>>
> >>>>>>> I took four cases into consideration:
> >>>>>>> (1) synchronous suspend and resume (/sys/power/pm_async = 0)
> >>>>>>> (2) asynchronous suspend and resume as introduced by the async
> >>>>>>> branch at:
> >>>>>>>   http://git.kernel.org/?p=linux/kernel/git/rafael/suspend-2.6.git;a=shortlog;h=refs/heads/async
> >>>>>>> (3) asynchronous suspend and resume like in (2), but with your
> >>>>>>> one-liner setting
> >>>>>>>   the power.async_suspend flag for PCI bridges on top
> >>>>>>> (4) asynchronous suspend and resume like in (2), but with an
> >>>>>>> extra patch that
> >>>>>>>   is appended on top
> >>>>>>>
> >>>>>>> For those tests I set power.async_suspend for all USB devices,
> >>>>>>> all serio input
> >>>>>>> devices, the ACPI battery and the USB PCI controllers (to see
> >>>>>>> the impact of the
> >>>>>>> one-liner, if any).
> >>>>>>>
> >>>>>>> I carried out 5 consecutive suspend-resume cycles (started from
> >>>>>>> under X) on
> >>>>>>> each box in each case, and the raw data are here (all times in
> >>>>>>> milliseconds):
> >>>>>>> http://www.sisk.pl/kernel/data/async-suspend.pdf
> >>>>>>>
> >>>>>>> The summarized data are below (the "big" numbers are averages
> >>>>>>> and the +/-
> >>>>>>> numbers are standard deviations, all in milliseconds):
> >>>>>>>
> >>>>>>>           HP nx6325        MSI Wind U100
> >>>>>>>
> >>>>>>> sync suspend        1482 (+/- 40)    1180 (+/- 24)
> >>>>>>> sync resume        2955 (+/- 2)    3597 (+/- 25)
> >>>>>>>
> >>>>>>> async suspend        1553 (+/- 49)    1177 (+/- 32)
> >>>>>>> async resume        2692 (+/- 326)    3556  (+/- 33)
> >>>>>>>
> >>>>>>> async+one-liner suspend    1600 (+/- 39)    1212 (+/- 41)
> >>>>>>> async+one-liner resume    2692 (+/- 324)    3579 (+/- 24)
> >>>>>>>
> >>>>>>> async+extra suspend    1496 (+/- 37)    1217 (+/- 38)
> >>>>>>> async+extra resume    1859 (+/- 114)    1923 (+/- 35)
> >>>>>>>
> >>>>>>> So, in my opinion, with the above set of "async" devices, it
> >>>>>>> doesn't
> >>>>>>> make sense to do async suspend at all, because the sync suspend
> >>>>>>> is actually
> >>>>>>> the fastest on both machines.
> >>>>>>
> >>>>>> I think the async suspend is not asynchronous enough then - what
> >>>>>> kind of
> >>>>>> time do you get if you simply comment out call to psmouse_reset()
> >>>>>> in
> >>>>>> drivers/input/mouse/psmouse-base.c:psmouse_cleanup()?  (Just for
> >>>>>> testing
> >>>>>> purposes only, I don't think we want to do that by default.)
> >>>>>
> >>>>> The problem apparently is that the i8042 suspend/resume is
> >>>>> synchronous.
> >>>>>
> >>>>> Do you think it's safe to mark it as asynchronous?
> >>>>>
> >>>>
> >>>> Umm.. there lie dragons. There is an implicit relationship between
> >>>> i8042
> >>>> and PNP/ACPI devices representing keyboard and mouse ports, and I
> >>>> am not
> >>>> sure how happy i8042 (and most importantly the BIOS) will be if
> >>>> they get
> >>>> shut down before i8042. Also there is EC which is in theory
> >>>> independent
> >>>> but in practice not so much.
> >>>
> >>> I see.
> >>>
> >>> Is this possible to identify ACPI devices that should wait for the
> >>> i8042
> >>> suspend and that should be waited for by it on resume?
> >>
> >> We could try to add some dependencies while discovering PNP to get  
> >> KBC
> >> addresses in i8042 but we need tomake sure we do it even in presence
> >> of i8042.nopnp.
> >
> > Well, I guess this is the example of the off-tree dependencies that  
> > actually
> > matter Linus wanted. :-)
> >
> > I guess there are quite a few devices that can depend on the i8042 in
> > principle, is this correct?
> 
> The devices that depend on i8042 are serio ports that are it's  
> children.

That I already knew. :-)

> I8042 itself may have indirect dependency on a couple of PNP devices.

I was really asking about these.

> I hope this answers your question...

Yes, thanks.

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912191521180.3712@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                       ` <alpine.LFD.2.00.0912191521180.3712@localhost.localdomain>
@ 2009-12-19 23:40                                                         ` Rafael J. Wysocki
       [not found]                                                         ` <200912200040.18944.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 23:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, pm list

On Sunday 20 December 2009, Linus Torvalds wrote:
> 
> On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > Well, I guess this is the example of the off-tree dependencies that actually
> > matter Linus wanted. :-)
> 
> It's also the kind of dependency where I say "if we get into these kinds 
> of messes, then the whole async crap isn't worth it".
> 
> Really. Having to try to match things up with ACPI and PnP is a nightmare. 
> Especially since I doubt Windows does anything like this, which means that 
> there's no reason for BIOS vendors to do the tables so that we'd even 
> know.

OK, so this means we can just forget about suspending/resuming i8042
asynchronously, which is a pity, because that gave us some real suspend
speedup on my test systems.

Well, whatever.

So, seriously, do you think it makes sense to do asynchronous suspend at all?
I'm asking, because we're likely to get into troubles like this during suspend
for other kinds of devices too and without resolving them we won't get any
significant speedup from asynchronous suspend.

That said, to me it's definitely worth doing asynchronous resume with the
"start asynch threads upfront" modification, as the results of the tests show
that quite clearly.  I hope you agree.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912200040.18944.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                         ` <200912200040.18944.rjw@sisk.pl>
@ 2009-12-19 23:46                                                           ` Linus Torvalds
       [not found]                                                           ` <alpine.LFD.2.00.0912191542570.3712@localhost.localdomain>
  2009-12-20  3:59                                                           ` Alan Stern
  2 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-19 23:46 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, pm list

On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:
> 
> OK, so this means we can just forget about suspending/resuming i8042
> asynchronously, which is a pity, because that gave us some real suspend
> speedup on my test systems.

No. What it means is that you shouldn't try to come up with these idiotic 
scenarios just trying to make trouble for yourself, and using it as an 
excuse for crap.

I suggest you try to treat the i8042 controller async, and see if it is 
problematic. If it isn't, don't do that then. But we actually have no real 
reason to believe that it would be problematic, at least on a PC where the 
actual logic is on the SB (presumably behind the LPC controller).

Why would it be?

The fact that PnP and ACPI enumerates those devices has exactly _what_ to 
do with anything?

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912191542570.3712@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                           ` <alpine.LFD.2.00.0912191542570.3712@localhost.localdomain>
@ 2009-12-19 23:47                                                             ` Linus Torvalds
  2009-12-19 23:53                                                             ` Rafael J. Wysocki
                                                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-19 23:47 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, pm list



On Sat, 19 Dec 2009, Linus Torvalds wrote:
> 
> I suggest you try to treat the i8042 controller async, and see if it is 
> problematic. If it isn't, don't do that then.

I obviously meant: "If it _is_ problematic, don't do that then". "Is", not 
"isn't".

		Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                           ` <alpine.LFD.2.00.0912191542570.3712@localhost.localdomain>
  2009-12-19 23:47                                                             ` Linus Torvalds
@ 2009-12-19 23:53                                                             ` Rafael J. Wysocki
       [not found]                                                             ` <alpine.LFD.2.00.0912191546250.3712@localhost.localdomain>
       [not found]                                                             ` <200912200053.45988.rjw@sisk.pl>
  3 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 23:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, pm list

On Sunday 20 December 2009, Linus Torvalds wrote:
> 
> On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > OK, so this means we can just forget about suspending/resuming i8042
> > asynchronously, which is a pity, because that gave us some real suspend
> > speedup on my test systems.
> 
> No. What it means is that you shouldn't try to come up with these idiotic 
> scenarios just trying to make trouble for yourself,

I haven't.  I've just asked Dmitry for his opinion and got it.  The fact that
you don't like it doesn't mean it's actually "idiotic".

> and using it as an excuse for crap.

I'm not sure what you mean exactly, but whatever.

> I suggest you try to treat the i8042 controller async, and see if it is 
> problematic.

I already have and I don't see problems with it, but quite obviously I can't
test all possible configurations out there.

> If it isn't, don't do that then. But we actually have no real 
> reason to believe that it would be problematic, at least on a PC where the 
> actual logic is on the SB (presumably behind the LPC controller).
> 
> Why would it be?

The embedded controller may depend on it.
 
Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912191546250.3712@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                             ` <alpine.LFD.2.00.0912191546250.3712@localhost.localdomain>
@ 2009-12-19 23:54                                                               ` Rafael J. Wysocki
  0 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-19 23:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, pm list

On Sunday 20 December 2009, Linus Torvalds wrote:
> 
> On Sat, 19 Dec 2009, Linus Torvalds wrote:
> > 
> > I suggest you try to treat the i8042 controller async, and see if it is 
> > problematic. If it isn't, don't do that then.
> 
> I obviously meant: "If it _is_ problematic, don't do that then". "Is", not 
> "isn't".

Sure, I understood that was a typo. :-)

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912200053.45988.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                             ` <200912200053.45988.rjw@sisk.pl>
@ 2009-12-20  0:09                                                               ` Linus Torvalds
       [not found]                                                               ` <alpine.LFD.2.00.0912191557320.3712@localhost.localdomain>
  2009-12-20  2:45                                                               ` Async suspend-resume patch w/ completions (was: Re: Async suspend-resume " Dmitry Torokhov
  2 siblings, 0 replies; 98+ messages in thread
From: Linus Torvalds @ 2009-12-20  0:09 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, pm list

On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > Why would it be?
> 
> The embedded controller may depend on it.

Again, I say "why?"

Anything can be true. That doesn't _make_ everything true. There's no real 
reason why PnP/ACPI suspend/resume should really care.

We can try it. Not for 2.6.33, but by the 34 merge window maybe we'll have 
a patch-series that is ready to be tested, and that aggressively tries to 
do the devices that matter asynchronously.

So instead of you trying to make up some idiotic cross-device worries, 
just see if those worries have any actual background in reality. So far I 
haven't actually heard anything but "in theory, anything is possible", 
which is such a truism that it's not even worth voicing.

That said, I still get the feeling that we'd be even better off simply 
trying to avoid the whole keyboard reset entirely. Apparently we do it for 
a few HP laptops. It's entirely possible that we'd be better off simply 
not _doing_ the slow thing in the first place.

For example, we may be _much_ better off doing that whole keyboard reset 
at resume time than at suspend time. That's what we do when we probe 
things on initialization - and the resume-time keyboard code is actually 
already asynchronous, it does that atkbd_reconnect asynchronously by 
queuing it as an event.

So again, all these problems may not at all be fundamnetal problems: the 
keyboard driver does certain things, but there is no guarantee that it 
_needs_ to do those things. Turning the driver async may be totally the 
wrong thing to do, when we could potentially fix latency problems at the 
driver level instead.

			Linus

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <alpine.LFD.2.00.0912191557320.3712@localhost.localdomain>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                               ` <alpine.LFD.2.00.0912191557320.3712@localhost.localdomain>
@ 2009-12-20  0:35                                                                 ` Rafael J. Wysocki
  2009-12-20  2:41                                                                 ` Dmitry Torokhov
       [not found]                                                                 ` <20091220024142.GC4073@core.coreip.homeip.net>
  2 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-20  0:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, pm list

On Sunday 20 December 2009, Linus Torvalds wrote:
> 
> On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:
> > > 
> > > Why would it be?
> > 
> > The embedded controller may depend on it.
> 
> Again, I say "why?"
> 
> Anything can be true. That doesn't _make_ everything true. There's no real 
> reason why PnP/ACPI suspend/resume should really care.
> 
> We can try it. Not for 2.6.33, but by the 34 merge window maybe we'll have 
> a patch-series that is ready to be tested, and that aggressively tries to 
> do the devices that matter asynchronously.

Yes, I'd like to have such a patch series for 2.6.34.

So far I've been able to confirm that doing serio+i8042, USB and ACPI battery
asynchronously may give us significant time savings, especially during resume.

> So instead of you trying to make up some idiotic cross-device worries, 
> just see if those worries have any actual background in reality. So far I 
> haven't actually heard anything but "in theory, anything is possible", 
> which is such a truism that it's not even worth voicing.
> 
> That said, I still get the feeling that we'd be even better off simply 
> trying to avoid the whole keyboard reset entirely. Apparently we do it for 
> a few HP laptops. It's entirely possible that we'd be better off simply 
> not _doing_ the slow thing in the first place.

That very well may be the case, but I'm not the right person to confirm or deny
that.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                               ` <alpine.LFD.2.00.0912191557320.3712@localhost.localdomain>
  2009-12-20  0:35                                                                 ` Rafael J. Wysocki
@ 2009-12-20  2:41                                                                 ` Dmitry Torokhov
       [not found]                                                                 ` <20091220024142.GC4073@core.coreip.homeip.net>
  2 siblings, 0 replies; 98+ messages in thread
From: Dmitry Torokhov @ 2009-12-20  2:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, ACPI Devel Maling List, Vojtech Pavlik, pm list

On Sat, Dec 19, 2009 at 04:09:07PM -0800, Linus Torvalds wrote:
> 
> That said, I still get the feeling that we'd be even better off simply 
> trying to avoid the whole keyboard reset entirely. Apparently we do it for 
> a few HP laptops.

I was mistaken, HP laptops do not like mouse disabled when suspending,
not sure about the rest of the state.

> It's entirely possible that we'd be better off simply 
> not _doing_ the slow thing in the first place.
>

The reset appeared first in 2.5.42. I expect that some BIOSes get very
confused when tehy find mouse speaking something that they do not
unserstand (i.e. synaptics, ALPS or anything else that is not bare PS/2
or intellimouse), but maybe Vojtech remembers better?

> For example, we may be _much_ better off doing that whole keyboard reset 
> at resume time than at suspend time.

We do the reset for the different reasons - at resume we want the device
in known state to ensure that it properly responds to the probes we
send to it. At suspend we trying to reset things into original state so
that the firmware will not be confused.

If we want to try to live without reset we could to PSMOUSE_CMD_RESET_DIS
instead of PSMOUSE_CMD_RESET_BAT which is much heavier. We should
probably not wait for .34 then because the bulk of testing will happen
only when .33 is close to be released because that's when most of
regular users will start using the new code and try to suspend and
resume.

Rafael, how long does suspend take if you change call to psmouse_reset()
in psmouse_cleanup() to ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_RESET_DIS)?
And do the same for atkbd...

BTW, making just serio asynchronous while keeping i8042 synchronous
makes no sense because I serialize access to i8042 - the thing does not
survive simultaneous [command] access to both keyboard and mouse...

-- 
Dmitry

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <20091220024142.GC4073@core.coreip.homeip.net>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                                 ` <20091220024142.GC4073@core.coreip.homeip.net>
@ 2009-12-20 19:25                                                                   ` Rafael J. Wysocki
       [not found]                                                                   ` <200912202025.25618.rjw@sisk.pl>
  1 sibling, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-20 19:25 UTC (permalink / raw)
  To: linux-pm
  Cc: ACPI Devel Maling List, Dmitry Torokhov, Linus Torvalds, LKML,
	Vojtech Pavlik

On Sunday 20 December 2009, Dmitry Torokhov wrote:
> On Sat, Dec 19, 2009 at 04:09:07PM -0800, Linus Torvalds wrote:
> > 
> > That said, I still get the feeling that we'd be even better off simply 
> > trying to avoid the whole keyboard reset entirely. Apparently we do it for 
> > a few HP laptops.
> 
> I was mistaken, HP laptops do not like mouse disabled when suspending,
> not sure about the rest of the state.
> 
> > It's entirely possible that we'd be better off simply 
> > not _doing_ the slow thing in the first place.
> >
> 
> The reset appeared first in 2.5.42. I expect that some BIOSes get very
> confused when tehy find mouse speaking something that they do not
> unserstand (i.e. synaptics, ALPS or anything else that is not bare PS/2
> or intellimouse), but maybe Vojtech remembers better?
>  
> > For example, we may be _much_ better off doing that whole keyboard reset 
> > at resume time than at suspend time.
> 
> We do the reset for the different reasons - at resume we want the device
> in known state to ensure that it properly responds to the probes we
> send to it. At suspend we trying to reset things into original state so
> that the firmware will not be confused.
> 
> If we want to try to live without reset we could to PSMOUSE_CMD_RESET_DIS
> instead of PSMOUSE_CMD_RESET_BAT which is much heavier. We should
> probably not wait for .34 then because the bulk of testing will happen
> only when .33 is close to be released because that's when most of
> regular users will start using the new code and try to suspend and
> resume.
> 
> Rafael, how long does suspend take if you change call to psmouse_reset()
> in psmouse_cleanup() to ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_RESET_DIS)?
> And do the same for atkbd...

On the nx6325 that appears to reduce the suspend time as much so the effect
of async is not visible any more.  On the Wind it decreases the total suspend
time almost by half!

Please push this patch to Linus. :-)

> BTW, making just serio asynchronous while keeping i8042 synchronous
> makes no sense because I serialize access to i8042 - the thing does not
> survive simultaneous [command] access to both keyboard and mouse...

OK

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <200912202025.25618.rjw@sisk.pl>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async?suspend-resume patch w/ rwsems)
       [not found]                                                                   ` <200912202025.25618.rjw@sisk.pl>
@ 2009-12-21  7:39                                                                     ` Dmitry Torokhov
       [not found]                                                                     ` <20091221073915.GC3234@core.coreip.homeip.net>
  1 sibling, 0 replies; 98+ messages in thread
From: Dmitry Torokhov @ 2009-12-21  7:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: ACPI Devel Maling List, linux-pm, Vojtech Pavlik, Linus Torvalds,
	LKML

On Sun, Dec 20, 2009 at 08:25:25PM +0100, Rafael J. Wysocki wrote:
> On Sunday 20 December 2009, Dmitry Torokhov wrote:
> > On Sat, Dec 19, 2009 at 04:09:07PM -0800, Linus Torvalds wrote:
> > > 
> > > That said, I still get the feeling that we'd be even better off simply 
> > > trying to avoid the whole keyboard reset entirely. Apparently we do it for 
> > > a few HP laptops.
> > 
> > I was mistaken, HP laptops do not like mouse disabled when suspending,
> > not sure about the rest of the state.
> > 
> > > It's entirely possible that we'd be better off simply 
> > > not _doing_ the slow thing in the first place.
> > >
> > 
> > The reset appeared first in 2.5.42. I expect that some BIOSes get very
> > confused when tehy find mouse speaking something that they do not
> > unserstand (i.e. synaptics, ALPS or anything else that is not bare PS/2
> > or intellimouse), but maybe Vojtech remembers better?
> >  
> > > For example, we may be _much_ better off doing that whole keyboard reset 
> > > at resume time than at suspend time.
> > 
> > We do the reset for the different reasons - at resume we want the device
> > in known state to ensure that it properly responds to the probes we
> > send to it. At suspend we trying to reset things into original state so
> > that the firmware will not be confused.
> > 
> > If we want to try to live without reset we could to PSMOUSE_CMD_RESET_DIS
> > instead of PSMOUSE_CMD_RESET_BAT which is much heavier. We should
> > probably not wait for .34 then because the bulk of testing will happen
> > only when .33 is close to be released because that's when most of
> > regular users will start using the new code and try to suspend and
> > resume.
> > 
> > Rafael, how long does suspend take if you change call to psmouse_reset()
> > in psmouse_cleanup() to ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_RESET_DIS)?
> > And do the same for atkbd...
> 
> On the nx6325 that appears to reduce the suspend time as much so the effect
> of async is not visible any more.  On the Wind it decreases the total suspend
> time almost by half!
> 
> Please push this patch to Linus. :-)
> 

Let's see if I manage to solicit some testers first. FWIW it seems to be
working on my boxes.

But if this works then I am not sure we even want to bother with async
suspend of i8042 and serios. And serio already does resume
asynchronously through kseriod.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 98+ messages in thread

[parent not found: <20091221073915.GC3234@core.coreip.homeip.net>]

* Re: Async suspend-resume patch w/ completions (was: Re: Async?suspend-resume patch w/ rwsems)
       [not found]                                                                     ` <20091221073915.GC3234@core.coreip.homeip.net>
@ 2009-12-21 11:20                                                                       ` Vojtech Pavlik
  0 siblings, 0 replies; 98+ messages in thread
From: Vojtech Pavlik @ 2009-12-21 11:20 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: ACPI Devel Maling List, linux-pm, Linus Torvalds, LKML

On Sun, Dec 20, 2009 at 11:39:15PM -0800, Dmitry Torokhov wrote:

> > On the nx6325 that appears to reduce the suspend time as much so the effect
> > of async is not visible any more.  On the Wind it decreases the total suspend
> > time almost by half!
> > 
> > Please push this patch to Linus. :-)
> > 
> 
> Let's see if I manage to solicit some testers first. FWIW it seems to be
> working on my boxes.
> 
> But if this works then I am not sure we even want to bother with async
> suspend of i8042 and serios. And serio already does resume
> asynchronously through kseriod.

I'm kind of wondering where this will break, but I don't remember why
the RESET_BAT was put in exactly - the point of making sure the BIOS
doesn't get confused by the advanced modes is correct, and is required
at least when a keyboard is set to "Set 3", but RESET_BAT is a too heavy
hammer anyway - we could just make sure to switch the kbd/mouse to
'default' modes instead of doing a full reset.

-- 
Vojtech Pavlik
Director SuSE Labs

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                             ` <200912200053.45988.rjw@sisk.pl>
  2009-12-20  0:09                                                               ` Linus Torvalds
       [not found]                                                               ` <alpine.LFD.2.00.0912191557320.3712@localhost.localdomain>
@ 2009-12-20  2:45                                                               ` Dmitry Torokhov
  2 siblings, 0 replies; 98+ messages in thread
From: Dmitry Torokhov @ 2009-12-20  2:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, ACPI Devel Maling List, Linus Torvalds, pm list

On Sun, Dec 20, 2009 at 12:53:45AM +0100, Rafael J. Wysocki wrote:
> On Sunday 20 December 2009, Linus Torvalds wrote:
> > 
> > If it isn't, don't do that then. But we actually have no real 
> > reason to believe that it would be problematic, at least on a PC where the 
> > actual logic is on the SB (presumably behind the LPC controller).
> > 
> > Why would it be?
> 
> The embedded controller may depend on it.
>

No, not really depend but rather wierd things may happen if you
accessing both. Witness regressions where touching embedded controller
makes us lose data from touchpad, I think you are CCed on that bug.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
       [not found]                                                         ` <200912200040.18944.rjw@sisk.pl>
  2009-12-19 23:46                                                           ` Linus Torvalds
       [not found]                                                           ` <alpine.LFD.2.00.0912191542570.3712@localhost.localdomain>
@ 2009-12-20  3:59                                                           ` Alan Stern
  2 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-20  3:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dmitry Torokhov, LKML, ACPI Devel Maling List, Linus Torvalds,
	pm list

On Sun, 20 Dec 2009, Rafael J. Wysocki wrote:

> So, seriously, do you think it makes sense to do asynchronous suspend at all?
> I'm asking, because we're likely to get into troubles like this during suspend
> for other kinds of devices too and without resolving them we won't get any
> significant speedup from asynchronous suspend.
> 
> That said, to me it's definitely worth doing asynchronous resume with the
> "start asynch threads upfront" modification, as the results of the tests show
> that quite clearly.  I hope you agree.

It's too early to come to this sort of conclusion (i.e., that suspend
and resume react very differently to an asynchronous approach).  Unless
you have some definite _reason_ for thinking that resume will benefit
more than suspend, you shouldn't try to generalize so much from tests
on only two systems.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-12 18:54                       ` Linus Torvalds
  2009-12-12 22:34                         ` Rafael J. Wysocki
@ 2009-12-13 13:08                         ` Rafael J. Wysocki
  2009-12-13 17:30                         ` Alan Stern
  2 siblings, 0 replies; 98+ messages in thread
From: Rafael J. Wysocki @ 2009-12-13 13:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Saturday 12 December 2009, Linus Torvalds wrote:
> 
> On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> > 
> > I'd like to put it into my tree in this form, if you don't mind.
> 
> This version still has a major problem, which is not related to 
> completions vs rwsems, but simply to the fact that you wanted to do this 
> at the generic device layer level rather than do it at the actual 
> low-level suspend/resume level.
> 
> Namely that there's no apparent sane way to say "don't wait for children".

There is, if the partent would really do something that could disturb the
children.  This isn't always the case, but at least in a few important cases
it is (think of a USB controller and USB devices behind it, for example).

I thought we had this discussion already, but perhaps that was with someone
else and in a slightly different context.

The main reasons why I think it's useful to do this at the generic device layer
level are that, if we do it this way:

a. Drivers that don't want to be "asynchronous" don't need to care in any case.

b. Drivers whose suspend and resume routines are guaranteed not to disturb
   anyone else can mark their devices as "async" and be done with it, no other
   modification of the code is needed (drivers that do nothing in their suspend
   and resume routines also fall into this category).

Now, if it's done at the low-level suspend/resume level, a. will not be true
any more in general.  Say device A has parent B and the driver of A wants to
suspend asynchrnously.  It needs to split its suspend into synchronous and
asynchronous part and at one point start an async thread to run the latter.
Now assume B has a real reason not to suspend before the suspens of A has
finished.  Then, the driver of B has to be modified so that it waits for the
A's async suspend to complete (some sort of synchronization between the two
has to be added).  So, even if B is "synchronous", its driver has to be
modified to handle the asynchronous suspend of A.

Similarly, b. will no longer be true if it's done at the low-level
suspend/resume level, because now every driver that wants to be
"asynchronous" will need to take care of running an async thread etc.
Moreover, it will need to make sure that the device parent's driver doesn't
need to be modified, because the parent's suspend may do something that will
disturb the child's asynchronous suspend.  Furthermore, if the parent's driver
doesn't need to be modified, it will need to consider the parent of the parent,
because that one may potentially disturb the asynchronous suspend of its
grand child and so on up to a device without a parent.

That already is a pain to a driver writer, but the problem you're saying would
be solved by doing this at the low-level suspend/resume level is still there
in general!  Namely, go back do the example with devices A and B and say B
_really_ has to wait for A's suspend to complete.  Then, since B is after A in
dpm_list, the PM core will not start the suspend of any device after B until
the suspend of B returns.  Now, if the suspend of B waits for the suspend of
A, then the PM core will effectively wait for the suspend of A to complete
before suspending any other devices.  Worse yet, if that happens, we can't do
anything about it at the low-level suspend/resume level, althouth at the PM
core level we can.

Rafael

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)
  2009-12-12 18:54                       ` Linus Torvalds
  2009-12-12 22:34                         ` Rafael J. Wysocki
  2009-12-13 13:08                         ` Rafael J. Wysocki
@ 2009-12-13 17:30                         ` Alan Stern
  2 siblings, 0 replies; 98+ messages in thread
From: Alan Stern @ 2009-12-13 17:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ACPI Devel Maling List, LKML, pm list

On Sat, 12 Dec 2009, Linus Torvalds wrote:

> This version still has a major problem, which is not related to 
> completions vs rwsems, but simply to the fact that you wanted to do this 
> at the generic device layer level rather than do it at the actual 
> low-level suspend/resume level.
> 
> Namely that there's no apparent sane way to say "don't wait for children".
> 
> PCI bridges that don't suspend at all - or any other device that only 
> suspends in the 'suspend_late()' thing, for that matter - don't have any 
> reason what-so-ever to wait for children, since they aren't actually 
> suspending in the first place. But you make them wait regardless, which 
> then serializes things unnecessarily (for example, two unrelated USB 
> controllers).

In reality this should never be a problem.

Consider that ultimately we want to achieve the following two goals:

	Implement a two-pass algorithm, so that synchronous devices
	can't cause spurious dependencies between two async devices.
	(This will fix the issue of an intermediate PCI bridge
	serializing two unrelated USB controllers.)

	Convert all lengthy suspend/resume operations to async.

Obviously we don't want to do this all at once.  But until the goals
are achieved, there's no point worrying about devices being forced to
wait for their children or parents.  And after the goals are achieved,
it won't matter.

Why not?  Consider the devices which would be delayed.  If they use
synchronous suspend/resume then they won't take much time, so delaying
them won't matter.  Indeed, based on Arjan's preliminary measurements
it's fair to say that the total time taken by all the synchronous
suspends/resumes put together should be negligible.  Even if all of
them were somehow delayed until all the async activities were complete,
nobody would notice or care.  (And conversely, if all the async
activities could somehow be forced to wait until all the synchronous
suspends/resumes were done, nobody would notice or care.)

Okay, so consider a case where A comes before B in dpm_list and B is 
the parent of C.  Suppose B doesn't need to wait for C to suspend, but 
we force it to wait anyhow.

If A or C is synchronous then we're okay, by the considerations above.  
Suppose A is async.  Then it wouldn't be delayed unless it was one of
B's ancestors, so suppose it is.  Now we are potentially delaying A
more than necessary.

Or are we?  Even though B might not need to wait for C to suspend,
there's an excellent chance that A _does_ need to wait for C.  If we
allow B to suspend before C then there would be nothing to prevent A
from suspending too quickly.  A's driver would need to wait explicitly 
for C -- which is unreasonable since C isn't one of A's children.  
(Rafael made a similar point.)

In short, allowing devices to suspend before their children would be 
dangerous and probably would not save a significant amount of time.

Alan Stern

^ permalink raw reply	[flat|nested] 98+ messages in thread

end of thread, other threads:[~2009-12-21 11:20 UTC | newest]

Thread overview: 98+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.44L0.0912111938310.32493-100000@netrider.rowland.org>
2009-12-12 17:35 ` Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems) Rafael J. Wysocki
     [not found] <Pine.LNX.4.44L0.0912201434340.27137-100000@netrider.rowland.org>
2009-12-20 19:51 ` Rafael J. Wysocki
     [not found] <200912201910.26895.rjw@sisk.pl>
2009-12-20 19:38 ` Alan Stern
     [not found] <Pine.LNX.4.44L0.0912201210300.24162-100000@netrider.rowland.org>
2009-12-20 18:10 ` Rafael J. Wysocki
     [not found] <200912201352.07689.rjw@sisk.pl>
2009-12-20 17:12 ` Alan Stern
     [not found] <Pine.LNX.4.44L0.0912192232360.6618-100000@netrider.rowland.org>
2009-12-20 12:55 ` Rafael J. Wysocki
     [not found] <Pine.LNX.4.44L0.0912192253200.6618-100000@netrider.rowland.org>
2009-12-20 12:52 ` Rafael J. Wysocki
     [not found] <200912192241.03991.rjw@sisk.pl>
2009-12-20  3:48 ` Alan Stern
     [not found] <Pine.LNX.4.44L0.0912181205290.2987-100000@iolanthe.rowland.org>
2009-12-19 21:41 ` Rafael J. Wysocki
     [not found] <Pine.LNX.4.44L0.0912171444040.2645-100000@iolanthe.rowland.org>
2009-12-17 20:36 ` Rafael J. Wysocki
     [not found] <Pine.LNX.4.44L0.0912161753540.2643-100000@iolanthe.rowland.org>
2009-12-16 23:18 ` Rafael J. Wysocki
     [not found] ` <200912170018.05175.rjw@sisk.pl>
2009-12-17  1:30   ` Rafael J. Wysocki
     [not found] <Pine.LNX.4.44L0.0912161018100.2909-100000@iolanthe.rowland.org>
2009-12-16 19:26 ` Rafael J. Wysocki
     [not found] <alpine.LFD.2.00.0912151337350.14385@localhost.localdomain>
2009-12-15 22:27 ` Alan Stern
     [not found] <200912152226.22578.rjw@sisk.pl>
2009-12-15 22:01 ` Alan Stern
     [not found] <Pine.LNX.4.44L0.0912151444010.2643-100000@iolanthe.rowland.org>
2009-12-15 21:26 ` Rafael J. Wysocki
2009-12-15 21:54 ` Linus Torvalds
     [not found] <Pine.LNX.4.44L0.0912151047410.3566-100000@iolanthe.rowland.org>
2009-12-15 16:28 ` Linus Torvalds
     [not found] ` <alpine.LFD.2.00.0912150803250.14385@localhost.localdomain>
2009-12-15 18:57   ` Linus Torvalds
2009-12-15 20:26   ` Alan Stern
     [not found] <Pine.LNX.4.44L0.0912131221210.1111-100000@netrider.rowland.org>
2009-12-13 19:02 ` Alan Stern
     [not found] <200912112317.31668.rjw@sisk.pl>
2009-12-12  0:38 ` Alan Stern
     [not found] <Pine.LNX.4.44L0.0912102155390.12136-100000@netrider.rowland.org>
2009-12-11 22:17 ` Rafael J. Wysocki
     [not found] <Pine.LNX.4.44L0.0912101321020.2680-100000@iolanthe.rowland.org>
2009-12-10 23:51 ` Linus Torvalds
     [not found] <Pine.LNX.4.44L0.0912101653120.2680-100000@iolanthe.rowland.org>
2009-12-10 23:45 ` Rafael J. Wysocki
     [not found] <200912102214.40310.rjw@sisk.pl>
2009-12-10 22:17 ` Alan Stern
     [not found] <Pine.LNX.4.44L0.0912101010090.2825-100000@iolanthe.rowland.org>
2009-12-10 15:45 ` Linus Torvalds
2009-12-10 21:14 ` Rafael J. Wysocki
     [not found] <alpine.LFD.2.00.0912100739260.3560@localhost.localdomain>
2009-12-10 18:37 ` Alan Stern
     [not found] <Pine.LNX.4.44L0.0912091729530.2672-100000@iolanthe.rowland.org>
2009-12-09 23:18 ` Rafael J. Wysocki
     [not found] ` <200912100018.19723.rjw@sisk.pl>
2009-12-10  2:51   ` Linus Torvalds
2009-12-10 15:31   ` Alan Stern
     [not found]   ` <alpine.LFD.2.00.0912091835280.3560@localhost.localdomain>
2009-12-10 19:40     ` Rafael J. Wysocki
     [not found]     ` <200912102040.11063.rjw@sisk.pl>
2009-12-10 23:30       ` Linus Torvalds
     [not found]       ` <alpine.LFD.2.00.0912101507550.3560@localhost.localdomain>
2009-12-11  1:02         ` Rafael J. Wysocki
     [not found]         ` <200912110202.28536.rjw@sisk.pl>
2009-12-11  1:25           ` Linus Torvalds
     [not found]           ` <alpine.LFD.2.00.0912101713440.3560@localhost.localdomain>
2009-12-11  3:42             ` Alan Stern
2009-12-11 22:11             ` Rafael J. Wysocki
     [not found]             ` <200912112311.08548.rjw@sisk.pl>
2009-12-11 22:31               ` Linus Torvalds
     [not found]               ` <alpine.LFD.2.00.0912111415160.3922@localhost.localdomain>
2009-12-11 23:48                 ` Rafael J. Wysocki
     [not found]                 ` <200912120048.46180.rjw@sisk.pl>
2009-12-11 23:53                   ` Linus Torvalds
2009-12-12  0:43                   ` Alan Stern
     [not found]                   ` <alpine.LFD.2.00.0912111552330.3526@localhost.localdomain>
2009-12-12 17:48                     ` Rafael J. Wysocki
2009-12-12 18:54                       ` Linus Torvalds
2009-12-12 22:34                         ` Rafael J. Wysocki
2009-12-12 22:40                           ` Rafael J. Wysocki
2009-12-14 18:21                           ` Linus Torvalds
     [not found]                           ` <alpine.LFD.2.00.0912141015240.26135@localhost.localdomain>
2009-12-14 22:11                             ` Rafael J. Wysocki
     [not found]                             ` <200912142311.31658.rjw@sisk.pl>
2009-12-14 22:41                               ` Linus Torvalds
     [not found]                               ` <alpine.LFD.2.00.0912141416040.26135@localhost.localdomain>
2009-12-14 22:43                                 ` Linus Torvalds
2009-12-14 23:18                                 ` Rafael J. Wysocki
     [not found]                                 ` <200912150018.11837.rjw@sisk.pl>
2009-12-15  0:10                                   ` Linus Torvalds
     [not found]                                   ` <alpine.LFD.2.00.0912141609020.14385@localhost.localdomain>
2009-12-15  0:11                                     ` Linus Torvalds
2009-12-15 11:03                                     ` Rafael J. Wysocki
     [not found]                                     ` <alpine.LFD.2.00.0912141610460.14385@localhost.localdomain>
2009-12-15 11:14                                       ` Rafael J. Wysocki
     [not found]                                       ` <200912151214.10980.rjw@sisk.pl>
2009-12-15 15:31                                         ` Linus Torvalds
     [not found]                                     ` <200912151203.22916.rjw@sisk.pl>
2009-12-15 15:26                                       ` Linus Torvalds
     [not found]                                       ` <alpine.LFD.2.00.0912150722310.14385@localhost.localdomain>
2009-12-15 15:55                                         ` Alan Stern
2009-12-16  2:11                                         ` Rafael J. Wysocki
     [not found]                                         ` <200912160311.05915.rjw@sisk.pl>
2009-12-16  6:40                                           ` Dmitry Torokhov
2009-12-16 15:22                                           ` Alan Stern
2009-12-16 15:47                                           ` Linus Torvalds
2009-12-16 19:27                                             ` Rafael J. Wysocki
     [not found]                                             ` <200912162027.16574.rjw@sisk.pl>
2009-12-16 20:59                                               ` Linus Torvalds
     [not found]                                               ` <alpine.LFD.2.00.0912161255080.3556@localhost.localdomain>
2009-12-16 21:57                                                 ` Rafael J. Wysocki
     [not found]                                                 ` <200912162257.00771.rjw@sisk.pl>
2009-12-16 22:11                                                   ` Linus Torvalds
     [not found]                                                   ` <alpine.LFD.2.00.0912161410120.3556@localhost.localdomain>
2009-12-16 22:33                                                     ` Rafael J. Wysocki
2009-12-16 23:04                                                   ` Alan Stern
2009-12-17  1:49                                                   ` Rafael J. Wysocki
2009-12-17 20:06                                                     ` Alan Stern
2009-12-18  1:51                                                     ` Rafael J. Wysocki
     [not found]                                                     ` <200912180251.22655.rjw@sisk.pl>
2009-12-18 17:26                                                       ` Alan Stern
2009-12-18 23:42                                                       ` Rafael J. Wysocki
     [not found]                                           ` <20091216064025.GB2699@core.coreip.homeip.net>
2009-12-18 22:43                                             ` Rafael J. Wysocki
2009-12-19 19:59                                               ` Dmitry Torokhov
     [not found]                                               ` <20091219195935.GB4073@core.coreip.homeip.net>
2009-12-19 21:33                                                 ` Rafael J. Wysocki
     [not found]                                                 ` <200912192233.44575.rjw@sisk.pl>
2009-12-19 22:29                                                   ` Rafael J. Wysocki
     [not found]                                                   ` <200912192329.03251.rjw@sisk.pl>
2009-12-19 22:43                                                     ` Dmitry Torokhov
2009-12-19 22:47                                                   ` Dmitry Torokhov
     [not found]                                                   ` <A37A0A6F-3662-40C9-BE1F-B9F6A38CD80B@gmail.com>
2009-12-19 23:10                                                     ` Rafael J. Wysocki
     [not found]                                                     ` <200912200010.19899.rjw@sisk.pl>
2009-12-19 23:22                                                       ` Dmitry Torokhov
2009-12-19 23:23                                                       ` Linus Torvalds
     [not found]                                                       ` <43A402BB-6AB3-4127-A441-D53EDE09F22E@gmail.com>
2009-12-19 23:33                                                         ` Rafael J. Wysocki
     [not found]                                                       ` <alpine.LFD.2.00.0912191521180.3712@localhost.localdomain>
2009-12-19 23:40                                                         ` Rafael J. Wysocki
     [not found]                                                         ` <200912200040.18944.rjw@sisk.pl>
2009-12-19 23:46                                                           ` Linus Torvalds
     [not found]                                                           ` <alpine.LFD.2.00.0912191542570.3712@localhost.localdomain>
2009-12-19 23:47                                                             ` Linus Torvalds
2009-12-19 23:53                                                             ` Rafael J. Wysocki
     [not found]                                                             ` <alpine.LFD.2.00.0912191546250.3712@localhost.localdomain>
2009-12-19 23:54                                                               ` Rafael J. Wysocki
     [not found]                                                             ` <200912200053.45988.rjw@sisk.pl>
2009-12-20  0:09                                                               ` Linus Torvalds
     [not found]                                                               ` <alpine.LFD.2.00.0912191557320.3712@localhost.localdomain>
2009-12-20  0:35                                                                 ` Rafael J. Wysocki
2009-12-20  2:41                                                                 ` Dmitry Torokhov
     [not found]                                                                 ` <20091220024142.GC4073@core.coreip.homeip.net>
2009-12-20 19:25                                                                   ` Rafael J. Wysocki
     [not found]                                                                   ` <200912202025.25618.rjw@sisk.pl>
2009-12-21  7:39                                                                     ` Async suspend-resume patch w/ completions (was: Re: Async?suspend-resume " Dmitry Torokhov
     [not found]                                                                     ` <20091221073915.GC3234@core.coreip.homeip.net>
2009-12-21 11:20                                                                       ` Vojtech Pavlik
2009-12-20  2:45                                                               ` Async suspend-resume patch w/ completions (was: Re: Async suspend-resume " Dmitry Torokhov
2009-12-20  3:59                                                           ` Alan Stern
2009-12-13 13:08                         ` Rafael J. Wysocki
2009-12-13 17:30                         ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox