Wakeup-events implementation

public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed

* Wakeup-events implementation
       [not found] <AANLkTimgJUAW9TU37XE4BqYWf3yCWvg-vgk3E0YLqcm9@mail.gmail.com>
@ 2010-08-13 15:59 ` Alan Stern
  2010-08-14  4:59   ` Arve Hjønnevåg
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Stern @ 2010-08-13 15:59 UTC (permalink / raw)
  To: Arve Hjønnevåg; +Cc: Linux-pm mailing list

Since this area of the discussion has strayed away from the path of 
Paul McKenney's original summary, I'm starting a new email thread and 
drastically cutting the CC list.

On Sun, 8 Aug 2010, Arve Hjønnevåg wrote:

> > Maybe.  There aren't enough examples coded up yet to be sure.  At this
> > point we could easily make the device argument required instead of
> > optional, and we could add a device argument to pm_relax.
> >
> I think that would be a good start. I can create dummy devices for now
> where I don't already have a device.

Rafael and I spent some time discussing all this during LinuxCon.  
Combining what he thought with my own ideas led us to decide that the 
wakeup information should be stored in a separate structure, not in 
dev_pm_info.  These structures will be allocated dynamically when a 
device is enabled for remote wakeup (and presumably deallocated if a 
device is disabled for remote wakeup).  A pointer to the new structure 
will be stored in dev_pm_info.  We expect that most devices won't need 
a wakeup structure, because most devices aren't wakeup sources.

This new structure (as yet unnamed) will be a lot like your
suspend_blocker structure, but with some differences.  It won't need a 
name, since the device already has a name.  It won't have a timer, if 
we can get away with leaving one out.  But it will optionally contain 
various counters that can be used for statistics reporting and 
debugging.  The API for exporting these fields to userspace has not yet 
been decided upon.

The device argument will become mandatory for pm_stay_awake, pm_relax, 
and pm_wakeup_event, so that the counter fields can be updated 
properly.

> > In the scheme we're talking about, the suspend-blocking interface for
> > uesrspace would be located entirely in the power manager.  Hence it
> > could maintain all the necessary statistics, without involving the
> > kernel.  During early stages of system startup, before the power
> > manager is running, lower-level services would not be able to block
> > suspend.  This wouldn't matter because at those times the system simply
> > would not ever suspend -- because suspends have to be initiated by the
> > power manager.
> 
> If we do that, a low level process could try to block suspend while
> the power manager is not running. Then the power manager could start
> and decide to suspend not knowing that a low level process wanted to
> block suspend.

There's a simple solution: Put all this in a _new_, low-level power
manager program that can be started very early during boot-up (before
any of the other low-level services that need to block suspend).  Then
all the other programs would communicate their suspend-blocker
requirements to this program instead of to the kernel.

You expressed concern about the new power manager deadlocking.  This
can be avoided by designing the program properly.  It should run in two
threads.  Thread 1 will handle IPC, keeping track of userspace
suspend-blocker requests, their statistics, and so on.  Thread 2 will
manage the PM interface to the kernel, and it will execute the
following simple loop:

	for (;;) {
		read /sys/power/wakeup_count	/* blocking read */
		if (there are any active userspace suspend blockers) {
			wait until they are all deactivated
		} else {
			if (write to /sys/power/wakeup_count succeeds)
				write "mem" to /sys/power/state
		}
	}

I think this will resolve the problem you were worried about.

> >> >> I don't know if they are all kernel-internal but these drivers appear
> >> >> to use timeouts and active cancellation on the same wakelock:
> >> >> wifi driver, mmc core, alarm driver, evdev (suspend blocker version
> >> >> removes the timeout).
> >
> > Roughly speaking, what do the timings work out to be?  That is, how
> > long are the timeouts for these wakelocks, how long does it usually
> > take for one of them to be actively cancelled, and what percentage of
> > them end up timing out?
> >
> The evdev timeout is a few seconds, and never timeout unless user
> space is misbehaving. I don't know how the wifi and mmc wakelocks are
> used. The alarm driver uses a 1 or 2 second timeout to abort suspend
> when an alarm is set to close to program an rtc alarm. I think this
> one is cancelled when the alarm triggers, and the timeout only handles
> the case where the client cancelled the alarm before it trigger, but I
> don't remember for sure.

Evdev turns out to be tricky for a couple of reasons.  One is this
business of combining active cancellation with timeouts (we might end
up needing a special-purpose timer for this).  Another is the way all
input events funnel into a single event queue.

For example, suppose you want the keyboard to be a wakeup device but 
not the mouse.  Once events get into the input queue, there's no way to 
tell where they came from.  Hence mouse motion events would prevent the 
system from suspending, even though they weren't supposed to.

> You cannot easily mix timeouts, cancellations and nesting.
> Wakelocks/suspend blockers allow you to mix timeouts and
> cancellations.

That's true.  And it turns out that we do need to mix cancellations 
with nesting.  For example, it may well happen that pm_request_resume 
gets called several times before the pm workqueue thread gets around to 
resuming the device.  If each of those calls has a corresponding call 
to pm_stay_awake then we would like to cancel all of them with a single 
pm_relax after the driver's runtime_resume callback is finished.

On the other hand, the driver's runtime_resume routine might need to 
carry out some work in a different kernel thread.  It would call 
pm_stay_awake, and we wouldn't want _this_ call to be cancelled until 
the other thread has run.

In the end, the best answer may be to keep a count of pending wakeup
events in the new wakeup structure (as well as a global count of the
number of devices whose pending count is > 0).  Then pm_stay_awake
would increment the count, pm_relax would decrement it, and there could
be a variant of pm_stay_awake that would increment the count only if it
was currently equal to 0.

> The cleanup is a little simpler when you don't nest and there is less
> chance that a missed edge case prevents suspend forever (we had a few
> of those bugs in our userspace code that used nested wakelocks).

Bugs in the kernel can be tracked down and fixed.

> If
> the above is all you do, then you block suspend forever if someone
> closes an non-empty input device.

That sounds like a fixable bug.  :-)

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-13 15:59 ` Wakeup-events implementation Alan Stern
@ 2010-08-14  4:59   ` Arve Hjønnevåg
  2010-08-16 19:22     ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Arve Hjønnevåg @ 2010-08-14  4:59 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux-pm mailing list

2010/8/13 Alan Stern <stern@rowland.harvard.edu>:
> Since this area of the discussion has strayed away from the path of
> Paul McKenney's original summary, I'm starting a new email thread and
> drastically cutting the CC list.
>
> On Sun, 8 Aug 2010, Arve Hjønnevåg wrote:
>
>> > Maybe.  There aren't enough examples coded up yet to be sure.  At this
>> > point we could easily make the device argument required instead of
>> > optional, and we could add a device argument to pm_relax.
>> >
>> I think that would be a good start. I can create dummy devices for now
>> where I don't already have a device.
>
> Rafael and I spent some time discussing all this during LinuxCon.
> Combining what he thought with my own ideas led us to decide that the
> wakeup information should be stored in a separate structure, not in
> dev_pm_info.  These structures will be allocated dynamically when a
> device is enabled for remote wakeup (and presumably deallocated if a
> device is disabled for remote wakeup).  A pointer to the new structure
> will be stored in dev_pm_info.  We expect that most devices won't need
> a wakeup structure, because most devices aren't wakeup sources.
>
> This new structure (as yet unnamed) will be a lot like your
> suspend_blocker structure, but with some differences.  It won't need a
> name, since the device already has a name.

A name is still useful if we allocate multiple handles per device (see
below for why one handle per device is not enough).

> It won't have a timer, if
> we can get away with leaving one out.

The suspend_blocker structure does not have a timer, it has a timeout.
This was intentional, since it avoids unnecessary timers from firing
but it also means it takes a lot less space than the stats. We do need
the timeout in the structure to support some of our drivers since they
mix wake_lock_timeout and wake_unlock.

> But it will optionally contain
> various counters that can be used for statistics reporting and
> debugging.  The API for exporting these fields to userspace has not yet
> been decided upon.
>
> The device argument will become mandatory for pm_stay_awake, pm_relax,
> and pm_wakeup_event, so that the counter fields can be updated
> properly.
>
I think this is a step in the right direction, but it is not enough to
support mixing timeouts and pm_relax in all the drivers.

>> > In the scheme we're talking about, the suspend-blocking interface for
>> > uesrspace would be located entirely in the power manager.  Hence it
>> > could maintain all the necessary statistics, without involving the
>> > kernel.  During early stages of system startup, before the power
>> > manager is running, lower-level services would not be able to block
>> > suspend.  This wouldn't matter because at those times the system simply
>> > would not ever suspend -- because suspends have to be initiated by the
>> > power manager.
>>
>> If we do that, a low level process could try to block suspend while
>> the power manager is not running. Then the power manager could start
>> and decide to suspend not knowing that a low level process wanted to
>> block suspend.
>
> There's a simple solution: Put all this in a _new_, low-level power
> manager program that can be started very early during boot-up (before
> any of the other low-level services that need to block suspend).  Then
> all the other programs would communicate their suspend-blocker
> requirements to this program instead of to the kernel.
>

Maintaining an out of mainline driver to export allow user space to
block suspend seems like a more appealing solution than adding another
process for everyone to depend on. Having a user space interface to
block suspend in the mainline kernel would still be valuable, though.
Like I mentioned in another message, this would allow some low level
services, like the user-space battery monitor some devices have, to
not be android specific.

> You expressed concern about the new power manager deadlocking.  This
> can be avoided by designing the program properly.  It should run in two
> threads.  Thread 1 will handle IPC, keeping track of userspace
> suspend-blocker requests, their statistics, and so on.  Thread 2 will
> manage the PM interface to the kernel, and it will execute the
> following simple loop:
>
>        for (;;) {
>                read /sys/power/wakeup_count    /* blocking read */
>                if (there are any active userspace suspend blockers) {
>                        wait until they are all deactivated
>                } else {
>                        if (write to /sys/power/wakeup_count succeeds)
>                                write "mem" to /sys/power/state
>                }
>        }
>
> I think this will resolve the problem you were worried about.
>

I know how to avoid the deadlock. My point was that adding the code in
the drivers to support the overlapping wakelock model of handling
wakeup events, means that a single threaded power manager that did not
deadlock before now could deadlock.

>> >> >> I don't know if they are all kernel-internal but these drivers appear
>> >> >> to use timeouts and active cancellation on the same wakelock:
>> >> >> wifi driver, mmc core, alarm driver, evdev (suspend blocker version
>> >> >> removes the timeout).
>> >
>> > Roughly speaking, what do the timings work out to be?  That is, how
>> > long are the timeouts for these wakelocks, how long does it usually
>> > take for one of them to be actively cancelled, and what percentage of
>> > them end up timing out?
>> >
>> The evdev timeout is a few seconds, and never timeout unless user
>> space is misbehaving. I don't know how the wifi and mmc wakelocks are
>> used. The alarm driver uses a 1 or 2 second timeout to abort suspend
>> when an alarm is set to close to program an rtc alarm. I think this
>> one is cancelled when the alarm triggers, and the timeout only handles
>> the case where the client cancelled the alarm before it trigger, but I
>> don't remember for sure.
>
> Evdev turns out to be tricky for a couple of reasons.  One is this
> business of combining active cancellation with timeouts (we might end
> up needing a special-purpose timer for this).  Another is the way all
> input events funnel into a single event queue.
>

Input events do not funnel to a single event queue in the kernel (at
least not with the interface I have used). There are separate queues
for each client of every device.

> For example, suppose you want the keyboard to be a wakeup device but
> not the mouse.  Once events get into the input queue, there's no way to
> tell where they came from.  Hence mouse motion events would prevent the
> system from suspending, even though they weren't supposed to.
>
>> You cannot easily mix timeouts, cancellations and nesting.
>> Wakelocks/suspend blockers allow you to mix timeouts and
>> cancellations.
>
> That's true.  And it turns out that we do need to mix cancellations
> with nesting.  For example, it may well happen that pm_request_resume
> gets called several times before the pm workqueue thread gets around to
> resuming the device.  If each of those calls has a corresponding call
> to pm_stay_awake then we would like to cancel all of them with a single
> pm_relax after the driver's runtime_resume callback is finished.
>
> On the other hand, the driver's runtime_resume routine might need to
> carry out some work in a different kernel thread.  It would call
> pm_stay_awake, and we wouldn't want _this_ call to be cancelled until
> the other thread has run.
>
> In the end, the best answer may be to keep a count of pending wakeup
> events in the new wakeup structure (as well as a global count of the
> number of devices whose pending count is > 0).  Then pm_stay_awake
> would increment the count, pm_relax would decrement it, and there could
> be a variant of pm_stay_awake that would increment the count only if it
> was currently equal to 0.
>

If I correctly understand what you are suggesting, that does not work.
If two subsystems use the same device an one of them expects the
pm_stay_awake call to be nested and the other one does not, then the
subsystem that uses the non-nested api is not preventing suspend if
the other subsystem was preventing suspend when it called
pm_stay_awake_if_not_zero (the non-nested pm_relax variant would cause
similar problems). Also, to support all our existing drivers we need
multiple handles per device. For each input device we have a wakelock
with a timeout per client. We only allow suspend when all the clients
have emptied their queue or had their wakelock timeout.

>> The cleanup is a little simpler when you don't nest and there is less
>> chance that a missed edge case prevents suspend forever (we had a few
>> of those bugs in our userspace code that used nested wakelocks).
>
> Bugs in the kernel can be tracked down and fixed.
>

Yes, bugs can be fixed, but not all bugs are discovered quickly and
products ship before some bugs are fixed. For instance, when applying
our patch that holds a wakelock when the input queue is not empty to
this weeks kernel, I noticed that someone had fixed a bug where a
queue overflow could leave the queue empty. If you hit this bug with
the wakelock with timeout implementation that we currently use, then
it would cause suspend to be blocked for 5 seconds (less if another
event occurred or the file descriptor was closed). With the patch that
uses a suspend blocker without a timeout, it would block suspend until
another event from the same device occurred or user space closes the
device. With a nesting api like the current pm_stay_awake/pm_relax, it
will prevent suspend forever (unless another driver has the opposite
bug).

Do we need the nested api for something? We need a non-nested api to
mix timeouts and pm_relax, and unless we need the nested variants, it
is easier to just make all the calls non-nesting.

>> If
>> the above is all you do, then you block suspend forever if someone
>> closes an non-empty input device.
>
> That sounds like a fixable bug.  :-)
>
> Alan Stern
>
>



-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-14  4:59   ` Arve Hjønnevåg
@ 2010-08-16 19:22     ` Alan Stern
  2010-08-17  0:30       ` Arve Hjønnevåg
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Stern @ 2010-08-16 19:22 UTC (permalink / raw)
  To: Arve Hjønnevåg; +Cc: Linux-pm mailing list

On Fri, 13 Aug 2010, Arve Hjønnevåg wrote:

> > Rafael and I spent some time discussing all this during LinuxCon.
> > Combining what he thought with my own ideas led us to decide that the
> > wakeup information should be stored in a separate structure, not in
> > dev_pm_info.  These structures will be allocated dynamically when a
> > device is enabled for remote wakeup (and presumably deallocated if a
> > device is disabled for remote wakeup).  A pointer to the new structure
> > will be stored in dev_pm_info.  We expect that most devices won't need
> > a wakeup structure, because most devices aren't wakeup sources.
> >
> > This new structure (as yet unnamed) will be a lot like your
> > suspend_blocker structure, but with some differences.  It won't need a
> > name, since the device already has a name.
> 
> A name is still useful if we allocate multiple handles per device (see
> below for why one handle per device is not enough).

Are the input queues are the only place where this is needed?

And BTW, I'm puzzled with regard to something you wrote earlier:

> > ... I can see how relying on devices alone
> > wouldn't give enough information.  There could be two separate programs
> > both reading input events, and you wouldn't know which one was
> > responsible for failing to drain its queue.
> >
> We don't know which one now either since we don't give them unique
> names, but we know that one of them is not reading (and it is usually
> not the normal input path).

If you don't keep track of the input queues by name, why do you need a
separate wakeup structure for each queue?

> > It won't have a timer, if
> > we can get away with leaving one out.
> 
> The suspend_blocker structure does not have a timer, it has a timeout.
> This was intentional, since it avoids unnecessary timers from firing
> but it also means it takes a lot less space than the stats. We do need
> the timeout in the structure to support some of our drivers since they
> mix wake_lock_timeout and wake_unlock.

A timeout is okay.

> > The device argument will become mandatory for pm_stay_awake, pm_relax,
> > and pm_wakeup_event, so that the counter fields can be updated
> > properly.
> >
> I think this is a step in the right direction, but it is not enough to
> support mixing timeouts and pm_relax in all the drivers.

We'll figure out the best approach.  You mentioned that when the
timeouts and pm_relax do get mixed, the timeout value is around 1 or 2
seconds.  For these cases, would it be so terrible just to use the
timeout alone and forget about pm_relax?

> Maintaining an out of mainline driver to export allow user space to
> block suspend seems like a more appealing solution than adding another
> process for everyone to depend on. Having a user space interface to
> block suspend in the mainline kernel would still be valuable, though.
> Like I mentioned in another message, this would allow some low level
> services, like the user-space battery monitor some devices have, to
> not be android specific.

Adding out-of-mainline drivers is of course always possible.  As for
your second point, I guess it's a matter of perspective.  You see the
kernel as always trying to suspend, with userspace needing to do
something special to keep it awake.  Rafael and I see it as userspace
occasionally asking the kernel to suspend, and the kernel delaying or
aborting if a wakeup event occurs.  This is because we believe that
suspends should be initiated by userspace, not by the kernel.  From our
point of view there doesn't have to be a user interface to block
suspends; you only need a way for one part of userspace to tell another
to stop requesting a suspend.

This could become non-Android-specific as well, if people decide on 
some standard way for system management programs to decide when the 
system should go to sleep.

> > In the end, the best answer may be to keep a count of pending wakeup
> > events in the new wakeup structure (as well as a global count of the
> > number of devices whose pending count is > 0).  Then pm_stay_awake
> > would increment the count, pm_relax would decrement it, and there could
> > be a variant of pm_stay_awake that would increment the count only if it
> > was currently equal to 0.
> >
> 
> If I correctly understand what you are suggesting, that does not work.
> If two subsystems use the same device an one of them expects the
> pm_stay_awake call to be nested and the other one does not, then the
> subsystem that uses the non-nested api is not preventing suspend if
> the other subsystem was preventing suspend when it called
> pm_stay_awake_if_not_zero (the non-nested pm_relax variant would cause
> similar problems).

Do you have any examples where two subsystems use the same device but
with differing expectations?  In the examples I can think of, the
non-nesting call is used only when a wakeup request is received (which
happens only when the device is at low power), so the nesting calls
can't interfere.

> Also, to support all our existing drivers we need
> multiple handles per device. For each input device we have a wakelock
> with a timeout per client. We only allow suspend when all the clients
> have emptied their queue or had their wakelock timeout.

We may end up attaching these wakeup structures not just to devices but
also to input queues (or other kernel->userspace communication
channels).

> > Bugs in the kernel can be tracked down and fixed.
> >
> 
> Yes, bugs can be fixed, but not all bugs are discovered quickly and
> products ship before some bugs are fixed. For instance, when applying
> our patch that holds a wakelock when the input queue is not empty to
> this weeks kernel, I noticed that someone had fixed a bug where a
> queue overflow could leave the queue empty. If you hit this bug with
> the wakelock with timeout implementation that we currently use, then
> it would cause suspend to be blocked for 5 seconds (less if another
> event occurred or the file descriptor was closed). With the patch that
> uses a suspend blocker without a timeout, it would block suspend until
> another event from the same device occurred or user space closes the
> device. With a nesting api like the current pm_stay_awake/pm_relax, it
> will prevent suspend forever (unless another driver has the opposite
> bug).
> 
> Do we need the nested api for something? We need a non-nested api to
> mix timeouts and pm_relax, and unless we need the nested variants, it
> is easier to just make all the calls non-nesting.

Here's an example where nested pm_stay_awake calls are needed.  
Suppose a USB root hub is enabled for wakeup (in other words, plugging
or unplugging a USB device into the computer should wake the system
up).  Suppose the root hub is at low power (autosuspended) when a
wakeup event occurs.  The interrupt handler calls pm_stay_awake and
indirectly invokes the USB runtime-resume routine.  This routine
brings the root hub back to full power, informs the hub driver, and
then calls pm_relax.

But the hub driver carries out its main work in a separate kernel
thread (i.e., khubd).  So the hub_resume routine has to call
pm_stay_awake, and then khubd calls pm_relax when it is finished
processing the root hub's event.  Thus we end up with:

	usb_runtime_resume:	call pm_stay_awake (1)
				call hub_resume
		hub_resume:		call pm_stay_awake (2)
	usb_runtime_resume:	call pm_relax (balances 1)
	khubd:			call pm_relax (balances 2)

As you can see, this won't work right if the pm wakeup calls aren't
cumulative.  ("Nested" isn't quite the right word.)  The system would
be allowed to sleep before khubd could process the root hub's wakeup
event.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-16 19:22     ` Alan Stern
@ 2010-08-17  0:30       ` Arve Hjønnevåg
  2010-08-17 15:53         ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Arve Hjønnevåg @ 2010-08-17  0:30 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux-pm mailing list

2010/8/16 Alan Stern <stern@rowland.harvard.edu>:
> On Fri, 13 Aug 2010, Arve Hjønnevåg wrote:
>
>> > Rafael and I spent some time discussing all this during LinuxCon.
>> > Combining what he thought with my own ideas led us to decide that the
>> > wakeup information should be stored in a separate structure, not in
>> > dev_pm_info.  These structures will be allocated dynamically when a
>> > device is enabled for remote wakeup (and presumably deallocated if a
>> > device is disabled for remote wakeup).  A pointer to the new structure
>> > will be stored in dev_pm_info.  We expect that most devices won't need
>> > a wakeup structure, because most devices aren't wakeup sources.
>> >
>> > This new structure (as yet unnamed) will be a lot like your
>> > suspend_blocker structure, but with some differences.  It won't need a
>> > name, since the device already has a name.
>>
>> A name is still useful if we allocate multiple handles per device (see
>> below for why one handle per device is not enough).
>
> Are the input queues are the only place where this is needed?
>

I don't know if it is the only place that is using more than one
wakelock with a timeout per device. There are other drivers that use
more than one wakelock, but it is possible they could use a single
nested lock instead (with some detail loss in the the stats). I'm not
sure every wakelock can be associated with a device though. For
instance the mmc core wakelock does not have a corresponding device.

> And BTW, I'm puzzled with regard to something you wrote earlier:
>
>> > ... I can see how relying on devices alone
>> > wouldn't give enough information.  There could be two separate programs
>> > both reading input events, and you wouldn't know which one was
>> > responsible for failing to drain its queue.
>> >
>> We don't know which one now either since we don't give them unique
>> names, but we know that one of them is not reading (and it is usually
>> not the normal input path).
>
> If you don't keep track of the input queues by name, why do you need a
> separate wakeup structure for each queue?
>

Primarily for the timeouts, but the individual stats are still useful
even if they don't name the client.

>> > It won't have a timer, if
>> > we can get away with leaving one out.
>>
>> The suspend_blocker structure does not have a timer, it has a timeout.
>> This was intentional, since it avoids unnecessary timers from firing
>> but it also means it takes a lot less space than the stats. We do need
>> the timeout in the structure to support some of our drivers since they
>> mix wake_lock_timeout and wake_unlock.
>
> A timeout is okay.
>
>> > The device argument will become mandatory for pm_stay_awake, pm_relax,
>> > and pm_wakeup_event, so that the counter fields can be updated
>> > properly.
>> >
>> I think this is a step in the right direction, but it is not enough to
>> support mixing timeouts and pm_relax in all the drivers.
>
> We'll figure out the best approach.  You mentioned that when the
> timeouts and pm_relax do get mixed, the timeout value is around 1 or 2
> seconds.  For these cases, would it be so terrible just to use the
> timeout alone and forget about pm_relax?
>

The timeout used when mixing timeouts and wake_unlock are generally
longer than when only using only timeouts since the timeouts are used
to cover edge cases were wake_unlock did not get called. Allowing
timeouts and pm_relax to be mixed may allow a gradual path from using
timeouts to block suspend only when needed (timeouts only -> timeouts
for edge cases -> no timeouts).

We have also had bugs, where sensors generate input events whenever
the system is not suspended. The 5 second timeout we have on input
events is not short enough to allow the system to suspend at all in
this case.

>> Maintaining an out of mainline driver to export allow user space to
>> block suspend seems like a more appealing solution than adding another
>> process for everyone to depend on. Having a user space interface to
>> block suspend in the mainline kernel would still be valuable, though.
>> Like I mentioned in another message, this would allow some low level
>> services, like the user-space battery monitor some devices have, to
>> not be android specific.
>
> Adding out-of-mainline drivers is of course always possible.  As for
> your second point, I guess it's a matter of perspective.  You see the
> kernel as always trying to suspend, with userspace needing to do
> something special to keep it awake.

I see the kernel as trying to suspend if user space asked it to
suspend (until it user space asks the kernel to stop suspending).
Wakeup events that don't reach user space should not need user space
support to reenter suspend.

> Rafael and I see it as userspace
> occasionally asking the kernel to suspend, and the kernel delaying or
> aborting if a wakeup event occurs.  This is because we believe that
> suspends should be initiated by userspace, not by the kernel.

You want each suspend attempt to be initiated by user space. Suspend
is initiated from user space in both cases, the difference is that you
want the suspend request to be one-shot, which means that the kernel
may cancel the user space suspend request, where I want user space to
explicitly cancel the request.

>  From our
> point of view there doesn't have to be a user interface to block
> suspends; you only need a way for one part of userspace to tell another
> to stop requesting a suspend.
>

The user space interface to block suspend is not all that relevant
here.  Both suspend interfaces can be used with or without a user
space interface to block suspend.

> This could become non-Android-specific as well, if people decide on
> some standard way for system management programs to decide when the
> system should go to sleep.
>
>> > In the end, the best answer may be to keep a count of pending wakeup
>> > events in the new wakeup structure (as well as a global count of the
>> > number of devices whose pending count is > 0).  Then pm_stay_awake
>> > would increment the count, pm_relax would decrement it, and there could
>> > be a variant of pm_stay_awake that would increment the count only if it
>> > was currently equal to 0.
>> >
>>
>> If I correctly understand what you are suggesting, that does not work.
>> If two subsystems use the same device an one of them expects the
>> pm_stay_awake call to be nested and the other one does not, then the
>> subsystem that uses the non-nested api is not preventing suspend if
>> the other subsystem was preventing suspend when it called
>> pm_stay_awake_if_not_zero (the non-nested pm_relax variant would cause
>> similar problems).
>
> Do you have any examples where two subsystems use the same device but
> with differing expectations?  In the examples I can think of, the
> non-nesting call is used only when a wakeup request is received (which
> happens only when the device is at low power), so the nesting calls
> can't interfere.
>

The wakeup request need to block suspend until the driver has seen the
event. If the driver and the wakeup code uses the same handle how can
you avoid interference?

>> Also, to support all our existing drivers we need
>> multiple handles per device. For each input device we have a wakelock
>> with a timeout per client. We only allow suspend when all the clients
>> have emptied their queue or had their wakelock timeout.
>
> We may end up attaching these wakeup structures not just to devices but
> also to input queues (or other kernel->userspace communication
> channels).
>

That is we do now (with wakelocks), but I think we need to pass this
handle to the pm_stay_wake... calls, and not the device, for this to
work.

>> > Bugs in the kernel can be tracked down and fixed.
>> >
>>
>> Yes, bugs can be fixed, but not all bugs are discovered quickly and
>> products ship before some bugs are fixed. For instance, when applying
>> our patch that holds a wakelock when the input queue is not empty to
>> this weeks kernel, I noticed that someone had fixed a bug where a
>> queue overflow could leave the queue empty. If you hit this bug with
>> the wakelock with timeout implementation that we currently use, then
>> it would cause suspend to be blocked for 5 seconds (less if another
>> event occurred or the file descriptor was closed). With the patch that
>> uses a suspend blocker without a timeout, it would block suspend until
>> another event from the same device occurred or user space closes the
>> device. With a nesting api like the current pm_stay_awake/pm_relax, it
>> will prevent suspend forever (unless another driver has the opposite
>> bug).
>>
>> Do we need the nested api for something? We need a non-nested api to
>> mix timeouts and pm_relax, and unless we need the nested variants, it
>> is easier to just make all the calls non-nesting.
>
> Here's an example where nested pm_stay_awake calls are needed.
> Suppose a USB root hub is enabled for wakeup (in other words, plugging
> or unplugging a USB device into the computer should wake the system
> up).  Suppose the root hub is at low power (autosuspended) when a
> wakeup event occurs.  The interrupt handler calls pm_stay_awake and
> indirectly invokes the USB runtime-resume routine.  This routine
> brings the root hub back to full power, informs the hub driver, and
> then calls pm_relax.
>
> But the hub driver carries out its main work in a separate kernel
> thread (i.e., khubd).  So the hub_resume routine has to call
> pm_stay_awake, and then khubd calls pm_relax when it is finished
> processing the root hub's event.  Thus we end up with:
>
>        usb_runtime_resume:     call pm_stay_awake (1)
>                                call hub_resume
>                hub_resume:             call pm_stay_awake (2)
>        usb_runtime_resume:     call pm_relax (balances 1)
>        khubd:                  call pm_relax (balances 2)
>
> As you can see, this won't work right if the pm wakeup calls aren't
> cumulative.  ("Nested" isn't quite the right word.)  The system would
> be allowed to sleep before khubd could process the root hub's wakeup
> event.
>

Too me this is an example of where two handles are needed, not where
nesting is needed. Yes, you can use a single nested handle here, but
the stats will not tell you which driver blocked suspend if there is a
problem.

-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-17  0:30       ` Arve Hjønnevåg
@ 2010-08-17 15:53         ` Alan Stern
  2010-08-17 18:37           ` David Brownell
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Alan Stern @ 2010-08-17 15:53 UTC (permalink / raw)
  To: Arve Hjønnevåg; +Cc: Linux-pm mailing list

On Mon, 16 Aug 2010, Arve Hjønnevåg wrote:

> >> A name is still useful if we allocate multiple handles per device (see
> >> below for why one handle per device is not enough).
> >
> > Are the input queues are the only place where this is needed?
> >
> 
> I don't know if it is the only place that is using more than one
> wakelock with a timeout per device. There are other drivers that use
> more than one wakelock, but it is possible they could use a single
> nested lock instead (with some detail loss in the the stats). I'm not
> sure every wakelock can be associated with a device though. For
> instance the mmc core wakelock does not have a corresponding device.

What other drivers need more than one wakelock per device?

For that matter, why does the MMC core need wakelocks at all?  I wasn't
aware that memory cards could generate wakeup events.

> > We'll figure out the best approach.  You mentioned that when the
> > timeouts and pm_relax do get mixed, the timeout value is around 1 or 2
> > seconds.  For these cases, would it be so terrible just to use the
> > timeout alone and forget about pm_relax?
> >
> 
> The timeout used when mixing timeouts and wake_unlock are generally
> longer than when only using only timeouts since the timeouts are used
> to cover edge cases were wake_unlock did not get called. Allowing
> timeouts and pm_relax to be mixed may allow a gradual path from using
> timeouts to block suspend only when needed (timeouts only -> timeouts
> for edge cases -> no timeouts).

In fact, I rather expected you would say that the combined
cancellations/timeouts were useful for pinpointing applications that
don't read their input events quickly enough.  If the wakelocks were
always allowed to time out instead of being cancelled when the data was
read, you wouldn't be able to tell whether the data was read in a
timely manner or not.

> We have also had bugs, where sensors generate input events whenever
> the system is not suspended. The 5 second timeout we have on input
> events is not short enough to allow the system to suspend at all in
> this case.

This sounds like the sort of bug that would be found very quickly and 
easily if you didn't have active cancellation.  It would be pretty 
obvious when the system didn't suspend at all.

> I see the kernel as trying to suspend if user space asked it to
> suspend (until it user space asks the kernel to stop suspending).
> Wakeup events that don't reach user space should not need user space
> support to reenter suspend.

That's debatable.  For example, consider a Wake-On-LAN magic packet.  
It generally doesn't go anywhere -- it gets dropped once the system is 
awake.

> You want each suspend attempt to be initiated by user space. Suspend
> is initiated from user space in both cases, the difference is that you
> want the suspend request to be one-shot, which means that the kernel
> may cancel the user space suspend request, where I want user space to
> explicitly cancel the request.

Yes, that's a fair way to describe the situation.

> > Do you have any examples where two subsystems use the same device but
> > with differing expectations?  In the examples I can think of, the
> > non-nesting call is used only when a wakeup request is received (which
> > happens only when the device is at low power), so the nesting calls
> > can't interfere.
> >
> 
> The wakeup request need to block suspend until the driver has seen the
> event. If the driver and the wakeup code uses the same handle how can
> you avoid interference?

Let's say the interrupt handlers responsible for invoking
pm_request_resume use non-nesting calls.  This may happen several
times, depending on the wakeup and interrupt delivery mechanism, but
the end result is that the counter is incremented only once.  When the
pm_runtime_resume call ends, pm_relax does a single decrement.

Meanwhile, the driver uses nesting calls.  It can increment the counter
as much as it wants, and it has to perform an equal number of
decrements.  The key point is that pm_request_resume never gets called
while the driver has any outstanding pm_stay_awake requests.  Hence the 
non-nesting calls never got "lost", since the first one is always made 
while the counter is 0.

> > We may end up attaching these wakeup structures not just to devices but
> > also to input queues (or other kernel->userspace communication
> > channels).
> >
> 
> That is we do now (with wakelocks), but I think we need to pass this
> handle to the pm_stay_wake... calls, and not the device, for this to
> work.

You may be right.  Or we may need additional calls taking the handle as
an argument rather than the device.  The design requirements are still
simmering in my mind, and I'm waiting to hear what Rafael thinks.

> > Here's an example where nested pm_stay_awake calls are needed.
> > Suppose a USB root hub is enabled for wakeup (in other words, plugging
> > or unplugging a USB device into the computer should wake the system
> > up).  Suppose the root hub is at low power (autosuspended) when a
> > wakeup event occurs.  The interrupt handler calls pm_stay_awake and
> > indirectly invokes the USB runtime-resume routine.  This routine
> > brings the root hub back to full power, informs the hub driver, and
> > then calls pm_relax.
> >
> > But the hub driver carries out its main work in a separate kernel
> > thread (i.e., khubd).  So the hub_resume routine has to call
> > pm_stay_awake, and then khubd calls pm_relax when it is finished
> > processing the root hub's event.  Thus we end up with:
> >
> >        usb_runtime_resume:     call pm_stay_awake (1)
> >                                call hub_resume
> >                hub_resume:             call pm_stay_awake (2)
> >        usb_runtime_resume:     call pm_relax (balances 1)
> >        khubd:                  call pm_relax (balances 2)
> >
> > As you can see, this won't work right if the pm wakeup calls aren't
> > cumulative.  ("Nested" isn't quite the right word.)  The system would
> > be allowed to sleep before khubd could process the root hub's wakeup
> > event.
> >
> 
> Too me this is an example of where two handles are needed, not where
> nesting is needed. Yes, you can use a single nested handle here, but
> the stats will not tell you which driver blocked suspend if there is a
> problem.

It sounds like you're suggesting that handles should be allocated 
per-driver rather than per-device.  That would tell you which driver 
blocked suspend, but not which device.

This raises a few other questions.

With Android-based handsets, you're in a setting where you have a
pretty good idea of what all the wakeup sources are and how they might
get used.  On a general system this isn't so.  Many devices are
potential wakeup sources, even though only a few of them end up getting
enabled.  In the example above, I said "Suppose a USB root hub is
enabled for wakeup".  But usually people won't want to do this; they
don't want their suspended laptops to wake up when a USB mouse is
unplugged or plugged in.

Allocating these wakeup structures for all possible wakeup sources
would be a big waste.  On the other hand, a driver generally doesn't
know until a suspend starts whether or not its devices should be
enabled for wakeup.  That makes things difficult, especially if you
contemplate having multiple structures for each device.  Have you
really sat down and thought about what wakelocks would be needed on a
general system such as a desktop or laptop machine?

Also, I frankly have to wonder why you demand such a large amount of
detailed debugging about the kernel's wakeup/wakelock subsystem (as
opposed to userspace wakelocks).  Yes, it's clear that if something
goes wrong then you will drain your phone's battery in a few hours
rather than a few days.  But what if something goes wrong with the VM
subsystem, or the RF subsystem, or the keypad driver?  The device would
crash immediately or at best become almost totally unusable.  These are
much worse scenarios than losing some battery life -- have you added
comparable quantities of debugging to those other parts of the kernel?

In addition, you retain all these debugging facilities in your
production builds, not just in the debug kernels.  Doesn't this seem
excessive?

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-17 15:53         ` Alan Stern
@ 2010-08-17 18:37           ` David Brownell
  2010-08-17 19:19             ` Alan Stern
  2010-08-17 19:21           ` Rafael J. Wysocki
  2010-08-18  7:04           ` Wakeup-events implementation Florian Mickler
  2 siblings, 1 reply; 18+ messages in thread
From: David Brownell @ 2010-08-17 18:37 UTC (permalink / raw)
  To: Arve Hjønnevåg, Alan Stern; +Cc: Linux-pm mailing list

--- On Tue, 8/17/10, Alan Stern <stern@rowland.harvard.edu> wrote:

I wasn't
> aware that memory cards could generate wakeup events.

Almost all MMC/SD hardware can generate reliable
insert/remove events, in my experience.  So MMC/SD
drivers I've modified tend to treat them as wake
events ... userspace can respond without polling
constantly "did it change yet, huh, did it???"

Unfortunately Pierre spent a lot of time hard-wiring
UNSAFE_RESUME assumptions into the MMC core, so lots
of that stuff is currently (and needlessly) broken.

And of more experimental interest, something I don't
recall being discussed here:  I/O completions as
wake events.  Start an I/O, enter low power state,
and wake when it completes.  Easily done on lots of
embedded hardware..

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-17 18:37           ` David Brownell
@ 2010-08-17 19:19             ` Alan Stern
  0 siblings, 0 replies; 18+ messages in thread
From: Alan Stern @ 2010-08-17 19:19 UTC (permalink / raw)
  To: David Brownell; +Cc: Linux-pm mailing list

On Tue, 17 Aug 2010, David Brownell wrote:

> --- On Tue, 8/17/10, Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> I wasn't
> > aware that memory cards could generate wakeup events.
> 
> Almost all MMC/SD hardware can generate reliable
> insert/remove events, in my experience.  So MMC/SD
> drivers I've modified tend to treat them as wake
> events ... userspace can respond without polling
> constantly "did it change yet, huh, did it???"

This is interesting in light of the fact that testing has shown many
USB card readers don't track insert/remove events while in low-power
mode.  When they return to full power, if a card is present they always
report a media change -- even if the card had been there all along.  
This makes autosuspend less useful than it might be.

> Unfortunately Pierre spent a lot of time hard-wiring
> UNSAFE_RESUME assumptions into the MMC core, so lots
> of that stuff is currently (and needlessly) broken.

There's a difference between generating wakeup events for insert/remove
and keeping track of media-change events.  I wouldn't want my sleeping
laptop to wake up when I insert or remove a memory card, but I would
like it to recognize when a card has been changed.  If the hardware can
do one but not the other then Pierre's choice is understandable
(although not in accord with Linus's preference).

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-17 15:53         ` Alan Stern
  2010-08-17 18:37           ` David Brownell
@ 2010-08-17 19:21           ` Rafael J. Wysocki
  2010-08-18  0:40             ` Arve Hjønnevåg
  2010-08-18  7:04           ` Wakeup-events implementation Florian Mickler
  2 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2010-08-17 19:21 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux-pm mailing list

On Tuesday, August 17, 2010, Alan Stern wrote:
> On Mon, 16 Aug 2010, Arve Hjønnevåg wrote:
> 
> > >> A name is still useful if we allocate multiple handles per device (see
> > >> below for why one handle per device is not enough).
> > >
> > > Are the input queues are the only place where this is needed?
> > >
> > 
> > I don't know if it is the only place that is using more than one
> > wakelock with a timeout per device. There are other drivers that use
> > more than one wakelock, but it is possible they could use a single
> > nested lock instead (with some detail loss in the the stats). I'm not
> > sure every wakelock can be associated with a device though. For
> > instance the mmc core wakelock does not have a corresponding device.
> 
> What other drivers need more than one wakelock per device?
> 
> For that matter, why does the MMC core need wakelocks at all?  I wasn't
> aware that memory cards could generate wakeup events.

Yes, that's interesting, although I _guess_ this is related to the events
generated by inserting/removing a card.

> > > We'll figure out the best approach.  You mentioned that when the
> > > timeouts and pm_relax do get mixed, the timeout value is around 1 or 2
> > > seconds.  For these cases, would it be so terrible just to use the
> > > timeout alone and forget about pm_relax?
> > >
> > 
> > The timeout used when mixing timeouts and wake_unlock are generally
> > longer than when only using only timeouts since the timeouts are used
> > to cover edge cases were wake_unlock did not get called. Allowing
> > timeouts and pm_relax to be mixed may allow a gradual path from using
> > timeouts to block suspend only when needed (timeouts only -> timeouts
> > for edge cases -> no timeouts).
> 
> In fact, I rather expected you would say that the combined
> cancellations/timeouts were useful for pinpointing applications that
> don't read their input events quickly enough.  If the wakelocks were
> always allowed to time out instead of being cancelled when the data was
> read, you wouldn't be able to tell whether the data was read in a
> timely manner or not.
> 
> > We have also had bugs, where sensors generate input events whenever
> > the system is not suspended. The 5 second timeout we have on input
> > events is not short enough to allow the system to suspend at all in
> > this case.
> 
> This sounds like the sort of bug that would be found very quickly and 
> easily if you didn't have active cancellation.  It would be pretty 
> obvious when the system didn't suspend at all.
> 
> > I see the kernel as trying to suspend if user space asked it to
> > suspend (until it user space asks the kernel to stop suspending).
> > Wakeup events that don't reach user space should not need user space
> > support to reenter suspend.
> 
> That's debatable.  For example, consider a Wake-On-LAN magic packet.  
> It generally doesn't go anywhere -- it gets dropped once the system is 
> awake.

Yes.  I think every event that would wake up a sleeping system should result
in unfreezing user space.

> > You want each suspend attempt to be initiated by user space. Suspend
> > is initiated from user space in both cases,

Not really.  With wakelocks suspend may be initiated as a result of a driver
dropping its wakelock, which doesn't qualify as "from user space" to me.

> > the difference is that you want the suspend request to be one-shot, which
> > means that the kernel may cancel the user space suspend request, where I
> > want user space to explicitly cancel the request.
> 
> Yes, that's a fair way to describe the situation.

I somewhat agree.

> > > Do you have any examples where two subsystems use the same device but
> > > with differing expectations?  In the examples I can think of, the
> > > non-nesting call is used only when a wakeup request is received (which
> > > happens only when the device is at low power), so the nesting calls
> > > can't interfere.
> > >
> > 
> > The wakeup request need to block suspend until the driver has seen the
> > event. If the driver and the wakeup code uses the same handle how can
> > you avoid interference?
> 
> Let's say the interrupt handlers responsible for invoking
> pm_request_resume use non-nesting calls.  This may happen several
> times, depending on the wakeup and interrupt delivery mechanism, but
> the end result is that the counter is incremented only once.  When the
> pm_runtime_resume call ends, pm_relax does a single decrement.
> 
> Meanwhile, the driver uses nesting calls.  It can increment the counter
> as much as it wants, and it has to perform an equal number of
> decrements.  The key point is that pm_request_resume never gets called
> while the driver has any outstanding pm_stay_awake requests.  Hence the 
> non-nesting calls never got "lost", since the first one is always made 
> while the counter is 0.
> 
> > > We may end up attaching these wakeup structures not just to devices but
> > > also to input queues (or other kernel->userspace communication
> > > channels).
> > >
> > 
> > That is we do now (with wakelocks), but I think we need to pass this
> > handle to the pm_stay_wake... calls, and not the device, for this to
> > work.
> 
> You may be right.  Or we may need additional calls taking the handle as
> an argument rather than the device.  The design requirements are still
> simmering in my mind, and I'm waiting to hear what Rafael thinks.

I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev)
calling something like __pm_stay_awake(handle), where the handle is read from
struct dev_pm_info.  And analogously for pm_relax().

Then, the object associated with the handle might be created when the device
is enabled to wake up and destroyed when it's disabled to do that.

OTOH, if it's really necessary to use a "raw" handle (ie. without struct
device), the caller of __pm_stay_awake() will allocate and free the
"wakelock" object by itself as necessary.

As far as the names are concerned, I'd like to avoid allocating space for them
whenever it is not necessary (like when the "wakelock" is pointed to by struct
dev_pm_info, in which case it is simply sufficient to use the device name to
refer to it).

> > > Here's an example where nested pm_stay_awake calls are needed.
> > > Suppose a USB root hub is enabled for wakeup (in other words, plugging
> > > or unplugging a USB device into the computer should wake the system
> > > up).  Suppose the root hub is at low power (autosuspended) when a
> > > wakeup event occurs.  The interrupt handler calls pm_stay_awake and
> > > indirectly invokes the USB runtime-resume routine.  This routine
> > > brings the root hub back to full power, informs the hub driver, and
> > > then calls pm_relax.
> > >
> > > But the hub driver carries out its main work in a separate kernel
> > > thread (i.e., khubd).  So the hub_resume routine has to call
> > > pm_stay_awake, and then khubd calls pm_relax when it is finished
> > > processing the root hub's event.  Thus we end up with:
> > >
> > >        usb_runtime_resume:     call pm_stay_awake (1)
> > >                                call hub_resume
> > >                hub_resume:             call pm_stay_awake (2)
> > >        usb_runtime_resume:     call pm_relax (balances 1)
> > >        khubd:                  call pm_relax (balances 2)
> > >
> > > As you can see, this won't work right if the pm wakeup calls aren't
> > > cumulative.  ("Nested" isn't quite the right word.)  The system would
> > > be allowed to sleep before khubd could process the root hub's wakeup
> > > event.
> > >
> > 
> > Too me this is an example of where two handles are needed, not where
> > nesting is needed. Yes, you can use a single nested handle here, but
> > the stats will not tell you which driver blocked suspend if there is a
> > problem.
> 
> It sounds like you're suggesting that handles should be allocated 
> per-driver rather than per-device.  That would tell you which driver 
> blocked suspend, but not which device.
> 
> This raises a few other questions.
> 
> With Android-based handsets, you're in a setting where you have a
> pretty good idea of what all the wakeup sources are and how they might
> get used.  On a general system this isn't so.  Many devices are
> potential wakeup sources, even though only a few of them end up getting
> enabled.  In the example above, I said "Suppose a USB root hub is
> enabled for wakeup".  But usually people won't want to do this; they
> don't want their suspended laptops to wake up when a USB mouse is
> unplugged or plugged in.
> 
> Allocating these wakeup structures for all possible wakeup sources
> would be a big waste.

Indeed.

> On the other hand, a driver generally doesn't
> know until a suspend starts whether or not its devices should be
> enabled for wakeup.  That makes things difficult, especially if you
> contemplate having multiple structures for each device.  Have you
> really sat down and thought about what wakelocks would be needed on a
> general system such as a desktop or laptop machine?
> 
> Also, I frankly have to wonder why you demand such a large amount of
> detailed debugging about the kernel's wakeup/wakelock subsystem (as
> opposed to userspace wakelocks).

That also is my question.

> Yes, it's clear that if something
> goes wrong then you will drain your phone's battery in a few hours
> rather than a few days.  But what if something goes wrong with the VM
> subsystem, or the RF subsystem, or the keypad driver?  The device would
> crash immediately or at best become almost totally unusable.  These are
> much worse scenarios than losing some battery life -- have you added
> comparable quantities of debugging to those other parts of the kernel?
> 
> In addition, you retain all these debugging facilities in your
> production builds, not just in the debug kernels.  Doesn't this seem
> excessive?

It definitely seems so.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-17 19:21           ` Rafael J. Wysocki
@ 2010-08-18  0:40             ` Arve Hjønnevåg
  2010-08-18 10:32               ` Mark Brown
  2010-08-18 15:20               ` Alan Stern
  0 siblings, 2 replies; 18+ messages in thread
From: Arve Hjønnevåg @ 2010-08-18  0:40 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux-pm mailing list

2010/8/17 Rafael J. Wysocki <rjw@sisk.pl>:
> On Tuesday, August 17, 2010, Alan Stern wrote:
>> On Mon, 16 Aug 2010, Arve Hjønnevåg wrote:
>>
>> > >> A name is still useful if we allocate multiple handles per device (see
>> > >> below for why one handle per device is not enough).
>> > >
>> > > Are the input queues are the only place where this is needed?
>> > >
>> >
>> > I don't know if it is the only place that is using more than one
>> > wakelock with a timeout per device. There are other drivers that use
>> > more than one wakelock, but it is possible they could use a single
>> > nested lock instead (with some detail loss in the the stats). I'm not
>> > sure every wakelock can be associated with a device though. For
>> > instance the mmc core wakelock does not have a corresponding device.
>>
>> What other drivers need more than one wakelock per device?
>>
>> For that matter, why does the MMC core need wakelocks at all?  I wasn't
>> aware that memory cards could generate wakeup events.
>
> Yes, that's interesting, although I _guess_ this is related to the events
> generated by inserting/removing a card.
>

That is my guess too, but it is also possible sdio wakeup events could
also hit this code path.

>> > > We'll figure out the best approach.  You mentioned that when the
>> > > timeouts and pm_relax do get mixed, the timeout value is around 1 or 2
>> > > seconds.  For these cases, would it be so terrible just to use the
>> > > timeout alone and forget about pm_relax?
>> > >
>> >
>> > The timeout used when mixing timeouts and wake_unlock are generally
>> > longer than when only using only timeouts since the timeouts are used
>> > to cover edge cases were wake_unlock did not get called. Allowing
>> > timeouts and pm_relax to be mixed may allow a gradual path from using
>> > timeouts to block suspend only when needed (timeouts only -> timeouts
>> > for edge cases -> no timeouts).
>>
>> In fact, I rather expected you would say that the combined
>> cancellations/timeouts were useful for pinpointing applications that
>> don't read their input events quickly enough.  If the wakelocks were
>> always allowed to time out instead of being cancelled when the data was
>> read, you wouldn't be able to tell whether the data was read in a
>> timely manner or not.
>>
>> > We have also had bugs, where sensors generate input events whenever
>> > the system is not suspended. The 5 second timeout we have on input
>> > events is not short enough to allow the system to suspend at all in
>> > this case.
>>
>> This sounds like the sort of bug that would be found very quickly and
>> easily if you didn't have active cancellation.  It would be pretty
>> obvious when the system didn't suspend at all.
>>

We don't uncover all bugs during testing before we ship. For instance
the bug could be triggered by a 3db party app, and there is value in
making the system as robust as possible to both driver and user space
bugs.

>> > I see the kernel as trying to suspend if user space asked it to
>> > suspend (until it user space asks the kernel to stop suspending).
>> > Wakeup events that don't reach user space should not need user space
>> > support to reenter suspend.
>>
>> That's debatable.  For example, consider a Wake-On-LAN magic packet.
>> It generally doesn't go anywhere -- it gets dropped once the system is
>> awake.
>
> Yes.  I think every event that would wake up a sleeping system should result
> in unfreezing user space.
>

Returning all the way to user space for every wakeup event is a poor
substitute for delivering the actual event to user space. If user
space want to stay awake for 60 seconds after every wake-on-lan
packet, how can it accomplish this if it does not get the event. If
user space delays suspend for 60 seconds after every failed suspend
attempt, a wake-on-lan packet that arrives just before it decided to
suspend will be ignored.

>> > You want each suspend attempt to be initiated by user space. Suspend
>> > is initiated from user space in both cases,
>
> Not really.  With wakelocks suspend may be initiated as a result of a driver
> dropping its wakelock, which doesn't qualify as "from user space" to me.
>
Sure, but only if user-space had already requested suspend.

>> > the difference is that you want the suspend request to be one-shot, which
>> > means that the kernel may cancel the user space suspend request, where I
>> > want user space to explicitly cancel the request.
>>
>> Yes, that's a fair way to describe the situation.
>
> I somewhat agree.
>
>> > > Do you have any examples where two subsystems use the same device but
>> > > with differing expectations?  In the examples I can think of, the
>> > > non-nesting call is used only when a wakeup request is received (which
>> > > happens only when the device is at low power), so the nesting calls
>> > > can't interfere.
>> > >
>> >
>> > The wakeup request need to block suspend until the driver has seen the
>> > event. If the driver and the wakeup code uses the same handle how can
>> > you avoid interference?
>>
>> Let's say the interrupt handlers responsible for invoking
>> pm_request_resume use non-nesting calls.  This may happen several
>> times, depending on the wakeup and interrupt delivery mechanism, but
>> the end result is that the counter is incremented only once.  When the
>> pm_runtime_resume call ends, pm_relax does a single decrement.
>>
>> Meanwhile, the driver uses nesting calls.  It can increment the counter
>> as much as it wants, and it has to perform an equal number of
>> decrements.  The key point is that pm_request_resume never gets called
>> while the driver has any outstanding pm_stay_awake requests.  Hence the
>> non-nesting calls never got "lost", since the first one is always made
>> while the counter is 0.
>>

It sounds as if you are saying that the two drivers don't interfere
because one of them always gets there first. In your example if
another interrupt occurs before the second driver still has released
all its references on the counter, the interrupt will (incorrectly)
not increment the count.

>> > > We may end up attaching these wakeup structures not just to devices but
>> > > also to input queues (or other kernel->userspace communication
>> > > channels).
>> > >
>> >
>> > That is we do now (with wakelocks), but I think we need to pass this
>> > handle to the pm_stay_wake... calls, and not the device, for this to
>> > work.
>>
>> You may be right.  Or we may need additional calls taking the handle as
>> an argument rather than the device.  The design requirements are still
>> simmering in my mind, and I'm waiting to hear what Rafael thinks.
>
> I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev)
> calling something like __pm_stay_awake(handle), where the handle is read from
> struct dev_pm_info.  And analogously for pm_relax().
>
> Then, the object associated with the handle might be created when the device
> is enabled to wake up and destroyed when it's disabled to do that.
>

When multiple drivers use the same device, who creates and destroys the handle?

> OTOH, if it's really necessary to use a "raw" handle (ie. without struct
> device), the caller of __pm_stay_awake() will allocate and free the
> "wakelock" object by itself as necessary.
>
> As far as the names are concerned, I'd like to avoid allocating space for them
> whenever it is not necessary (like when the "wakelock" is pointed to by struct
> dev_pm_info, in which case it is simply sufficient to use the device name to
> refer to it).
>
>> > > Here's an example where nested pm_stay_awake calls are needed.
>> > > Suppose a USB root hub is enabled for wakeup (in other words, plugging
>> > > or unplugging a USB device into the computer should wake the system
>> > > up).  Suppose the root hub is at low power (autosuspended) when a
>> > > wakeup event occurs.  The interrupt handler calls pm_stay_awake and
>> > > indirectly invokes the USB runtime-resume routine.  This routine
>> > > brings the root hub back to full power, informs the hub driver, and
>> > > then calls pm_relax.
>> > >
>> > > But the hub driver carries out its main work in a separate kernel
>> > > thread (i.e., khubd).  So the hub_resume routine has to call
>> > > pm_stay_awake, and then khubd calls pm_relax when it is finished
>> > > processing the root hub's event.  Thus we end up with:
>> > >
>> > >        usb_runtime_resume:     call pm_stay_awake (1)
>> > >                                call hub_resume
>> > >                hub_resume:             call pm_stay_awake (2)
>> > >        usb_runtime_resume:     call pm_relax (balances 1)
>> > >        khubd:                  call pm_relax (balances 2)
>> > >
>> > > As you can see, this won't work right if the pm wakeup calls aren't
>> > > cumulative.  ("Nested" isn't quite the right word.)  The system would
>> > > be allowed to sleep before khubd could process the root hub's wakeup
>> > > event.
>> > >
>> >
>> > Too me this is an example of where two handles are needed, not where
>> > nesting is needed. Yes, you can use a single nested handle here, but
>> > the stats will not tell you which driver blocked suspend if there is a
>> > problem.
>>
>> It sounds like you're suggesting that handles should be allocated
>> per-driver rather than per-device.  That would tell you which driver
>> blocked suspend, but not which device.
>>

Typically a driver puts a wakelock in the state that it already
allocates for each device, so you have a handle for each device/driver
combo that needs to block suspend.

>> This raises a few other questions.
>>
>> With Android-based handsets, you're in a setting where you have a
>> pretty good idea of what all the wakeup sources are and how they might
>> get used.  On a general system this isn't so.  Many devices are
>> potential wakeup sources, even though only a few of them end up getting
>> enabled.  In the example above, I said "Suppose a USB root hub is
>> enabled for wakeup".  But usually people won't want to do this; they
>> don't want their suspended laptops to wake up when a USB mouse is
>> unplugged or plugged in.
>>
>> Allocating these wakeup structures for all possible wakeup sources
>> would be a big waste.
>
> Indeed.
>
>> On the other hand, a driver generally doesn't
>> know until a suspend starts whether or not its devices should be
>> enabled for wakeup.

Why? How is the driver notified on suspend that it needs to be enabled
for wakeup.

>> That makes things difficult, especially if you
>> contemplate having multiple structures for each device.  Have you
>> really sat down and thought about what wakelocks would be needed on a
>> general system such as a desktop or laptop machine?

No, I don't know what wakelocks are needed on a general system, but
you need code to turn the wakeup events on and off events anyway, so I
assume allocating a handle for the wakeup event is possible.

>>
>> Also, I frankly have to wonder why you demand such a large amount of
>> detailed debugging about the kernel's wakeup/wakelock subsystem (as
>> opposed to userspace wakelocks).
>
> That also is my question.
>

It is not that different from what you get from /proc/timer_list,
/proc/meminfo etc. When the system does not suspend I want to know
why.

>> Yes, it's clear that if something
>> goes wrong then you will drain your phone's battery in a few hours
>> rather than a few days.  But what if something goes wrong with the VM
>> subsystem, or the RF subsystem, or the keypad driver?  The device would
>> crash immediately or at best become almost totally unusable.  These are
>> much worse scenarios than losing some battery life -- have you added
>> comparable quantities of debugging to those other parts of the kernel?
>>

We do track crashes and the kernel already dumps detailed debug
information in that case. Also, I don't agree that a kernel crash is
always worse than draining the battery. If you are away from a power
source and the kernel crashes, the device reboots and you have a
usable device again. If the battery drained, you cannot use the device
again until you get to a power source.

>> In addition, you retain all these debugging facilities in your
>> production builds, not just in the debug kernels.  Doesn't this seem
>> excessive?
>
> It definitely seems so.
>

There are two reasons for keeping the debugging facilities in production builds.
1. The debugging information is useful for debugging problems in the field.
2. We want to test with the same code we ship.


-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-18  0:40             ` Arve Hjønnevåg
@ 2010-08-18 10:32               ` Mark Brown
  2010-08-18 15:20               ` Alan Stern
  1 sibling, 0 replies; 18+ messages in thread
From: Mark Brown @ 2010-08-18 10:32 UTC (permalink / raw)
  To: Arve Hj?nnev?g; +Cc: Linux-pm mailing list

On Tue, Aug 17, 2010 at 05:40:37PM -0700, Arve Hj?nnev?g wrote:
> 2010/8/17 Rafael J. Wysocki <rjw@sisk.pl>:

> >> On the other hand, a driver generally doesn't
> >> know until a suspend starts whether or not its devices should be
> >> enabled for wakeup.

> Why? How is the driver notified on suspend that it needs to be enabled
> for wakeup.

There's the wakeup API in pm_wakeup.h, plus subsystem specific knowledge
about what sensible defaults for the hardware in much the same way as we
handle what to do when we suspend.

> >> In addition, you retain all these debugging facilities in your
> >> production builds, not just in the debug kernels. ?Doesn't this seem
> >> excessive?

> > It definitely seems so.

> There are two reasons for keeping the debugging facilities in production builds.
> 1. The debugging information is useful for debugging problems in the field.
> 2. We want to test with the same code we ship.

Plus the fact that power usage tends to be very dynamic often with
strong dependencies on usage patterns.  Power usage is something that
end users will want to debug as well as developers, though most likely
not in the same detail.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-18  0:40             ` Arve Hjønnevåg
  2010-08-18 10:32               ` Mark Brown
@ 2010-08-18 15:20               ` Alan Stern
  2010-08-18 19:17                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 18+ messages in thread
From: Alan Stern @ 2010-08-18 15:20 UTC (permalink / raw)
  To: Arve Hjønnevåg; +Cc: Linux-pm mailing list

On Tue, 17 Aug 2010, Arve Hjønnevåg wrote:

> We don't uncover all bugs during testing before we ship. For instance
> the bug could be triggered by a 3db party app, and there is value in
> making the system as robust as possible to both driver and user space
> bugs.

By that argument, all in-kernel wakelocks should always have a timeout.

What happens if somebody puts down their phone and then puts something 
on top of it, in such a way that one of the keypad buttons is 
depressed?  If the keypad driver's wakelock doesn't have a timeout then 
the battery would drain quickly.

> >> > Wakeup events that don't reach user space should not need user space
> >> > support to reenter suspend.
> >>
> >> That's debatable.  For example, consider a Wake-On-LAN magic packet.
> >> It generally doesn't go anywhere -- it gets dropped once the system is
> >> awake.
> >
> > Yes.  I think every event that would wake up a sleeping system should result
> > in unfreezing user space.

In fact they do -- there's almost no way to avoid it.  However, 
unfreezing userspace is different from sending a wakeup event to 
userspace.

The question is whether the kernel should automatically try to
re-suspend following a wakeup for which no information was sent to
userspace (e.g., WOL).  The alternative, of course, is that the kernel
does not try to re-suspend until a power-manager program tells it to.  
To me this seems more flexible and general (the PM program can always 
start a new suspend, whereas there's no way to prevent an automatic 
in-kernel re-suspend).

> >> Let's say the interrupt handlers responsible for invoking
> >> pm_request_resume use non-nesting calls.  This may happen several
> >> times, depending on the wakeup and interrupt delivery mechanism, but
> >> the end result is that the counter is incremented only once.  When the
> >> pm_runtime_resume call ends, pm_relax does a single decrement.
> >>
> >> Meanwhile, the driver uses nesting calls.  It can increment the counter
> >> as much as it wants, and it has to perform an equal number of
> >> decrements.  The key point is that pm_request_resume never gets called
> >> while the driver has any outstanding pm_stay_awake requests.  Hence the
> >> non-nesting calls never got "lost", since the first one is always made
> >> while the counter is 0.
> >>
> 
> It sounds as if you are saying that the two drivers don't interfere
> because one of them always gets there first. In your example if
> another interrupt occurs before the second driver still has released
> all its references on the counter, the interrupt will (incorrectly)
> not increment the count.

That's right -- one of them always gets there first.  You're also right
to suspect that this is an unusual situation.  It arises with USB root
hubs because a root hub is the only child of its parent controller
device, and the only reason a controller would generate a wakeup
request is because the root hub needs to wake up.

This means that a root-hub wakeup can be signalled in two different 
ways:

	If the controller is in low-power then the controller sends its
	wakeup request.  The controller's driver sets the controller
	back to full power, sees that the root hub has needs attention,
	and calls pm_request_resume for the root hub.

	If the controller isn't in low-power then the root hub simply
	generates an IRQ.  The interrupt handler calls 
	pm_request_resume for the root hub.

Depending on how the timing works out, both can occur: The controller
gets set to full power and then the IRQ line is signalled before the PM
workqueue gets around to resuming the root hub.  I have seen this
happen while testing my computer.

With a little care, I could avoid calling pm_stay_awake twice.  The 
code already sets a bitflag before queuing the resume request, and I 
could use test_and_set_bit instead.  If I did then the non-nesting 
calls wouldn't be needed at all.  And that was the only case I could
think of where the two types of calls might need to be mixed.

> > I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev)
> > calling something like __pm_stay_awake(handle), where the handle is read from
> > struct dev_pm_info.  And analogously for pm_relax().
> >
> > Then, the object associated with the handle might be created when the device
> > is enabled to wake up and destroyed when it's disabled to do that.
> >
> 
> When multiple drivers use the same device, who creates and destroys the handle?

The primary driver (i.e., the one bound to the device) would use the
device's handle.  Other drivers would create and use their own separate
wakeup structures.

But if all the calls are of the "nesting" sort then there's not really
any need for multiple structures, other than to help identify which
driver is misbehaving when a problem occurs.

> Typically a driver puts a wakelock in the state that it already
> allocates for each device, so you have a handle for each device/driver
> combo that needs to block suspend.

You didn't answer my question about which devices have more than one 
associated wakelock in Android.  Those would be the only cases where 
the device/driver combo matters.

Also, why does Android's MMC core have its own wakelock instead of
using a per-device wakelock?

> >> Allocating these wakeup structures for all possible wakeup sources
> >> would be a big waste.
> >
> > Indeed.
> >
> >> On the other hand, a driver generally doesn't
> >> know until a suspend starts whether or not its devices should be
> >> enabled for wakeup.
> 
> Why? How is the driver notified on suspend that it needs to be enabled
> for wakeup.

The driver's (or bus subsystem's) suspend routine calls
device_may_wakeup(dev) to see whether it should enable dev's wakeup
mechanism.  The value returned by device_may_wakeup can change at
almost any time, since it is set by userspace.

> >> That makes things difficult, especially if you
> >> contemplate having multiple structures for each device.  Have you
> >> really sat down and thought about what wakelocks would be needed on a
> >> general system such as a desktop or laptop machine?
> 
> No, I don't know what wakelocks are needed on a general system, but
> you need code to turn the wakeup events on and off events anyway, so I
> assume allocating a handle for the wakeup event is possible.

Yes.  However system suspend is generally not a good time to go around 
allocating new structures; memory tends to be tight then.  I suppose 
it might be acceptable to do this during the "prepare" phase of system 
suspend.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-18 15:20               ` Alan Stern
@ 2010-08-18 19:17                 ` Rafael J. Wysocki
  2010-08-19  0:18                   ` Arve Hjønnevåg
  0 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2010-08-18 19:17 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux-pm mailing list

On Wednesday, August 18, 2010, Alan Stern wrote:
> On Tue, 17 Aug 2010, Arve Hjønnevåg wrote:
> 
> > We don't uncover all bugs during testing before we ship. For instance
> > the bug could be triggered by a 3db party app, and there is value in
> > making the system as robust as possible to both driver and user space
> > bugs.
> 
> By that argument, all in-kernel wakelocks should always have a timeout.
> 
> What happens if somebody puts down their phone and then puts something 
> on top of it, in such a way that one of the keypad buttons is 
> depressed?  If the keypad driver's wakelock doesn't have a timeout then 
> the battery would drain quickly.
> 
> > >> > Wakeup events that don't reach user space should not need user space
> > >> > support to reenter suspend.
> > >>
> > >> That's debatable.  For example, consider a Wake-On-LAN magic packet.
> > >> It generally doesn't go anywhere -- it gets dropped once the system is
> > >> awake.
> > >
> > > Yes.  I think every event that would wake up a sleeping system should result
> > > in unfreezing user space.
> 
> In fact they do -- there's almost no way to avoid it.  However, 
> unfreezing userspace is different from sending a wakeup event to 
> userspace.
> 
> The question is whether the kernel should automatically try to
> re-suspend following a wakeup for which no information was sent to
> userspace (e.g., WOL).  The alternative, of course, is that the kernel
> does not try to re-suspend until a power-manager program tells it to.  
> To me this seems more flexible and general (the PM program can always 
> start a new suspend, whereas there's no way to prevent an automatic 
> in-kernel re-suspend).

I agree.

> > >> Let's say the interrupt handlers responsible for invoking
> > >> pm_request_resume use non-nesting calls.  This may happen several
> > >> times, depending on the wakeup and interrupt delivery mechanism, but
> > >> the end result is that the counter is incremented only once.  When the
> > >> pm_runtime_resume call ends, pm_relax does a single decrement.
> > >>
> > >> Meanwhile, the driver uses nesting calls.  It can increment the counter
> > >> as much as it wants, and it has to perform an equal number of
> > >> decrements.  The key point is that pm_request_resume never gets called
> > >> while the driver has any outstanding pm_stay_awake requests.  Hence the
> > >> non-nesting calls never got "lost", since the first one is always made
> > >> while the counter is 0.
> > >>
> > 
> > It sounds as if you are saying that the two drivers don't interfere
> > because one of them always gets there first. In your example if
> > another interrupt occurs before the second driver still has released
> > all its references on the counter, the interrupt will (incorrectly)
> > not increment the count.
> 
> That's right -- one of them always gets there first.  You're also right
> to suspect that this is an unusual situation.  It arises with USB root
> hubs because a root hub is the only child of its parent controller
> device, and the only reason a controller would generate a wakeup
> request is because the root hub needs to wake up.
> 
> This means that a root-hub wakeup can be signalled in two different 
> ways:
> 
> 	If the controller is in low-power then the controller sends its
> 	wakeup request.  The controller's driver sets the controller
> 	back to full power, sees that the root hub has needs attention,
> 	and calls pm_request_resume for the root hub.
> 
> 	If the controller isn't in low-power then the root hub simply
> 	generates an IRQ.  The interrupt handler calls 
> 	pm_request_resume for the root hub.
> 
> Depending on how the timing works out, both can occur: The controller
> gets set to full power and then the IRQ line is signalled before the PM
> workqueue gets around to resuming the root hub.  I have seen this
> happen while testing my computer.
> 
> With a little care, I could avoid calling pm_stay_awake twice.  The 
> code already sets a bitflag before queuing the resume request, and I 
> could use test_and_set_bit instead.  If I did then the non-nesting 
> calls wouldn't be needed at all.  And that was the only case I could
> think of where the two types of calls might need to be mixed.

That means one corner case less to worry about, good. :-)

> > > I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev)
> > > calling something like __pm_stay_awake(handle), where the handle is read from
> > > struct dev_pm_info.  And analogously for pm_relax().
> > >
> > > Then, the object associated with the handle might be created when the device
> > > is enabled to wake up and destroyed when it's disabled to do that.
> > >
> > 
> > When multiple drivers use the same device, who creates and destroys the handle?
> 
> The primary driver (i.e., the one bound to the device) would use the
> device's handle.  Other drivers would create and use their own separate
> wakeup structures.
> 
> But if all the calls are of the "nesting" sort then there's not really
> any need for multiple structures, other than to help identify which
> driver is misbehaving when a problem occurs.

Still, IMO we can do that if it would help to narrow the gap between the
Android kernel and the mainline.

> > Typically a driver puts a wakelock in the state that it already
> > allocates for each device, so you have a handle for each device/driver
> > combo that needs to block suspend.
> 
> You didn't answer my question about which devices have more than one 
> associated wakelock in Android.  Those would be the only cases where 
> the device/driver combo matters.
> 
> Also, why does Android's MMC core have its own wakelock instead of
> using a per-device wakelock?
> 
> > >> Allocating these wakeup structures for all possible wakeup sources
> > >> would be a big waste.
> > >
> > > Indeed.
> > >
> > >> On the other hand, a driver generally doesn't
> > >> know until a suspend starts whether or not its devices should be
> > >> enabled for wakeup.
> > 
> > Why? How is the driver notified on suspend that it needs to be enabled
> > for wakeup.
> 
> The driver's (or bus subsystem's) suspend routine calls
> device_may_wakeup(dev) to see whether it should enable dev's wakeup
> mechanism.  The value returned by device_may_wakeup can change at
> almost any time, since it is set by userspace.
> 
> > >> That makes things difficult, especially if you
> > >> contemplate having multiple structures for each device.  Have you
> > >> really sat down and thought about what wakelocks would be needed on a
> > >> general system such as a desktop or laptop machine?
> > 
> > No, I don't know what wakelocks are needed on a general system, but
> > you need code to turn the wakeup events on and off events anyway, so I
> > assume allocating a handle for the wakeup event is possible.
> 
> Yes.  However system suspend is generally not a good time to go around 
> allocating new structures; memory tends to be tight then.

Well, that's what I wanted to do originally in pm_stay_awake(), but you
convinced me it was avoidable. :-)

> I suppose it might be acceptable to do this during the "prepare" phase of
> system suspend.

I generally think it would make sense to allocate a structure, call it struct
wakeup_source for now, for each device or whatever part of the system that we
want to treat as a wakeup source and put all of the statistics in there.

For devices it would be pretty straightforward, the structure is allocated as
soon as the device is enabled to wakeup and freed when it is disabled.  In the
other cases it would have to depend on the usage scenario, I guess.

Rafael

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-18 19:17                 ` Rafael J. Wysocki
@ 2010-08-19  0:18                   ` Arve Hjønnevåg
  2010-08-19 21:09                     ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Arve Hjønnevåg @ 2010-08-19  0:18 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux-pm mailing list

2010/8/18 Rafael J. Wysocki <rjw@sisk.pl>:
> On Wednesday, August 18, 2010, Alan Stern wrote:
>> On Tue, 17 Aug 2010, Arve Hjønnevåg wrote:
>>
>> > We don't uncover all bugs during testing before we ship. For instance
>> > the bug could be triggered by a 3db party app, and there is value in
>> > making the system as robust as possible to both driver and user space
>> > bugs.
>>
>> By that argument, all in-kernel wakelocks should always have a timeout.
>>
I don't think that is the only conclusion.

>> What happens if somebody puts down their phone and then puts something
>> on top of it, in such a way that one of the keypad buttons is
>> depressed?  If the keypad driver's wakelock doesn't have a timeout then
>> the battery would drain quickly.

This is a real problem that might be worth solving, but just adding a
wakelock timeout does not solve it since the keyboard driver would
either renew the timeout after every scan, or it would leave the
keyboard in a state where you could not use it to wake up the device
anymore.

>>
>> > >> > Wakeup events that don't reach user space should not need user space
>> > >> > support to reenter suspend.
>> > >>
>> > >> That's debatable.  For example, consider a Wake-On-LAN magic packet.
>> > >> It generally doesn't go anywhere -- it gets dropped once the system is
>> > >> awake.
>> > >
>> > > Yes.  I think every event that would wake up a sleeping system should result
>> > > in unfreezing user space.
>>
>> In fact they do -- there's almost no way to avoid it.  However,
>> unfreezing userspace is different from sending a wakeup event to
>> userspace.
>>
>> The question is whether the kernel should automatically try to
>> re-suspend following a wakeup for which no information was sent to
>> userspace (e.g., WOL).  The alternative, of course, is that the kernel
>> does not try to re-suspend until a power-manager program tells it to.
>> To me this seems more flexible and general (the PM program can always
>> start a new suspend, whereas there's no way to prevent an automatic
>> in-kernel re-suspend).
>
> I agree.
>

Having user space initiate every suspend operation is more flexible
out of the box, but you can get the same flexibility with the other
approach by adding an interface to report all wakeup events and failed
suspend attempts. However, I have not seen a use case for this
flexibility other than as a partial workaround for wakeup events that
user space needs to know about but are not reported to user space any
other way (it is only a partial workaround since it still has the same
race condition we are fixing for other wakeup events).

>> > >> Let's say the interrupt handlers responsible for invoking
>> > >> pm_request_resume use non-nesting calls.  This may happen several
>> > >> times, depending on the wakeup and interrupt delivery mechanism, but
>> > >> the end result is that the counter is incremented only once.  When the
>> > >> pm_runtime_resume call ends, pm_relax does a single decrement.
>> > >>
>> > >> Meanwhile, the driver uses nesting calls.  It can increment the counter
>> > >> as much as it wants, and it has to perform an equal number of
>> > >> decrements.  The key point is that pm_request_resume never gets called
>> > >> while the driver has any outstanding pm_stay_awake requests.  Hence the
>> > >> non-nesting calls never got "lost", since the first one is always made
>> > >> while the counter is 0.
>> > >>
>> >
>> > It sounds as if you are saying that the two drivers don't interfere
>> > because one of them always gets there first. In your example if
>> > another interrupt occurs before the second driver still has released
>> > all its references on the counter, the interrupt will (incorrectly)
>> > not increment the count.
>>
>> That's right -- one of them always gets there first.  You're also right
>> to suspect that this is an unusual situation.  It arises with USB root
>> hubs because a root hub is the only child of its parent controller
>> device, and the only reason a controller would generate a wakeup
>> request is because the root hub needs to wake up.
>>
>> This means that a root-hub wakeup can be signalled in two different
>> ways:
>>
>>       If the controller is in low-power then the controller sends its
>>       wakeup request.  The controller's driver sets the controller
>>       back to full power, sees that the root hub has needs attention,
>>       and calls pm_request_resume for the root hub.
>>
>>       If the controller isn't in low-power then the root hub simply
>>       generates an IRQ.  The interrupt handler calls
>>       pm_request_resume for the root hub.
>>
>> Depending on how the timing works out, both can occur: The controller
>> gets set to full power and then the IRQ line is signalled before the PM
>> workqueue gets around to resuming the root hub.  I have seen this
>> happen while testing my computer.
>>
>> With a little care, I could avoid calling pm_stay_awake twice.  The
>> code already sets a bitflag before queuing the resume request, and I
>> could use test_and_set_bit instead.  If I did then the non-nesting
>> calls wouldn't be needed at all.  And that was the only case I could
>> think of where the two types of calls might need to be mixed.
>
> That means one corner case less to worry about, good. :-)
>
>> > > I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev)
>> > > calling something like __pm_stay_awake(handle), where the handle is read from
>> > > struct dev_pm_info.  And analogously for pm_relax().
>> > >
>> > > Then, the object associated with the handle might be created when the device
>> > > is enabled to wake up and destroyed when it's disabled to do that.
>> > >
>> >
>> > When multiple drivers use the same device, who creates and destroys the handle?
>>
>> The primary driver (i.e., the one bound to the device) would use the
>> device's handle.  Other drivers would create and use their own separate
>> wakeup structures.
>>
>> But if all the calls are of the "nesting" sort then there's not really
>> any need for multiple structures, other than to help identify which
>> driver is misbehaving when a problem occurs.
>
> Still, IMO we can do that if it would help to narrow the gap between the
> Android kernel and the mainline.
>

Some android drivers currently need the calls to be not nested so that
they can mix timeouts and pm_relax. With only nesting apis we cannot
get the same functionality.

Other drivers rely on the non nesting property of the existing
wakelock api but do not use timeouts. I expect these drivers can be
converted have the same behavior with a nested api, but there could be
difficult edge cases.

Non-android driver may need a nested api so they can use a handle that
they did not allocate. It is not clear to me if this is really needed.
If it is not needed we can make all the calls non nested and require
that the driver allocates a handle (or more commonly put the handle in
the private data it already allocates). If we do need to keep the
nested behavior, we a way to select which variant to use. We can have
separate calls that nest and require drivers that use the non-nested
variant have their own handle, or we could set a flag when creating a
handle that makes calls on that handle not nest.

>> > Typically a driver puts a wakelock in the state that it already
>> > allocates for each device, so you have a handle for each device/driver
>> > combo that needs to block suspend.
>>
>> You didn't answer my question about which devices have more than one
>> associated wakelock in Android.  Those would be the only cases where
>> the device/driver combo matters.
>>

Wakelocks are not associated with devices at all right now, so I don't
have a list for you. However, the input queues need more than one
wakelock per device if they use timeouts. In this case there is only
one driver though, but the driver allocates a separate queue for each
client that opens a device. Our keypad/gpio-keys driver may also end
up using multiple wakelocks. Many phones use different keys types in
the same logical keyboard, so you could end up with one wakelock for
the matrix scan and another one for debouncing keys connected directly
to a gpio. The driver could be changed to share a nested handle, but I
prefer to keep them separate.

>> Also, why does Android's MMC core have its own wakelock instead of
>> using a per-device wakelock?
>>

There is a global work queue.

>> > >> Allocating these wakeup structures for all possible wakeup sources
>> > >> would be a big waste.
>> > >
>> > > Indeed.
>> > >
>> > >> On the other hand, a driver generally doesn't
>> > >> know until a suspend starts whether or not its devices should be
>> > >> enabled for wakeup.
>> >
>> > Why? How is the driver notified on suspend that it needs to be enabled
>> > for wakeup.
>>
>> The driver's (or bus subsystem's) suspend routine calls
>> device_may_wakeup(dev) to see whether it should enable dev's wakeup
>> mechanism.  The value returned by device_may_wakeup can change at
>> almost any time, since it is set by userspace.
>>

It would be better to allocate the structure when user space changes
the setting. If you wait until suspend before blocking suspend you may
have passed the event on already.

>> > >> That makes things difficult, especially if you
>> > >> contemplate having multiple structures for each device.  Have you
>> > >> really sat down and thought about what wakelocks would be needed on a
>> > >> general system such as a desktop or laptop machine?
>> >
>> > No, I don't know what wakelocks are needed on a general system, but
>> > you need code to turn the wakeup events on and off events anyway, so I
>> > assume allocating a handle for the wakeup event is possible.
>>
>> Yes.  However system suspend is generally not a good time to go around
>> allocating new structures; memory tends to be tight then.
>

I mean the code that lets user space turn wakeup events on and off,
not the driver transferring a wakeup source from an interrupt to a
wakeup controller.

> Well, that's what I wanted to do originally in pm_stay_awake(), but you
> convinced me it was avoidable. :-)
>
>> I suppose it might be acceptable to do this during the "prepare" phase of
>> system suspend.
>
> I generally think it would make sense to allocate a structure, call it struct
> wakeup_source for now, for each device or whatever part of the system that we
> want to treat as a wakeup source and put all of the statistics in there.
>
> For devices it would be pretty straightforward, the structure is allocated as
> soon as the device is enabled to wakeup and freed when it is disabled.  In the
> other cases it would have to depend on the usage scenario, I guess.
>
> Rafael
>



-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-19  0:18                   ` Arve Hjønnevåg
@ 2010-08-19 21:09                     ` Alan Stern
  2010-08-19 23:12                       ` Arve Hjønnevåg
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Stern @ 2010-08-19 21:09 UTC (permalink / raw)
  To: Arve Hjønnevåg; +Cc: Linux-pm mailing list

On Wed, 18 Aug 2010, Arve Hjønnevåg wrote:

> >> By that argument, all in-kernel wakelocks should always have a timeout.
> >>
> I don't think that is the only conclusion.
> 
> >> What happens if somebody puts down their phone and then puts something
> >> on top of it, in such a way that one of the keypad buttons is
> >> depressed?  If the keypad driver's wakelock doesn't have a timeout then
> >> the battery would drain quickly.
> 
> This is a real problem that might be worth solving, but just adding a
> wakelock timeout does not solve it since the keyboard driver would
> either renew the timeout after every scan, or it would leave the
> keyboard in a state where you could not use it to wake up the device
> anymore.

Okay.  Nevertheless, you maintain debugging info in case a driver bug
causes a wakelock not to be released.  If the wakelocks all had
timeouts then a bug of that sort wouldn't be able to affect battery
life very much.  So why not add the timeouts?

Or conversely, if you think the likelihood of such bugs is too low to 
worry about, then why have the debugging info?

> >> The question is whether the kernel should automatically try to
> >> re-suspend following a wakeup for which no information was sent to
> >> userspace (e.g., WOL).  The alternative, of course, is that the kernel
> >> does not try to re-suspend until a power-manager program tells it to.
> >> To me this seems more flexible and general (the PM program can always
> >> start a new suspend, whereas there's no way to prevent an automatic
> >> in-kernel re-suspend).
> >
> > I agree.
> >
> 
> Having user space initiate every suspend operation is more flexible
> out of the box, but you can get the same flexibility with the other
> approach by adding an interface to report all wakeup events and failed
> suspend attempts. However, I have not seen a use case for this
> flexibility other than as a partial workaround for wakeup events that
> user space needs to know about but are not reported to user space any
> other way (it is only a partial workaround since it still has the same
> race condition we are fixing for other wakeup events).

In any case, this is something that could easily be changed with a
small Android-specific kernel patch.  At the moment it doesn't seem to 
be a major issue.

> >> > When multiple drivers use the same device, who creates and destroys the handle?
> >>
> >> The primary driver (i.e., the one bound to the device) would use the
> >> device's handle.  Other drivers would create and use their own separate
> >> wakeup structures.
> >>
> >> But if all the calls are of the "nesting" sort then there's not really
> >> any need for multiple structures, other than to help identify which
> >> driver is misbehaving when a problem occurs.
> >
> > Still, IMO we can do that if it would help to narrow the gap between the
> > Android kernel and the mainline.
> >
> 
> Some android drivers currently need the calls to be not nested so that
> they can mix timeouts and pm_relax. With only nesting apis we cannot
> get the same functionality.
> 
> Other drivers rely on the non nesting property of the existing
> wakelock api but do not use timeouts. I expect these drivers can be
> converted have the same behavior with a nested api, but there could be
> difficult edge cases.
> 
> Non-android driver may need a nested api so they can use a handle that
> they did not allocate. It is not clear to me if this is really needed.
> If it is not needed we can make all the calls non nested and require
> that the driver allocates a handle (or more commonly put the handle in
> the private data it already allocates). If we do need to keep the
> nested behavior, we a way to select which variant to use. We can have
> separate calls that nest and require drivers that use the non-nested
> variant have their own handle, or we could set a flag when creating a
> handle that makes calls on that handle not nest.

I agree, there are several possible designs.  It's hard to tell which 
will be best without more experience.

How much degradation do you think you would observe if in-kernel
wakelocks _always_ used timeouts alone and _never_ were cancelled?

> >> You didn't answer my question about which devices have more than one
> >> associated wakelock in Android.  Those would be the only cases where
> >> the device/driver combo matters.
> >>
> 
> Wakelocks are not associated with devices at all right now, so I don't
> have a list for you. However, the input queues need more than one
> wakelock per device if they use timeouts. In this case there is only
> one driver though, but the driver allocates a separate queue for each
> client that opens a device. Our keypad/gpio-keys driver may also end
> up using multiple wakelocks. Many phones use different keys types in
> the same logical keyboard, so you could end up with one wakelock for
> the matrix scan and another one for debouncing keys connected directly
> to a gpio. The driver could be changed to share a nested handle, but I
> prefer to keep them separate.

Okay.  In addition, we've already seen that devices for which you
haven't yet implemented any wakelocks may end up needing more than one.

> >> Also, why does Android's MMC core have its own wakelock instead of
> >> using a per-device wakelock?
> >>
> 
> There is a global work queue.

So you want to be able to cancel the wakelock whenever the queue is
empty, right?  And not have to keep track of whether each MMC device
has any work items in the queue.

In theory this is insufficient -- it doesn't handle the case where one 
MMC device is enabled for wakeup and another isn't.  I assume you don't 
really care about this (especially since Android phones aren't likely 
to have more than one MMC device capable of generating wakeup events).

> >> The driver's (or bus subsystem's) suspend routine calls
> >> device_may_wakeup(dev) to see whether it should enable dev's wakeup
> >> mechanism.  The value returned by device_may_wakeup can change at
> >> almost any time, since it is set by userspace.
> >>
> 
> It would be better to allocate the structure when user space changes
> the setting. If you wait until suspend before blocking suspend you may
> have passed the event on already.

That will work for the structure directly associated with the device.  
Additional structures allocated by the drivers have to be handled
differently, because the drivers aren't notified when the wakeup-enable
setting is changed.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-19 21:09                     ` Alan Stern
@ 2010-08-19 23:12                       ` Arve Hjønnevåg
  2010-08-20 15:04                         ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Arve Hjønnevåg @ 2010-08-19 23:12 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux-pm mailing list

2010/8/19 Alan Stern <stern@rowland.harvard.edu>:
> On Wed, 18 Aug 2010, Arve Hjønnevåg wrote:
>
>> >> By that argument, all in-kernel wakelocks should always have a timeout.
>> >>
>> I don't think that is the only conclusion.
>>
>> >> What happens if somebody puts down their phone and then puts something
>> >> on top of it, in such a way that one of the keypad buttons is
>> >> depressed?  If the keypad driver's wakelock doesn't have a timeout then
>> >> the battery would drain quickly.
>>
>> This is a real problem that might be worth solving, but just adding a
>> wakelock timeout does not solve it since the keyboard driver would
>> either renew the timeout after every scan, or it would leave the
>> keyboard in a state where you could not use it to wake up the device
>> anymore.
>
> Okay.  Nevertheless, you maintain debugging info in case a driver bug
> causes a wakelock not to be released.  If the wakelocks all had
> timeouts then a bug of that sort wouldn't be able to affect battery
> life very much.  So why not add the timeouts?
>

Because we can't. I just described how the keypad driver would break
we just add a timeout to its wakelock. Other drivers, like the usb
gadget driver may need to block suspend for an extended time period.

> Or conversely, if you think the likelihood of such bugs is too low to
> worry about, then why have the debugging info?
>

That statement assumes all the bugs can be fixed by adding timeouts,
which I don't think is true. Also, you don't need to have bugs for the
stats to be useful. The gps could for instance prevent suspend for a
long time because the user ran a gps tracking app.

>> >> The question is whether the kernel should automatically try to
>> >> re-suspend following a wakeup for which no information was sent to
>> >> userspace (e.g., WOL).  The alternative, of course, is that the kernel
>> >> does not try to re-suspend until a power-manager program tells it to.
>> >> To me this seems more flexible and general (the PM program can always
>> >> start a new suspend, whereas there's no way to prevent an automatic
>> >> in-kernel re-suspend).
>> >
>> > I agree.
>> >
>>
>> Having user space initiate every suspend operation is more flexible
>> out of the box, but you can get the same flexibility with the other
>> approach by adding an interface to report all wakeup events and failed
>> suspend attempts. However, I have not seen a use case for this
>> flexibility other than as a partial workaround for wakeup events that
>> user space needs to know about but are not reported to user space any
>> other way (it is only a partial workaround since it still has the same
>> race condition we are fixing for other wakeup events).
>
> In any case, this is something that could easily be changed with a
> small Android-specific kernel patch.  At the moment it doesn't seem to
> be a major issue.
>

Yes, that seems to be a likely outcome.

>> >> > When multiple drivers use the same device, who creates and destroys the handle?
>> >>
>> >> The primary driver (i.e., the one bound to the device) would use the
>> >> device's handle.  Other drivers would create and use their own separate
>> >> wakeup structures.
>> >>
>> >> But if all the calls are of the "nesting" sort then there's not really
>> >> any need for multiple structures, other than to help identify which
>> >> driver is misbehaving when a problem occurs.
>> >
>> > Still, IMO we can do that if it would help to narrow the gap between the
>> > Android kernel and the mainline.
>> >
>>
>> Some android drivers currently need the calls to be not nested so that
>> they can mix timeouts and pm_relax. With only nesting apis we cannot
>> get the same functionality.
>>
>> Other drivers rely on the non nesting property of the existing
>> wakelock api but do not use timeouts. I expect these drivers can be
>> converted have the same behavior with a nested api, but there could be
>> difficult edge cases.
>>
>> Non-android driver may need a nested api so they can use a handle that
>> they did not allocate. It is not clear to me if this is really needed.
>> If it is not needed we can make all the calls non nested and require
>> that the driver allocates a handle (or more commonly put the handle in
>> the private data it already allocates). If we do need to keep the
>> nested behavior, we a way to select which variant to use. We can have
>> separate calls that nest and require drivers that use the non-nested
>> variant have their own handle, or we could set a flag when creating a
>> handle that makes calls on that handle not nest.
>
> I agree, there are several possible designs.  It's hard to tell which
> will be best without more experience.
>
> How much degradation do you think you would observe if in-kernel
> wakelocks _always_ used timeouts alone and _never_ were cancelled?
>

I don't think that is an option. We have kernel wakelocks that need to
be held while the device is in a specific state.

>> >> You didn't answer my question about which devices have more than one
>> >> associated wakelock in Android.  Those would be the only cases where
>> >> the device/driver combo matters.
>> >>
>>
>> Wakelocks are not associated with devices at all right now, so I don't
>> have a list for you. However, the input queues need more than one
>> wakelock per device if they use timeouts. In this case there is only
>> one driver though, but the driver allocates a separate queue for each
>> client that opens a device. Our keypad/gpio-keys driver may also end
>> up using multiple wakelocks. Many phones use different keys types in
>> the same logical keyboard, so you could end up with one wakelock for
>> the matrix scan and another one for debouncing keys connected directly
>> to a gpio. The driver could be changed to share a nested handle, but I
>> prefer to keep them separate.
>
> Okay.  In addition, we've already seen that devices for which you
> haven't yet implemented any wakelocks may end up needing more than one.
>
>> >> Also, why does Android's MMC core have its own wakelock instead of
>> >> using a per-device wakelock?
>> >>
>>
>> There is a global work queue.
>
> So you want to be able to cancel the wakelock whenever the queue is
> empty, right?  And not have to keep track of whether each MMC device
> has any work items in the queue.
>

I have not reviewed that code, so I don't know the details of how it
works or even if it is correct, but I see that it uses both timeouts
and wake_unlock calls on the same wakelock.

> In theory this is insufficient -- it doesn't handle the case where one
> MMC device is enabled for wakeup and another isn't.  I assume you don't
> really care about this (especially since Android phones aren't likely
> to have more than one MMC device capable of generating wakeup events).
>

I'm not sure what you mean by insufficient here. If it not optimal if
you also have work that does not need to prevent suspend, but it
should still work correctly.

>> >> The driver's (or bus subsystem's) suspend routine calls
>> >> device_may_wakeup(dev) to see whether it should enable dev's wakeup
>> >> mechanism.  The value returned by device_may_wakeup can change at
>> >> almost any time, since it is set by userspace.
>> >>
>>
>> It would be better to allocate the structure when user space changes
>> the setting. If you wait until suspend before blocking suspend you may
>> have passed the event on already.
>
> That will work for the structure directly associated with the device.
> Additional structures allocated by the drivers have to be handled
> differently, because the drivers aren't notified when the wakeup-enable
> setting is changed.
>

I'm not convinced that is it sufficient to detect that a device is
wakeup-enabled on suspend. However it is not a problem I have tried to
solve since we have only used static wakeup sources.


-- 
Arve Hjønnevåg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-19 23:12                       ` Arve Hjønnevåg
@ 2010-08-20 15:04                         ` Alan Stern
  2010-09-07 23:51                           ` [RFC][PATCH] PM / Wakeup: Introduce wakeup source objects and event statistics (was: Re: Wakeup-events implementation) Rafael J. Wysocki
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Stern @ 2010-08-20 15:04 UTC (permalink / raw)
  To: Arve Hjønnevåg; +Cc: Linux-pm mailing list

On Thu, 19 Aug 2010, Arve Hjønnevåg wrote:

> > Okay.  Nevertheless, you maintain debugging info in case a driver bug
> > causes a wakelock not to be released.  If the wakelocks all had
> > timeouts then a bug of that sort wouldn't be able to affect battery
> > life very much.  So why not add the timeouts?
> >
> 
> Because we can't. I just described how the keypad driver would break
> we just add a timeout to its wakelock. Other drivers, like the usb
> gadget driver may need to block suspend for an extended time period.

Some wakelocks can't use timeouts because they may have to be held for 
extended periods.  But the others could have timeouts.

> > How much degradation do you think you would observe if in-kernel
> > wakelocks _always_ used timeouts alone and _never_ were cancelled?
> >
> 
> I don't think that is an option. We have kernel wakelocks that need to
> be held while the device is in a specific state.

Yes, but what about all the others?  What I'm getting at is whether
it's feasible for each wakelock to be exclusively used either with
timeouts or cancellations, but never both.

> > In theory this is insufficient -- it doesn't handle the case where one
> > MMC device is enabled for wakeup and another isn't.  I assume you don't
> > really care about this (especially since Android phones aren't likely
> > to have more than one MMC device capable of generating wakeup events).
> >
> 
> I'm not sure what you mean by insufficient here. If it not optimal if
> you also have work that does not need to prevent suspend, but it
> should still work correctly.

Yeah, that's what I meant.  You would end up blocking suspends 
unnecessarily.  It's okay if it doesn't happen very often.

> >> It would be better to allocate the structure when user space changes
> >> the setting. If you wait until suspend before blocking suspend you may
> >> have passed the event on already.
> >
> > That will work for the structure directly associated with the device.
> > Additional structures allocated by the drivers have to be handled
> > differently, because the drivers aren't notified when the wakeup-enable
> > setting is changed.
> >
> 
> I'm not convinced that is it sufficient to detect that a device is
> wakeup-enabled on suspend.

It isn't.

>  However it is not a problem I have tried to
> solve since we have only used static wakeup sources.

Well, we may need to solve it eventually.  One possible solution is to 
imitate what you're doing with MMC: Use a single per-driver wakelock 
instead of separate wakelocks for each driver/device combination.  I 
don't know if that would work for all the cases, though.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC][PATCH] PM / Wakeup: Introduce wakeup source objects and event statistics (was: Re: Wakeup-events implementation)
  2010-08-20 15:04                         ` Alan Stern
@ 2010-09-07 23:51                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2010-09-07 23:51 UTC (permalink / raw)
  To: Alan Stern; +Cc: Linux-pm mailing list, LKML

Hi,

Below is a patch that adds some statistics to the previously merged
pm_wakeup_event()/pm_stay_awake()/pm_relax() code.  It also makes it possible
to use wakeup sources that are not directly associated with devices.

It adds functions for manipulating wakeup source objects and reworks the
device wakeup enabling/disabling to use the new functions.  The list of wakeup
sources is only used for updating the "hit count" statistics for now (this is
the number of times the wakeup source was active when the PM core checked), but
I'm planning to add a /proc file listing all wakeup sources, including the ones
that are not attached to device objects.

It appears to work with the PCI wakeup code added previously, but that's only
one case.  I'm also not sure if it builds withoug CONFIG_PM_SLEEP.  [BTW, I'm
not sure it atomic_inc() and atomic_dec() imply a memory barrier in general.
That seems to be the case on x86, but I don't know about other architectures.]

Please have a look and let me know what you think.

Thanks,
Rafael


---
 drivers/base/power/main.c    |    3 
 drivers/base/power/power.h   |    1 
 drivers/base/power/runtime.c |    2 
 drivers/base/power/sysfs.c   |  121 +++++++++
 drivers/base/power/wakeup.c  |  532 ++++++++++++++++++++++++++++++++++++-------
 include/linux/pm.h           |   16 -
 include/linux/pm_wakeup.h    |  114 +++++++--
 include/linux/suspend.h      |    4 
 kernel/power/main.c          |    8 
 9 files changed, 673 insertions(+), 128 deletions(-)

Index: linux-2.6/include/linux/pm_wakeup.h
===================================================================
--- linux-2.6.orig/include/linux/pm_wakeup.h
+++ linux-2.6/include/linux/pm_wakeup.h
@@ -2,6 +2,7 @@
  *  pm_wakeup.h - Power management wakeup interface
  *
  *  Copyright (C) 2008 Alan Stern
+ *  Copyright (C) 2010 Rafael J. Wysocki, Novell Inc.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -27,18 +28,27 @@
 
 #include <linux/types.h>
 
-#ifdef CONFIG_PM
+struct wakeup_source {
+	char 			*name;
+	struct list_head	entry;
+	spinlock_t		lock;
+	struct timer_list	timer;
+	unsigned long		timer_expires;
+	ktime_t total_time;
+	ktime_t max_time;
+	ktime_t last_time;
+	unsigned long		event_count;
+	unsigned long		active_count;
+	unsigned long		relax_count;
+	unsigned long		hit_count;
+	unsigned int		active:1;
+};
 
-/* Changes to device_may_wakeup take effect on the next pm state change.
- *
- * By default, most devices should leave wakeup disabled.  The exceptions
- * are devices that everyone expects to be wakeup sources: keyboards,
- * power buttons, possibly network interfaces, etc.
+#ifdef CONFIG_PM_SLEEP
+
+/*
+ * Changes to device_may_wakeup take effect on the next pm state change.
  */
-static inline void device_init_wakeup(struct device *dev, bool val)
-{
-	dev->power.can_wakeup = dev->power.should_wakeup = val;
-}
 
 static inline void device_set_wakeup_capable(struct device *dev, bool capable)
 {
@@ -50,23 +60,32 @@ static inline bool device_can_wakeup(str
 	return dev->power.can_wakeup;
 }
 
-static inline void device_set_wakeup_enable(struct device *dev, bool enable)
-{
-	dev->power.should_wakeup = enable;
-}
+
 
 static inline bool device_may_wakeup(struct device *dev)
 {
-	return dev->power.can_wakeup && dev->power.should_wakeup;
+	return dev->power.can_wakeup && !!dev->power.wakeup;
 }
 
-#else /* !CONFIG_PM */
+/* drivers/base/power/wakeup.c */
+extern struct wakeup_source *wakeup_source_create(const char *name);
+extern void wakeup_source_destroy(struct wakeup_source *ws);
+extern void wakeup_source_register(struct wakeup_source *ws);
+extern void wakeup_source_unregister(struct wakeup_source *ws);
+extern struct wakeup_source *wakeup_source_add(const char *name);
+extern void wakeup_source_remove(struct wakeup_source *ws);
+extern int device_wakeup_enable(struct device *dev);
+extern int device_wakeup_disable(struct device *dev);
+extern int device_init_wakeup(struct device *dev, bool val);
+extern int device_set_wakeup_enable(struct device *dev, bool enable);
+extern void __pm_stay_awake(struct wakeup_source *ws);
+extern void pm_stay_awake(struct device *dev);
+extern void __pm_relax(struct wakeup_source *ws);
+extern void pm_relax(struct device *dev);
+extern void __pm_wakeup_event(struct wakeup_source *ws, unsigned int msec);
+extern void pm_wakeup_event(struct device *dev, unsigned int msec);
 
-/* For some reason the following routines work even without CONFIG_PM */
-static inline void device_init_wakeup(struct device *dev, bool val)
-{
-	dev->power.can_wakeup = val;
-}
+#else /* !CONFIG_PM_SLEEP */
 
 static inline void device_set_wakeup_capable(struct device *dev, bool capable)
 {
@@ -78,15 +97,60 @@ static inline bool device_can_wakeup(str
 	return dev->power.can_wakeup;
 }
 
-static inline void device_set_wakeup_enable(struct device *dev, bool enable)
+static inline bool device_may_wakeup(struct device *dev)
 {
+	return false;
 }
 
-static inline bool device_may_wakeup(struct device *dev)
+static inline struct wakeup_source *wakeup_source_create(char *name)
 {
-	return false;
+	return NULL;
+}
+
+static inline void wakeup_source_destroy(struct wakeup_source *ws) {}
+
+static inline void wakeup_source_register(struct wakeup_source *ws) {}
+
+static inline void wakeup_source_unregister(struct wakeup_source *ws) {}
+
+static inline struct wakeup_source *wakeup_source_add(char *name)
+{
+	return NULL;
 }
 
-#endif /* !CONFIG_PM */
+static inline void wakeup_source_remove(struct wakeup_source *ws) {}
+
+static inline int device_wakeup_enable(struct device *dev)
+{
+	return -EINVAL;
+}
+
+static inline int device_wakeup_disable(struct device *dev) {}
+
+static inline int device_init_wakeup(struct device *dev, bool val)
+{
+	dev->power.can_wakeup = val;
+	return val ? -EINVAL : 0;
+}
+
+
+static inline int device_set_wakeup_enable(struct device *dev, bool enable)
+{
+	return -EINVAL;
+}
+
+static inline void __pm_stay_awake(struct wakeup_source *ws) {}
+
+static inline void pm_stay_awake(struct device *dev) {}
+
+static inline void __pm_relax(struct wakeup_source *ws) {}
+
+static inline void pm_relax(struct device *dev) {}
+
+static inline void __pm_wakeup_event(struct wakeup_source *ws, unsigned int msec) {}
+
+static inline void pm_wakeup_event(struct device *dev, unsigned int msec) {}
+
+#endif /* !CONFIG_PM_SLEEP */
 
 #endif /* _LINUX_PM_WAKEUP_H */
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -448,23 +448,24 @@ enum rpm_request {
 	RPM_REQ_RESUME,
 };
 
+struct wakeup_source;
+
 struct dev_pm_info {
 	pm_message_t		power_state;
 	unsigned int		can_wakeup:1;
-	unsigned int		should_wakeup:1;
 	unsigned		async_suspend:1;
 	enum dpm_state		status;		/* Owned by the PM core */
+	spinlock_t		lock;
 #ifdef CONFIG_PM_SLEEP
 	struct list_head	entry;
 	struct completion	completion;
-	unsigned long		wakeup_count;
+	struct wakeup_source	*wakeup;
 #endif
 #ifdef CONFIG_PM_RUNTIME
 	struct timer_list	suspend_timer;
 	unsigned long		timer_expires;
 	struct work_struct	work;
 	wait_queue_head_t	wait_queue;
-	spinlock_t		lock;
 	atomic_t		usage_count;
 	atomic_t		child_count;
 	unsigned int		disable_depth:3;
@@ -559,11 +560,6 @@ extern void __suspend_report_result(cons
 	} while (0)
 
 extern void device_pm_wait_for_dev(struct device *sub, struct device *dev);
-
-/* drivers/base/power/wakeup.c */
-extern void pm_wakeup_event(struct device *dev, unsigned int msec);
-extern void pm_stay_awake(struct device *dev);
-extern void pm_relax(void);
 #else /* !CONFIG_PM_SLEEP */
 
 #define device_pm_lock() do {} while (0)
@@ -577,10 +573,6 @@ static inline int dpm_suspend_start(pm_m
 #define suspend_report_result(fn, ret)		do {} while (0)
 
 static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {}
-
-static inline void pm_wakeup_event(struct device *dev, unsigned int msec) {}
-static inline void pm_stay_awake(struct device *dev) {}
-static inline void pm_relax(void) {}
 #endif /* !CONFIG_PM_SLEEP */
 
 /* How to reorder dpm_list after device_move() */
Index: linux-2.6/drivers/base/power/wakeup.c
===================================================================
--- linux-2.6.orig/drivers/base/power/wakeup.c
+++ linux-2.6/drivers/base/power/wakeup.c
@@ -11,7 +11,10 @@
 #include <linux/sched.h>
 #include <linux/capability.h>
 #include <linux/suspend.h>
-#include <linux/pm.h>
+
+#include "power.h"
+
+#define TIMEOUT		100
 
 /*
  * If set, the suspend/hibernate code will abort transitions to a sleep state
@@ -20,18 +23,251 @@
 bool events_check_enabled;
 
 /* The counter of registered wakeup events. */
-static unsigned long event_count;
+static atomic_t event_count = ATOMIC_INIT(0);
 /* A preserved old value of event_count. */
-static unsigned long saved_event_count;
+static unsigned int saved_count;
 /* The counter of wakeup events being processed. */
-static unsigned long events_in_progress;
+static atomic_t events_in_progress = ATOMIC_INIT(0);
 
 static DEFINE_SPINLOCK(events_lock);
 
 static void pm_wakeup_timer_fn(unsigned long data);
 
-static DEFINE_TIMER(events_timer, pm_wakeup_timer_fn, 0, 0);
-static unsigned long events_timer_expires;
+static LIST_HEAD(wakeup_sources);
+
+/**
+ * wakeup_source_create - Create a struct wakeup_source object.
+ * @name: Name of the new wakeup source.
+ */
+struct wakeup_source *wakeup_source_create(const char *name)
+{
+	struct wakeup_source *ws;
+
+	ws = kzalloc(sizeof(*ws), GFP_KERNEL);
+	if (!ws)
+		return NULL;
+
+	if (name) {
+		int len = strlen(name);
+		char *s = kzalloc(len + 1, GFP_KERNEL);
+		if (s) {
+			strncpy(s, name, len);
+			ws->name = s;
+		}
+	}
+
+	return ws;
+}
+EXPORT_SYMBOL_GPL(wakeup_source_create);
+
+/**
+ * wakeup_source_destroy - Destroy a struct wakeup_source object.
+ * @ws: Wakeup source to destroy.
+ */
+void wakeup_source_destroy(struct wakeup_source *ws)
+{
+	if (!ws)
+		return;
+
+	spin_lock_irq(&ws->lock);
+	while (ws->active) {
+		spin_unlock_irq(&ws->lock);
+
+		schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
+
+		spin_lock_irq(&ws->lock);
+	}
+	spin_unlock_irq(&ws->lock);
+
+	if (ws->name)
+		kfree(ws->name);
+	kfree(ws);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_destroy);
+
+/**
+ * wakeup_source_register - Add given object to the list of wakeup sources.
+ * @ws: Wakeup source object to register.
+ */
+void wakeup_source_register(struct wakeup_source *ws)
+{
+	if (WARN_ON(!ws))
+		return;
+
+	spin_lock_init(&ws->lock);
+	setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws);
+	ws->active = false;
+
+	spin_lock_irq(&events_lock);
+	list_add_rcu(&ws->entry, &wakeup_sources);
+	spin_unlock_irq(&events_lock);
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(wakeup_source_register);
+
+/**
+ * wakeup_source_unregister - Remove given object from the wakeup sources list.
+ * @ws: Wakeup source object to unregister.
+ */
+void wakeup_source_unregister(struct wakeup_source *ws)
+{
+	if (WARN_ON(!ws))
+		return;
+
+	spin_lock_irq(&events_lock);
+	list_del_rcu(&ws->entry);
+	spin_unlock_irq(&events_lock);
+	synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(wakeup_source_unregister);
+
+/**
+ * wakeup_source_add - Create and register a wakeup source object.
+ * @name: Name of the wakeup source to create.
+ */
+struct wakeup_source *wakeup_source_add(const char *name)
+{
+	struct wakeup_source *ws;
+
+	ws = wakeup_source_create(name);
+	if (ws)
+		wakeup_source_register(ws);
+
+	return ws;
+}
+EXPORT_SYMBOL_GPL(wakeup_source_add);
+
+/**
+ * wakeup_source_remove - Unregister and destroy a wakeup source object.
+ * @ws: Wakeup source object to handle.
+ */
+void wakeup_source_remove(struct wakeup_source *ws)
+{
+	wakeup_source_unregister(ws);
+	wakeup_source_destroy(ws);
+}
+EXPORT_SYMBOL_GPL(wakeup_source_remove);
+
+/**
+ * device_wakeup_attach - Attach a wakeup source object to a device object.
+ * @dev: Device to handle.
+ * @ws: Wakeup source object to attach to @dev.
+ *
+ * This causes @dev to be treated as a wakeup device.
+ */
+static int device_wakeup_attach(struct device *dev, struct wakeup_source *ws)
+{
+	spin_lock_irq(&dev->power.lock);
+	if (dev->power.wakeup) {
+		spin_unlock_irq(&dev->power.lock);
+		return -EEXIST;
+	}
+	dev->power.wakeup = ws;
+	spin_unlock_irq(&dev->power.lock);
+	return 0;
+}
+
+/**
+ * device_wakeup_enable - Enable given device to be a wakeup source.
+ * @dev: Device to handle.
+ *
+ * Create a wakeup source object, register it and attach it to @dev.
+ */
+int device_wakeup_enable(struct device *dev)
+{
+	struct wakeup_source *ws;
+	int ret;
+
+	if (!dev || !dev->power.can_wakeup)
+		return -EINVAL;
+
+	ws = wakeup_source_add(dev_name(dev));
+	if (!ws)
+		return -ENOMEM;
+
+	ret = device_wakeup_attach(dev, ws);
+	if (ret)
+		wakeup_source_remove(ws);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(device_wakeup_enable);
+
+/**
+ * device_wakeup_detach - Detach a device's wakeup source object from it.
+ * @dev: Device to detach the wakeup source object from.
+ *
+ * After it returns, @dev will not be treated as a wakeup device any more.
+ */
+static struct wakeup_source *device_wakeup_detach(struct device *dev)
+{
+	struct wakeup_source *ws;
+
+	spin_lock_irq(&dev->power.lock);
+	ws = dev->power.wakeup;
+	dev->power.wakeup = NULL;
+	spin_unlock_irq(&dev->power.lock);
+	return ws;
+}
+
+/**
+ * device_wakeup_disable - Do not regard a device as a wakeup source any more.
+ * @dev: Device to handle.
+ *
+ * Detach the @dev's wakeup source object from it, unregister this wakeup source
+ * object and destroy it.
+ */
+int device_wakeup_disable(struct device *dev)
+{
+	struct wakeup_source *ws;
+
+	if (!dev || !dev->power.can_wakeup)
+		return -EINVAL;
+
+	ws = device_wakeup_detach(dev);
+	if (ws)
+		wakeup_source_remove(ws);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(device_wakeup_disable);
+
+/**
+ * device_init_wakeup - Device wakeup initialization.
+ * @dev: Device to handle.
+ * @enable: Whether or not to enable @dev as a wakeup device.
+ *
+ * By default, most devices should leave wakeup disabled.  The exceptions are
+ * devices that everyone expects to be wakeup sources: keyboards, power buttons,
+ * possibly network interfaces, etc.
+ */
+int device_init_wakeup(struct device *dev, bool enable)
+{
+	int ret = 0;
+
+	if (enable) {
+		device_set_wakeup_capable(dev, true);
+		ret = device_wakeup_enable(dev);
+	} else {
+		device_set_wakeup_capable(dev, false);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(device_init_wakeup);
+
+/**
+ * device_set_wakeup_enable - Enable or disable a device to wake up the system.
+ * @dev: Device to handle.
+ */
+int device_set_wakeup_enable(struct device *dev, bool enable)
+{
+	if (!dev || !dev->power.can_wakeup)
+		return -EINVAL;
+
+	return enable ? device_wakeup_enable(dev) : device_wakeup_disable(dev);
+}
+EXPORT_SYMBOL_GPL(device_set_wakeup_enable);
 
 /*
  * The functions below use the observation that each wakeup event starts a
@@ -55,118 +291,252 @@ static unsigned long events_timer_expire
  * knowledge, however, may not be available to it, so it can simply specify time
  * to wait before the system can be suspended and pass it as the second
  * argument of pm_wakeup_event().
+ *
+ * It is valid to call pm_relax() after pm_wakeup_event(), in which case the
+ * "no suspend" period will be ended either by the pm_relax(), or by the timer
+ * function executed when the timer expires, whichever comes first.
  */
 
 /**
+ * wakup_source_activate - Mark given wakeup source as active.
+ * @ws: Wakeup source to handle.
+ *
+ * Update the @ws' statistics and, if @ws has just been activated, notify the PM
+ * core of the event by incrementing the counter of of wakeup events being
+ * processed.
+ */
+static void wakeup_source_activate(struct wakeup_source *ws)
+{
+	ws->active = true;
+	ws->active_count++;
+	ws->timer_expires = jiffies;
+	ws->last_time = ktime_get();
+
+	atomic_inc(&events_in_progress);
+}
+
+/**
+ * __pm_stay_awake - Notify the PM core of a wakeup event.
+ * @ws: Wakeup source object associated with the source of the event.
+ *
+ * It is safe to call this function from interrupt context.
+ */
+void __pm_stay_awake(struct wakeup_source *ws)
+{
+	unsigned long flags;
+
+	if (!ws)
+		return;
+
+	spin_lock_irqsave(&ws->lock, flags);
+	ws->event_count++;
+	if (!ws->active)
+		wakeup_source_activate(ws);
+	spin_unlock_irqrestore(&ws->lock, flags);
+}
+EXPORT_SYMBOL_GPL(__pm_stay_awake);
+
+/**
  * pm_stay_awake - Notify the PM core that a wakeup event is being processed.
  * @dev: Device the wakeup event is related to.
  *
- * Notify the PM core of a wakeup event (signaled by @dev) by incrementing the
- * counter of wakeup events being processed.  If @dev is not NULL, the counter
- * of wakeup events related to @dev is incremented too.
+ * Notify the PM core of a wakeup event (signaled by @dev) by calling
+ * __pm_stay_awake for the @dev's wakeup source object.
  *
  * Call this function after detecting of a wakeup event if pm_relax() is going
  * to be called directly after processing the event (and possibly passing it to
  * user space for further processing).
- *
- * It is safe to call this function from interrupt context.
  */
 void pm_stay_awake(struct device *dev)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&events_lock, flags);
-	if (dev)
-		dev->power.wakeup_count++;
+	if (!dev)
+		return;
 
-	events_in_progress++;
-	spin_unlock_irqrestore(&events_lock, flags);
+	spin_lock_irqsave(&dev->power.lock, flags);
+	__pm_stay_awake(dev->power.wakeup);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
 }
+EXPORT_SYMBOL_GPL(pm_stay_awake);
 
 /**
- * pm_relax - Notify the PM core that processing of a wakeup event has ended.
+ * wakup_source_deactivate - Mark given wakeup source as inactive.
+ * @ws: Wakeup source to handle.
  *
- * Notify the PM core that a wakeup event has been processed by decrementing
- * the counter of wakeup events being processed and incrementing the counter
- * of registered wakeup events.
+ * Update the @ws' statistics and notify the PM core that the wakeup source has
+ * become inactive by decrementing the counter of wakeup events being processed
+ * and incrementing the counter of registered wakeup events.
+ */
+static void wakeup_source_deactivate(struct wakeup_source *ws)
+{
+	ktime_t duration;
+	ktime_t now;
+
+	ws->relax_count++;
+	/*
+	 * __pm_relax() may be called directly or from a timer function.
+	 * If it is called directly right after the timer function has been
+	 * started, but before the timer function calls __pm_relax(), it is
+	 * possible that __pm_stay_awake() will be called in the meantime and
+	 * will set ws->active.  Then, ws->active may be cleared immediately
+	 * by the __pm_relax() called from the timer function, but in such a
+	 * case ws->relax_count will be different from ws->active_count.
+	 */
+	if (ws->relax_count != ws->active_count) {
+		ws->relax_count--;
+		return;
+	}
+
+	ws->active = false;
+
+	now = ktime_get();
+	duration = ktime_sub(now, ws->last_time);
+	ws->total_time = ktime_add(ws->total_time, duration);
+	if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time))
+		ws->max_time = duration;
+
+	del_timer(&ws->timer);
+
+	atomic_inc(&event_count);
+	atomic_dec(&events_in_progress);
+}
+
+/**
+ * __pm_relax - Notify the PM core that processing of a wakeup event has ended.
+ * @ws: Wakeup source object associated with the source of the event.
  *
  * Call this function for wakeup events whose processing started with calling
- * pm_stay_awake().
+ * __pm_stay_awake().
  *
  * It is safe to call it from interrupt context.
  */
-void pm_relax(void)
+void __pm_relax(struct wakeup_source *ws)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&events_lock, flags);
-	if (events_in_progress) {
-		events_in_progress--;
-		event_count++;
-	}
-	spin_unlock_irqrestore(&events_lock, flags);
+	if (!ws)
+		return;
+
+	spin_lock_irqsave(&ws->lock, flags);
+	if (ws->active)
+		wakeup_source_deactivate(ws);
+	spin_unlock_irqrestore(&ws->lock, flags);
 }
+EXPORT_SYMBOL_GPL(__pm_relax);
+
+/**
+ * pm_relax - Notify the PM core that processing of a wakeup event has ended.
+ * @dev: Device that signaled the event.
+ *
+ * Execute __pm_relax() for the @dev's wakeup source object.
+ */
+void pm_relax(struct device *dev)
+{
+	unsigned long flags;
+
+	if (!dev)
+		return;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+	__pm_relax(dev->power.wakeup);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_relax);
 
 /**
  * pm_wakeup_timer_fn - Delayed finalization of a wakeup event.
+ * @data: Address of the wakeup source object associated with the event source.
  *
- * Decrease the counter of wakeup events being processed after it was increased
- * by pm_wakeup_event().
+ * Call __pm_relax() for the wakeup source whose address is stored in @data.
  */
 static void pm_wakeup_timer_fn(unsigned long data)
 {
+	__pm_relax((struct wakeup_source *)data);
+}
+
+/**
+ * __pm_wakeup_event - Notify the PM core of a wakeup event.
+ * @ws: Wakeup source object associated with the event source.
+ * @msec: Anticipated event processing time (in milliseconds).
+ *
+ * Notify the PM core of a wakeup event whose source is @ws that will take
+ * approximately @msec milliseconds to be processed by the kernel.  If @ws is
+ * not active, activate it.  If @msec is nonzero, set up the @ws' timer to
+ * execute pm_wakeup_timer_fn() in future.
+ *
+ * It is safe to call this function from interrupt context.
+ */
+void __pm_wakeup_event(struct wakeup_source *ws, unsigned int msec)
+{
 	unsigned long flags;
+	unsigned long expires;
 
-	spin_lock_irqsave(&events_lock, flags);
-	if (events_timer_expires
-	    && time_before_eq(events_timer_expires, jiffies)) {
-		events_in_progress--;
-		events_timer_expires = 0;
+	if (!ws)
+		return;
+
+	spin_lock_irqsave(&ws->lock, flags);
+
+	ws->event_count++;
+	if (!ws->active)
+		wakeup_source_activate(ws);
+
+	if (!msec) {
+		wakeup_source_deactivate(ws);
+		goto unlock;
 	}
-	spin_unlock_irqrestore(&events_lock, flags);
+
+	expires = jiffies + msecs_to_jiffies(msec);
+	if (!expires)
+		expires = 1;
+
+	if (time_after(expires, ws->timer_expires)) {
+		mod_timer(&ws->timer, expires);
+		ws->timer_expires = expires;
+	}
+
+ unlock:
+	spin_unlock_irqrestore(&ws->lock, flags);
 }
+EXPORT_SYMBOL_GPL(__pm_wakeup_event);
+
 
 /**
  * pm_wakeup_event - Notify the PM core of a wakeup event.
  * @dev: Device the wakeup event is related to.
  * @msec: Anticipated event processing time (in milliseconds).
  *
- * Notify the PM core of a wakeup event (signaled by @dev) that will take
- * approximately @msec milliseconds to be processed by the kernel.  Increment
- * the counter of registered wakeup events and (if @msec is nonzero) set up
- * the wakeup events timer to execute pm_wakeup_timer_fn() in future (if the
- * timer has not been set up already, increment the counter of wakeup events
- * being processed).  If @dev is not NULL, the counter of wakeup events related
- * to @dev is incremented too.
- *
- * It is safe to call this function from interrupt context.
+ * Call __pm_wakeup_event() for the @dev's wakeup source object.
  */
 void pm_wakeup_event(struct device *dev, unsigned int msec)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&events_lock, flags);
-	event_count++;
-	if (dev)
-		dev->power.wakeup_count++;
-
-	if (msec) {
-		unsigned long expires;
-
-		expires = jiffies + msecs_to_jiffies(msec);
-		if (!expires)
-			expires = 1;
-
-		if (!events_timer_expires
-		    || time_after(expires, events_timer_expires)) {
-			if (!events_timer_expires)
-				events_in_progress++;
+	if (!dev)
+		return;
 
-			mod_timer(&events_timer, expires);
-			events_timer_expires = expires;
-		}
+	spin_lock_irqsave(&dev->power.lock, flags);
+	__pm_wakeup_event(dev->power.wakeup, msec);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+}
+EXPORT_SYMBOL_GPL(pm_wakeup_event);
+
+/**
+ * pm_wakeup_update_hit_counts - Update hit counts of all active wakeup sources.
+ */
+static void pm_wakeup_update_hit_counts(void)
+{
+	unsigned long flags;
+	struct wakeup_source *ws;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
+		spin_lock_irqsave(&ws->lock, flags);
+		if (ws->active)
+			ws->hit_count++;
+		spin_unlock_irqrestore(&ws->lock, flags);
 	}
-	spin_unlock_irqrestore(&events_lock, flags);
+	rcu_read_unlock();
 }
 
 /**
@@ -184,10 +554,13 @@ bool pm_check_wakeup_events(void)
 
 	spin_lock_irqsave(&events_lock, flags);
 	if (events_check_enabled) {
-		ret = (event_count == saved_event_count) && !events_in_progress;
+		ret = ((unsigned int)atomic_read(&event_count) == saved_count)
+			&& !atomic_read(&events_in_progress);
 		events_check_enabled = ret;
 	}
 	spin_unlock_irqrestore(&events_lock, flags);
+	if (!ret)
+		pm_wakeup_update_hit_counts();
 	return ret;
 }
 
@@ -202,24 +575,20 @@ bool pm_check_wakeup_events(void)
  * drop down to zero has been interrupted by a signal (and the current number
  * of wakeup events being processed is still nonzero).  Otherwise return true.
  */
-bool pm_get_wakeup_count(unsigned long *count)
+bool pm_get_wakeup_count(unsigned int *count)
 {
 	bool ret;
 
-	spin_lock_irq(&events_lock);
 	if (capable(CAP_SYS_ADMIN))
 		events_check_enabled = false;
 
-	while (events_in_progress && !signal_pending(current)) {
-		spin_unlock_irq(&events_lock);
-
-		schedule_timeout_interruptible(msecs_to_jiffies(100));
-
-		spin_lock_irq(&events_lock);
+	while (atomic_read(&events_in_progress) && !signal_pending(current)) {
+		pm_wakeup_update_hit_counts();
+		schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
 	}
-	*count = event_count;
-	ret = !events_in_progress;
-	spin_unlock_irq(&events_lock);
+
+	ret = !atomic_read(&events_in_progress);
+	*count = atomic_read(&event_count);
 	return ret;
 }
 
@@ -232,16 +601,19 @@ bool pm_get_wakeup_count(unsigned long *
  * old number of registered wakeup events to be used by pm_check_wakeup_events()
  * and return true.  Otherwise return false.
  */
-bool pm_save_wakeup_count(unsigned long count)
+bool pm_save_wakeup_count(unsigned int count)
 {
 	bool ret = false;
 
 	spin_lock_irq(&events_lock);
-	if (count == event_count && !events_in_progress) {
-		saved_event_count = count;
+	if (count == (unsigned int)atomic_read(&event_count)
+	    && !atomic_read(&events_in_progress)) {
+		saved_count = count;
 		events_check_enabled = true;
 		ret = true;
 	}
 	spin_unlock_irq(&events_lock);
+	if (!ret)
+		pm_wakeup_update_hit_counts();
 	return ret;
 }
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -60,7 +60,8 @@ void device_pm_init(struct device *dev)
 	dev->power.status = DPM_ON;
 	init_completion(&dev->power.completion);
 	complete_all(&dev->power.completion);
-	dev->power.wakeup_count = 0;
+	dev->power.wakeup = NULL;
+	spin_lock_init(&dev->power.lock);
 	pm_runtime_init(dev);
 }
 
Index: linux-2.6/drivers/base/power/power.h
===================================================================
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -34,6 +34,7 @@ extern void device_pm_move_last(struct d
 
 static inline void device_pm_init(struct device *dev)
 {
+	spin_lock_init(&dev->power.lock);
 	pm_runtime_init(dev);
 }
 
Index: linux-2.6/drivers/base/power/runtime.c
===================================================================
--- linux-2.6.orig/drivers/base/power/runtime.c
+++ linux-2.6/drivers/base/power/runtime.c
@@ -1099,8 +1099,6 @@ EXPORT_SYMBOL_GPL(pm_runtime_allow);
  */
 void pm_runtime_init(struct device *dev)
 {
-	spin_lock_init(&dev->power.lock);
-
 	dev->power.runtime_status = RPM_SUSPENDED;
 	dev->power.idle_notification = false;
 
Index: linux-2.6/include/linux/suspend.h
===================================================================
--- linux-2.6.orig/include/linux/suspend.h
+++ linux-2.6/include/linux/suspend.h
@@ -302,8 +302,8 @@ extern int unregister_pm_notifier(struct
 extern bool events_check_enabled;
 
 extern bool pm_check_wakeup_events(void);
-extern bool pm_get_wakeup_count(unsigned long *count);
-extern bool pm_save_wakeup_count(unsigned long count);
+extern bool pm_get_wakeup_count(unsigned int *count);
+extern bool pm_save_wakeup_count(unsigned int count);
 #else /* !CONFIG_PM_SLEEP */
 
 static inline int register_pm_notifier(struct notifier_block *nb)
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -237,18 +237,18 @@ static ssize_t wakeup_count_show(struct 
 				struct kobj_attribute *attr,
 				char *buf)
 {
-	unsigned long val;
+	unsigned int val;
 
-	return pm_get_wakeup_count(&val) ? sprintf(buf, "%lu\n", val) : -EINTR;
+	return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR;
 }
 
 static ssize_t wakeup_count_store(struct kobject *kobj,
 				struct kobj_attribute *attr,
 				const char *buf, size_t n)
 {
-	unsigned long val;
+	unsigned int val;
 
-	if (sscanf(buf, "%lu", &val) == 1) {
+	if (sscanf(buf, "%u", &val) == 1) {
 		if (pm_save_wakeup_count(val))
 			return n;
 	}
Index: linux-2.6/drivers/base/power/sysfs.c
===================================================================
--- linux-2.6.orig/drivers/base/power/sysfs.c
+++ linux-2.6/drivers/base/power/sysfs.c
@@ -210,11 +210,122 @@ static DEVICE_ATTR(wakeup, 0644, wake_sh
 static ssize_t wakeup_count_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%lu\n", dev->power.wakeup_count);
+	unsigned long count = 0;
+	bool enabled = false;
+
+	spin_lock_irq(&dev->power.lock);
+	if (dev->power.wakeup) {
+		count = dev->power.wakeup->event_count;
+		enabled = true;
+	}
+	spin_unlock_irq(&dev->power.lock);
+	return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
 }
 
 static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL);
-#endif
+
+static ssize_t wakeup_active_count_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	unsigned long count = 0;
+	bool enabled = false;
+
+	spin_lock_irq(&dev->power.lock);
+	if (dev->power.wakeup) {
+		count = dev->power.wakeup->active_count;
+		enabled = true;
+	}
+	spin_unlock_irq(&dev->power.lock);
+	return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_active_count, 0444, wakeup_active_count_show, NULL);
+
+static ssize_t wakeup_hit_count_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	unsigned long count = 0;
+	bool enabled = false;
+
+	spin_lock_irq(&dev->power.lock);
+	if (dev->power.wakeup) {
+		count = dev->power.wakeup->hit_count;
+		enabled = true;
+	}
+	spin_unlock_irq(&dev->power.lock);
+	return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_hit_count, 0444, wakeup_hit_count_show, NULL);
+
+static ssize_t wakeup_active_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	unsigned int active = 0;
+	bool enabled = false;
+
+	spin_lock_irq(&dev->power.lock);
+	if (dev->power.wakeup) {
+		active = dev->power.wakeup->active;
+		enabled = true;
+	}
+	spin_unlock_irq(&dev->power.lock);
+	return enabled ? sprintf(buf, "%u\n", active) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_active, 0444, wakeup_active_show, NULL);
+
+static ssize_t wakeup_total_time_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	s64 msec = 0;
+	bool enabled = false;
+
+	spin_lock_irq(&dev->power.lock);
+	if (dev->power.wakeup) {
+		msec = ktime_to_ms(dev->power.wakeup->total_time);
+		enabled = true;
+	}
+	spin_unlock_irq(&dev->power.lock);
+	return enabled ? sprintf(buf, "%lld\n", msec) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_total_time_ms, 0444, wakeup_total_time_show, NULL);
+
+static ssize_t wakeup_max_time_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	s64 msec = 0;
+	bool enabled = false;
+
+	spin_lock_irq(&dev->power.lock);
+	if (dev->power.wakeup) {
+		msec = ktime_to_ms(dev->power.wakeup->max_time);
+		enabled = true;
+	}
+	spin_unlock_irq(&dev->power.lock);
+	return enabled ? sprintf(buf, "%lld\n", msec) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_max_time_ms, 0444, wakeup_max_time_show, NULL);
+
+static ssize_t wakeup_last_time_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	s64 msec = 0;
+	bool enabled = false;
+
+	spin_lock_irq(&dev->power.lock);
+	if (dev->power.wakeup) {
+		msec = ktime_to_ms(dev->power.wakeup->last_time);
+		enabled = true;
+	}
+	spin_unlock_irq(&dev->power.lock);
+	return enabled ? sprintf(buf, "%lld\n", msec) : sprintf(buf, "\n");
+}
+
+static DEVICE_ATTR(wakeup_last_time_ms, 0444, wakeup_last_time_show, NULL);
+#endif /* CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM_ADVANCED_DEBUG
 #ifdef CONFIG_PM_RUNTIME
@@ -288,6 +399,12 @@ static struct attribute * power_attrs[] 
 	&dev_attr_wakeup.attr,
 #ifdef CONFIG_PM_SLEEP
 	&dev_attr_wakeup_count.attr,
+	&dev_attr_wakeup_active_count.attr,
+	&dev_attr_wakeup_hit_count.attr,
+	&dev_attr_wakeup_active.attr,
+	&dev_attr_wakeup_total_time_ms.attr,
+	&dev_attr_wakeup_max_time_ms.attr,
+	&dev_attr_wakeup_last_time_ms.attr,
 #endif
 #ifdef CONFIG_PM_ADVANCED_DEBUG
 	&dev_attr_async.attr,

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Wakeup-events implementation
  2010-08-17 15:53         ` Alan Stern
  2010-08-17 18:37           ` David Brownell
  2010-08-17 19:21           ` Rafael J. Wysocki
@ 2010-08-18  7:04           ` Florian Mickler
  2 siblings, 0 replies; 18+ messages in thread
From: Florian Mickler @ 2010-08-18  7:04 UTC (permalink / raw)
  To: Alan Stern; +Cc: list, Linux-pm

On Tue, 17 Aug 2010 11:53:53 -0400 (EDT)
Alan Stern <stern@rowland.harvard.edu> wrote:

> Also, I frankly have to wonder why you demand such a large amount of
> detailed debugging about the kernel's wakeup/wakelock subsystem (as
> opposed to userspace wakelocks).  Yes, it's clear that if something
> goes wrong then you will drain your phone's battery in a few hours
> rather than a few days.  But what if something goes wrong with the VM
> subsystem, or the RF subsystem, or the keypad driver?  The device would
> crash immediately or at best become almost totally unusable.  These are
> much worse scenarios than losing some battery life -- have you added
> comparable quantities of debugging to those other parts of the kernel?
> 
> In addition, you retain all these debugging facilities in your
> production builds, not just in the debug kernels.  Doesn't this seem
> excessive?
> 

There is a theory about risk analysis and follow-up costs... i don't have
anything concrete at hand but the main idea is that if you develop for
example some monitoring for a production line of a pharmaceutical
product, you have to take into consideration all possible
failure-modes, the associated risk (likelihood * impact)
 and the likelihood of detecting these failures before shipping.

A failure mode that is fatal (->high risk), and is not detectable
before shipping is bad and a sign that you probably shouldn't produce
at all.

A failure mode that is fatal (->high risk) but is detectable in all
cases before shipping is not such a big problem. 

With battery powered devices, you have failure-modes (run-time not
optimal) that is not easily detectable before shipping. This is the
worst kind of failure-mode because you don't know if it will happen or
not. 

The risk of this failure mode (non-optimal runtime) is bad also, because
for mobile devices the runtime is critical and the fact that we talk
about millions of devices will turn this up with a non-negligible 
probability.

So I totally understand that the android people want all the debugging
they can get to make non-optimal runtime as visible as a broken vm or a
not functioning keypad. After all, runtime is critical for mobile
devices.

Cheers,
Flo

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-09-07 23:51 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <AANLkTimgJUAW9TU37XE4BqYWf3yCWvg-vgk3E0YLqcm9@mail.gmail.com>
2010-08-13 15:59 ` Wakeup-events implementation Alan Stern
2010-08-14  4:59   ` Arve Hjønnevåg
2010-08-16 19:22     ` Alan Stern
2010-08-17  0:30       ` Arve Hjønnevåg
2010-08-17 15:53         ` Alan Stern
2010-08-17 18:37           ` David Brownell
2010-08-17 19:19             ` Alan Stern
2010-08-17 19:21           ` Rafael J. Wysocki
2010-08-18  0:40             ` Arve Hjønnevåg
2010-08-18 10:32               ` Mark Brown
2010-08-18 15:20               ` Alan Stern
2010-08-18 19:17                 ` Rafael J. Wysocki
2010-08-19  0:18                   ` Arve Hjønnevåg
2010-08-19 21:09                     ` Alan Stern
2010-08-19 23:12                       ` Arve Hjønnevåg
2010-08-20 15:04                         ` Alan Stern
2010-09-07 23:51                           ` [RFC][PATCH] PM / Wakeup: Introduce wakeup source objects and event statistics (was: Re: Wakeup-events implementation) Rafael J. Wysocki
2010-08-18  7:04           ` Wakeup-events implementation Florian Mickler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox