* Wakeup-events implementation [not found] <AANLkTimgJUAW9TU37XE4BqYWf3yCWvg-vgk3E0YLqcm9@mail.gmail.com> @ 2010-08-13 15:59 ` Alan Stern 2010-08-14 4:59 ` Arve Hjønnevåg 0 siblings, 1 reply; 18+ messages in thread From: Alan Stern @ 2010-08-13 15:59 UTC (permalink / raw) To: Arve Hjønnevåg; +Cc: Linux-pm mailing list Since this area of the discussion has strayed away from the path of Paul McKenney's original summary, I'm starting a new email thread and drastically cutting the CC list. On Sun, 8 Aug 2010, Arve Hjønnevåg wrote: > > Maybe. There aren't enough examples coded up yet to be sure. At this > > point we could easily make the device argument required instead of > > optional, and we could add a device argument to pm_relax. > > > I think that would be a good start. I can create dummy devices for now > where I don't already have a device. Rafael and I spent some time discussing all this during LinuxCon. Combining what he thought with my own ideas led us to decide that the wakeup information should be stored in a separate structure, not in dev_pm_info. These structures will be allocated dynamically when a device is enabled for remote wakeup (and presumably deallocated if a device is disabled for remote wakeup). A pointer to the new structure will be stored in dev_pm_info. We expect that most devices won't need a wakeup structure, because most devices aren't wakeup sources. This new structure (as yet unnamed) will be a lot like your suspend_blocker structure, but with some differences. It won't need a name, since the device already has a name. It won't have a timer, if we can get away with leaving one out. But it will optionally contain various counters that can be used for statistics reporting and debugging. The API for exporting these fields to userspace has not yet been decided upon. The device argument will become mandatory for pm_stay_awake, pm_relax, and pm_wakeup_event, so that the counter fields can be updated properly. > > In the scheme we're talking about, the suspend-blocking interface for > > uesrspace would be located entirely in the power manager. Hence it > > could maintain all the necessary statistics, without involving the > > kernel. During early stages of system startup, before the power > > manager is running, lower-level services would not be able to block > > suspend. This wouldn't matter because at those times the system simply > > would not ever suspend -- because suspends have to be initiated by the > > power manager. > > If we do that, a low level process could try to block suspend while > the power manager is not running. Then the power manager could start > and decide to suspend not knowing that a low level process wanted to > block suspend. There's a simple solution: Put all this in a _new_, low-level power manager program that can be started very early during boot-up (before any of the other low-level services that need to block suspend). Then all the other programs would communicate their suspend-blocker requirements to this program instead of to the kernel. You expressed concern about the new power manager deadlocking. This can be avoided by designing the program properly. It should run in two threads. Thread 1 will handle IPC, keeping track of userspace suspend-blocker requests, their statistics, and so on. Thread 2 will manage the PM interface to the kernel, and it will execute the following simple loop: for (;;) { read /sys/power/wakeup_count /* blocking read */ if (there are any active userspace suspend blockers) { wait until they are all deactivated } else { if (write to /sys/power/wakeup_count succeeds) write "mem" to /sys/power/state } } I think this will resolve the problem you were worried about. > >> >> I don't know if they are all kernel-internal but these drivers appear > >> >> to use timeouts and active cancellation on the same wakelock: > >> >> wifi driver, mmc core, alarm driver, evdev (suspend blocker version > >> >> removes the timeout). > > > > Roughly speaking, what do the timings work out to be? That is, how > > long are the timeouts for these wakelocks, how long does it usually > > take for one of them to be actively cancelled, and what percentage of > > them end up timing out? > > > The evdev timeout is a few seconds, and never timeout unless user > space is misbehaving. I don't know how the wifi and mmc wakelocks are > used. The alarm driver uses a 1 or 2 second timeout to abort suspend > when an alarm is set to close to program an rtc alarm. I think this > one is cancelled when the alarm triggers, and the timeout only handles > the case where the client cancelled the alarm before it trigger, but I > don't remember for sure. Evdev turns out to be tricky for a couple of reasons. One is this business of combining active cancellation with timeouts (we might end up needing a special-purpose timer for this). Another is the way all input events funnel into a single event queue. For example, suppose you want the keyboard to be a wakeup device but not the mouse. Once events get into the input queue, there's no way to tell where they came from. Hence mouse motion events would prevent the system from suspending, even though they weren't supposed to. > You cannot easily mix timeouts, cancellations and nesting. > Wakelocks/suspend blockers allow you to mix timeouts and > cancellations. That's true. And it turns out that we do need to mix cancellations with nesting. For example, it may well happen that pm_request_resume gets called several times before the pm workqueue thread gets around to resuming the device. If each of those calls has a corresponding call to pm_stay_awake then we would like to cancel all of them with a single pm_relax after the driver's runtime_resume callback is finished. On the other hand, the driver's runtime_resume routine might need to carry out some work in a different kernel thread. It would call pm_stay_awake, and we wouldn't want _this_ call to be cancelled until the other thread has run. In the end, the best answer may be to keep a count of pending wakeup events in the new wakeup structure (as well as a global count of the number of devices whose pending count is > 0). Then pm_stay_awake would increment the count, pm_relax would decrement it, and there could be a variant of pm_stay_awake that would increment the count only if it was currently equal to 0. > The cleanup is a little simpler when you don't nest and there is less > chance that a missed edge case prevents suspend forever (we had a few > of those bugs in our userspace code that used nested wakelocks). Bugs in the kernel can be tracked down and fixed. > If > the above is all you do, then you block suspend forever if someone > closes an non-empty input device. That sounds like a fixable bug. :-) Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-13 15:59 ` Wakeup-events implementation Alan Stern @ 2010-08-14 4:59 ` Arve Hjønnevåg 2010-08-16 19:22 ` Alan Stern 0 siblings, 1 reply; 18+ messages in thread From: Arve Hjønnevåg @ 2010-08-14 4:59 UTC (permalink / raw) To: Alan Stern; +Cc: Linux-pm mailing list 2010/8/13 Alan Stern <stern@rowland.harvard.edu>: > Since this area of the discussion has strayed away from the path of > Paul McKenney's original summary, I'm starting a new email thread and > drastically cutting the CC list. > > On Sun, 8 Aug 2010, Arve Hjønnevåg wrote: > >> > Maybe. There aren't enough examples coded up yet to be sure. At this >> > point we could easily make the device argument required instead of >> > optional, and we could add a device argument to pm_relax. >> > >> I think that would be a good start. I can create dummy devices for now >> where I don't already have a device. > > Rafael and I spent some time discussing all this during LinuxCon. > Combining what he thought with my own ideas led us to decide that the > wakeup information should be stored in a separate structure, not in > dev_pm_info. These structures will be allocated dynamically when a > device is enabled for remote wakeup (and presumably deallocated if a > device is disabled for remote wakeup). A pointer to the new structure > will be stored in dev_pm_info. We expect that most devices won't need > a wakeup structure, because most devices aren't wakeup sources. > > This new structure (as yet unnamed) will be a lot like your > suspend_blocker structure, but with some differences. It won't need a > name, since the device already has a name. A name is still useful if we allocate multiple handles per device (see below for why one handle per device is not enough). > It won't have a timer, if > we can get away with leaving one out. The suspend_blocker structure does not have a timer, it has a timeout. This was intentional, since it avoids unnecessary timers from firing but it also means it takes a lot less space than the stats. We do need the timeout in the structure to support some of our drivers since they mix wake_lock_timeout and wake_unlock. > But it will optionally contain > various counters that can be used for statistics reporting and > debugging. The API for exporting these fields to userspace has not yet > been decided upon. > > The device argument will become mandatory for pm_stay_awake, pm_relax, > and pm_wakeup_event, so that the counter fields can be updated > properly. > I think this is a step in the right direction, but it is not enough to support mixing timeouts and pm_relax in all the drivers. >> > In the scheme we're talking about, the suspend-blocking interface for >> > uesrspace would be located entirely in the power manager. Hence it >> > could maintain all the necessary statistics, without involving the >> > kernel. During early stages of system startup, before the power >> > manager is running, lower-level services would not be able to block >> > suspend. This wouldn't matter because at those times the system simply >> > would not ever suspend -- because suspends have to be initiated by the >> > power manager. >> >> If we do that, a low level process could try to block suspend while >> the power manager is not running. Then the power manager could start >> and decide to suspend not knowing that a low level process wanted to >> block suspend. > > There's a simple solution: Put all this in a _new_, low-level power > manager program that can be started very early during boot-up (before > any of the other low-level services that need to block suspend). Then > all the other programs would communicate their suspend-blocker > requirements to this program instead of to the kernel. > Maintaining an out of mainline driver to export allow user space to block suspend seems like a more appealing solution than adding another process for everyone to depend on. Having a user space interface to block suspend in the mainline kernel would still be valuable, though. Like I mentioned in another message, this would allow some low level services, like the user-space battery monitor some devices have, to not be android specific. > You expressed concern about the new power manager deadlocking. This > can be avoided by designing the program properly. It should run in two > threads. Thread 1 will handle IPC, keeping track of userspace > suspend-blocker requests, their statistics, and so on. Thread 2 will > manage the PM interface to the kernel, and it will execute the > following simple loop: > > for (;;) { > read /sys/power/wakeup_count /* blocking read */ > if (there are any active userspace suspend blockers) { > wait until they are all deactivated > } else { > if (write to /sys/power/wakeup_count succeeds) > write "mem" to /sys/power/state > } > } > > I think this will resolve the problem you were worried about. > I know how to avoid the deadlock. My point was that adding the code in the drivers to support the overlapping wakelock model of handling wakeup events, means that a single threaded power manager that did not deadlock before now could deadlock. >> >> >> I don't know if they are all kernel-internal but these drivers appear >> >> >> to use timeouts and active cancellation on the same wakelock: >> >> >> wifi driver, mmc core, alarm driver, evdev (suspend blocker version >> >> >> removes the timeout). >> > >> > Roughly speaking, what do the timings work out to be? That is, how >> > long are the timeouts for these wakelocks, how long does it usually >> > take for one of them to be actively cancelled, and what percentage of >> > them end up timing out? >> > >> The evdev timeout is a few seconds, and never timeout unless user >> space is misbehaving. I don't know how the wifi and mmc wakelocks are >> used. The alarm driver uses a 1 or 2 second timeout to abort suspend >> when an alarm is set to close to program an rtc alarm. I think this >> one is cancelled when the alarm triggers, and the timeout only handles >> the case where the client cancelled the alarm before it trigger, but I >> don't remember for sure. > > Evdev turns out to be tricky for a couple of reasons. One is this > business of combining active cancellation with timeouts (we might end > up needing a special-purpose timer for this). Another is the way all > input events funnel into a single event queue. > Input events do not funnel to a single event queue in the kernel (at least not with the interface I have used). There are separate queues for each client of every device. > For example, suppose you want the keyboard to be a wakeup device but > not the mouse. Once events get into the input queue, there's no way to > tell where they came from. Hence mouse motion events would prevent the > system from suspending, even though they weren't supposed to. > >> You cannot easily mix timeouts, cancellations and nesting. >> Wakelocks/suspend blockers allow you to mix timeouts and >> cancellations. > > That's true. And it turns out that we do need to mix cancellations > with nesting. For example, it may well happen that pm_request_resume > gets called several times before the pm workqueue thread gets around to > resuming the device. If each of those calls has a corresponding call > to pm_stay_awake then we would like to cancel all of them with a single > pm_relax after the driver's runtime_resume callback is finished. > > On the other hand, the driver's runtime_resume routine might need to > carry out some work in a different kernel thread. It would call > pm_stay_awake, and we wouldn't want _this_ call to be cancelled until > the other thread has run. > > In the end, the best answer may be to keep a count of pending wakeup > events in the new wakeup structure (as well as a global count of the > number of devices whose pending count is > 0). Then pm_stay_awake > would increment the count, pm_relax would decrement it, and there could > be a variant of pm_stay_awake that would increment the count only if it > was currently equal to 0. > If I correctly understand what you are suggesting, that does not work. If two subsystems use the same device an one of them expects the pm_stay_awake call to be nested and the other one does not, then the subsystem that uses the non-nested api is not preventing suspend if the other subsystem was preventing suspend when it called pm_stay_awake_if_not_zero (the non-nested pm_relax variant would cause similar problems). Also, to support all our existing drivers we need multiple handles per device. For each input device we have a wakelock with a timeout per client. We only allow suspend when all the clients have emptied their queue or had their wakelock timeout. >> The cleanup is a little simpler when you don't nest and there is less >> chance that a missed edge case prevents suspend forever (we had a few >> of those bugs in our userspace code that used nested wakelocks). > > Bugs in the kernel can be tracked down and fixed. > Yes, bugs can be fixed, but not all bugs are discovered quickly and products ship before some bugs are fixed. For instance, when applying our patch that holds a wakelock when the input queue is not empty to this weeks kernel, I noticed that someone had fixed a bug where a queue overflow could leave the queue empty. If you hit this bug with the wakelock with timeout implementation that we currently use, then it would cause suspend to be blocked for 5 seconds (less if another event occurred or the file descriptor was closed). With the patch that uses a suspend blocker without a timeout, it would block suspend until another event from the same device occurred or user space closes the device. With a nesting api like the current pm_stay_awake/pm_relax, it will prevent suspend forever (unless another driver has the opposite bug). Do we need the nested api for something? We need a non-nested api to mix timeouts and pm_relax, and unless we need the nested variants, it is easier to just make all the calls non-nesting. >> If >> the above is all you do, then you block suspend forever if someone >> closes an non-empty input device. > > That sounds like a fixable bug. :-) > > Alan Stern > > -- Arve Hjønnevåg ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-14 4:59 ` Arve Hjønnevåg @ 2010-08-16 19:22 ` Alan Stern 2010-08-17 0:30 ` Arve Hjønnevåg 0 siblings, 1 reply; 18+ messages in thread From: Alan Stern @ 2010-08-16 19:22 UTC (permalink / raw) To: Arve Hjønnevåg; +Cc: Linux-pm mailing list On Fri, 13 Aug 2010, Arve Hjønnevåg wrote: > > Rafael and I spent some time discussing all this during LinuxCon. > > Combining what he thought with my own ideas led us to decide that the > > wakeup information should be stored in a separate structure, not in > > dev_pm_info. These structures will be allocated dynamically when a > > device is enabled for remote wakeup (and presumably deallocated if a > > device is disabled for remote wakeup). A pointer to the new structure > > will be stored in dev_pm_info. We expect that most devices won't need > > a wakeup structure, because most devices aren't wakeup sources. > > > > This new structure (as yet unnamed) will be a lot like your > > suspend_blocker structure, but with some differences. It won't need a > > name, since the device already has a name. > > A name is still useful if we allocate multiple handles per device (see > below for why one handle per device is not enough). Are the input queues are the only place where this is needed? And BTW, I'm puzzled with regard to something you wrote earlier: > > ... I can see how relying on devices alone > > wouldn't give enough information. There could be two separate programs > > both reading input events, and you wouldn't know which one was > > responsible for failing to drain its queue. > > > We don't know which one now either since we don't give them unique > names, but we know that one of them is not reading (and it is usually > not the normal input path). If you don't keep track of the input queues by name, why do you need a separate wakeup structure for each queue? > > It won't have a timer, if > > we can get away with leaving one out. > > The suspend_blocker structure does not have a timer, it has a timeout. > This was intentional, since it avoids unnecessary timers from firing > but it also means it takes a lot less space than the stats. We do need > the timeout in the structure to support some of our drivers since they > mix wake_lock_timeout and wake_unlock. A timeout is okay. > > The device argument will become mandatory for pm_stay_awake, pm_relax, > > and pm_wakeup_event, so that the counter fields can be updated > > properly. > > > I think this is a step in the right direction, but it is not enough to > support mixing timeouts and pm_relax in all the drivers. We'll figure out the best approach. You mentioned that when the timeouts and pm_relax do get mixed, the timeout value is around 1 or 2 seconds. For these cases, would it be so terrible just to use the timeout alone and forget about pm_relax? > Maintaining an out of mainline driver to export allow user space to > block suspend seems like a more appealing solution than adding another > process for everyone to depend on. Having a user space interface to > block suspend in the mainline kernel would still be valuable, though. > Like I mentioned in another message, this would allow some low level > services, like the user-space battery monitor some devices have, to > not be android specific. Adding out-of-mainline drivers is of course always possible. As for your second point, I guess it's a matter of perspective. You see the kernel as always trying to suspend, with userspace needing to do something special to keep it awake. Rafael and I see it as userspace occasionally asking the kernel to suspend, and the kernel delaying or aborting if a wakeup event occurs. This is because we believe that suspends should be initiated by userspace, not by the kernel. From our point of view there doesn't have to be a user interface to block suspends; you only need a way for one part of userspace to tell another to stop requesting a suspend. This could become non-Android-specific as well, if people decide on some standard way for system management programs to decide when the system should go to sleep. > > In the end, the best answer may be to keep a count of pending wakeup > > events in the new wakeup structure (as well as a global count of the > > number of devices whose pending count is > 0). Then pm_stay_awake > > would increment the count, pm_relax would decrement it, and there could > > be a variant of pm_stay_awake that would increment the count only if it > > was currently equal to 0. > > > > If I correctly understand what you are suggesting, that does not work. > If two subsystems use the same device an one of them expects the > pm_stay_awake call to be nested and the other one does not, then the > subsystem that uses the non-nested api is not preventing suspend if > the other subsystem was preventing suspend when it called > pm_stay_awake_if_not_zero (the non-nested pm_relax variant would cause > similar problems). Do you have any examples where two subsystems use the same device but with differing expectations? In the examples I can think of, the non-nesting call is used only when a wakeup request is received (which happens only when the device is at low power), so the nesting calls can't interfere. > Also, to support all our existing drivers we need > multiple handles per device. For each input device we have a wakelock > with a timeout per client. We only allow suspend when all the clients > have emptied their queue or had their wakelock timeout. We may end up attaching these wakeup structures not just to devices but also to input queues (or other kernel->userspace communication channels). > > Bugs in the kernel can be tracked down and fixed. > > > > Yes, bugs can be fixed, but not all bugs are discovered quickly and > products ship before some bugs are fixed. For instance, when applying > our patch that holds a wakelock when the input queue is not empty to > this weeks kernel, I noticed that someone had fixed a bug where a > queue overflow could leave the queue empty. If you hit this bug with > the wakelock with timeout implementation that we currently use, then > it would cause suspend to be blocked for 5 seconds (less if another > event occurred or the file descriptor was closed). With the patch that > uses a suspend blocker without a timeout, it would block suspend until > another event from the same device occurred or user space closes the > device. With a nesting api like the current pm_stay_awake/pm_relax, it > will prevent suspend forever (unless another driver has the opposite > bug). > > Do we need the nested api for something? We need a non-nested api to > mix timeouts and pm_relax, and unless we need the nested variants, it > is easier to just make all the calls non-nesting. Here's an example where nested pm_stay_awake calls are needed. Suppose a USB root hub is enabled for wakeup (in other words, plugging or unplugging a USB device into the computer should wake the system up). Suppose the root hub is at low power (autosuspended) when a wakeup event occurs. The interrupt handler calls pm_stay_awake and indirectly invokes the USB runtime-resume routine. This routine brings the root hub back to full power, informs the hub driver, and then calls pm_relax. But the hub driver carries out its main work in a separate kernel thread (i.e., khubd). So the hub_resume routine has to call pm_stay_awake, and then khubd calls pm_relax when it is finished processing the root hub's event. Thus we end up with: usb_runtime_resume: call pm_stay_awake (1) call hub_resume hub_resume: call pm_stay_awake (2) usb_runtime_resume: call pm_relax (balances 1) khubd: call pm_relax (balances 2) As you can see, this won't work right if the pm wakeup calls aren't cumulative. ("Nested" isn't quite the right word.) The system would be allowed to sleep before khubd could process the root hub's wakeup event. Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-16 19:22 ` Alan Stern @ 2010-08-17 0:30 ` Arve Hjønnevåg 2010-08-17 15:53 ` Alan Stern 0 siblings, 1 reply; 18+ messages in thread From: Arve Hjønnevåg @ 2010-08-17 0:30 UTC (permalink / raw) To: Alan Stern; +Cc: Linux-pm mailing list 2010/8/16 Alan Stern <stern@rowland.harvard.edu>: > On Fri, 13 Aug 2010, Arve Hjønnevåg wrote: > >> > Rafael and I spent some time discussing all this during LinuxCon. >> > Combining what he thought with my own ideas led us to decide that the >> > wakeup information should be stored in a separate structure, not in >> > dev_pm_info. These structures will be allocated dynamically when a >> > device is enabled for remote wakeup (and presumably deallocated if a >> > device is disabled for remote wakeup). A pointer to the new structure >> > will be stored in dev_pm_info. We expect that most devices won't need >> > a wakeup structure, because most devices aren't wakeup sources. >> > >> > This new structure (as yet unnamed) will be a lot like your >> > suspend_blocker structure, but with some differences. It won't need a >> > name, since the device already has a name. >> >> A name is still useful if we allocate multiple handles per device (see >> below for why one handle per device is not enough). > > Are the input queues are the only place where this is needed? > I don't know if it is the only place that is using more than one wakelock with a timeout per device. There are other drivers that use more than one wakelock, but it is possible they could use a single nested lock instead (with some detail loss in the the stats). I'm not sure every wakelock can be associated with a device though. For instance the mmc core wakelock does not have a corresponding device. > And BTW, I'm puzzled with regard to something you wrote earlier: > >> > ... I can see how relying on devices alone >> > wouldn't give enough information. There could be two separate programs >> > both reading input events, and you wouldn't know which one was >> > responsible for failing to drain its queue. >> > >> We don't know which one now either since we don't give them unique >> names, but we know that one of them is not reading (and it is usually >> not the normal input path). > > If you don't keep track of the input queues by name, why do you need a > separate wakeup structure for each queue? > Primarily for the timeouts, but the individual stats are still useful even if they don't name the client. >> > It won't have a timer, if >> > we can get away with leaving one out. >> >> The suspend_blocker structure does not have a timer, it has a timeout. >> This was intentional, since it avoids unnecessary timers from firing >> but it also means it takes a lot less space than the stats. We do need >> the timeout in the structure to support some of our drivers since they >> mix wake_lock_timeout and wake_unlock. > > A timeout is okay. > >> > The device argument will become mandatory for pm_stay_awake, pm_relax, >> > and pm_wakeup_event, so that the counter fields can be updated >> > properly. >> > >> I think this is a step in the right direction, but it is not enough to >> support mixing timeouts and pm_relax in all the drivers. > > We'll figure out the best approach. You mentioned that when the > timeouts and pm_relax do get mixed, the timeout value is around 1 or 2 > seconds. For these cases, would it be so terrible just to use the > timeout alone and forget about pm_relax? > The timeout used when mixing timeouts and wake_unlock are generally longer than when only using only timeouts since the timeouts are used to cover edge cases were wake_unlock did not get called. Allowing timeouts and pm_relax to be mixed may allow a gradual path from using timeouts to block suspend only when needed (timeouts only -> timeouts for edge cases -> no timeouts). We have also had bugs, where sensors generate input events whenever the system is not suspended. The 5 second timeout we have on input events is not short enough to allow the system to suspend at all in this case. >> Maintaining an out of mainline driver to export allow user space to >> block suspend seems like a more appealing solution than adding another >> process for everyone to depend on. Having a user space interface to >> block suspend in the mainline kernel would still be valuable, though. >> Like I mentioned in another message, this would allow some low level >> services, like the user-space battery monitor some devices have, to >> not be android specific. > > Adding out-of-mainline drivers is of course always possible. As for > your second point, I guess it's a matter of perspective. You see the > kernel as always trying to suspend, with userspace needing to do > something special to keep it awake. I see the kernel as trying to suspend if user space asked it to suspend (until it user space asks the kernel to stop suspending). Wakeup events that don't reach user space should not need user space support to reenter suspend. > Rafael and I see it as userspace > occasionally asking the kernel to suspend, and the kernel delaying or > aborting if a wakeup event occurs. This is because we believe that > suspends should be initiated by userspace, not by the kernel. You want each suspend attempt to be initiated by user space. Suspend is initiated from user space in both cases, the difference is that you want the suspend request to be one-shot, which means that the kernel may cancel the user space suspend request, where I want user space to explicitly cancel the request. > From our > point of view there doesn't have to be a user interface to block > suspends; you only need a way for one part of userspace to tell another > to stop requesting a suspend. > The user space interface to block suspend is not all that relevant here. Both suspend interfaces can be used with or without a user space interface to block suspend. > This could become non-Android-specific as well, if people decide on > some standard way for system management programs to decide when the > system should go to sleep. > >> > In the end, the best answer may be to keep a count of pending wakeup >> > events in the new wakeup structure (as well as a global count of the >> > number of devices whose pending count is > 0). Then pm_stay_awake >> > would increment the count, pm_relax would decrement it, and there could >> > be a variant of pm_stay_awake that would increment the count only if it >> > was currently equal to 0. >> > >> >> If I correctly understand what you are suggesting, that does not work. >> If two subsystems use the same device an one of them expects the >> pm_stay_awake call to be nested and the other one does not, then the >> subsystem that uses the non-nested api is not preventing suspend if >> the other subsystem was preventing suspend when it called >> pm_stay_awake_if_not_zero (the non-nested pm_relax variant would cause >> similar problems). > > Do you have any examples where two subsystems use the same device but > with differing expectations? In the examples I can think of, the > non-nesting call is used only when a wakeup request is received (which > happens only when the device is at low power), so the nesting calls > can't interfere. > The wakeup request need to block suspend until the driver has seen the event. If the driver and the wakeup code uses the same handle how can you avoid interference? >> Also, to support all our existing drivers we need >> multiple handles per device. For each input device we have a wakelock >> with a timeout per client. We only allow suspend when all the clients >> have emptied their queue or had their wakelock timeout. > > We may end up attaching these wakeup structures not just to devices but > also to input queues (or other kernel->userspace communication > channels). > That is we do now (with wakelocks), but I think we need to pass this handle to the pm_stay_wake... calls, and not the device, for this to work. >> > Bugs in the kernel can be tracked down and fixed. >> > >> >> Yes, bugs can be fixed, but not all bugs are discovered quickly and >> products ship before some bugs are fixed. For instance, when applying >> our patch that holds a wakelock when the input queue is not empty to >> this weeks kernel, I noticed that someone had fixed a bug where a >> queue overflow could leave the queue empty. If you hit this bug with >> the wakelock with timeout implementation that we currently use, then >> it would cause suspend to be blocked for 5 seconds (less if another >> event occurred or the file descriptor was closed). With the patch that >> uses a suspend blocker without a timeout, it would block suspend until >> another event from the same device occurred or user space closes the >> device. With a nesting api like the current pm_stay_awake/pm_relax, it >> will prevent suspend forever (unless another driver has the opposite >> bug). >> >> Do we need the nested api for something? We need a non-nested api to >> mix timeouts and pm_relax, and unless we need the nested variants, it >> is easier to just make all the calls non-nesting. > > Here's an example where nested pm_stay_awake calls are needed. > Suppose a USB root hub is enabled for wakeup (in other words, plugging > or unplugging a USB device into the computer should wake the system > up). Suppose the root hub is at low power (autosuspended) when a > wakeup event occurs. The interrupt handler calls pm_stay_awake and > indirectly invokes the USB runtime-resume routine. This routine > brings the root hub back to full power, informs the hub driver, and > then calls pm_relax. > > But the hub driver carries out its main work in a separate kernel > thread (i.e., khubd). So the hub_resume routine has to call > pm_stay_awake, and then khubd calls pm_relax when it is finished > processing the root hub's event. Thus we end up with: > > usb_runtime_resume: call pm_stay_awake (1) > call hub_resume > hub_resume: call pm_stay_awake (2) > usb_runtime_resume: call pm_relax (balances 1) > khubd: call pm_relax (balances 2) > > As you can see, this won't work right if the pm wakeup calls aren't > cumulative. ("Nested" isn't quite the right word.) The system would > be allowed to sleep before khubd could process the root hub's wakeup > event. > Too me this is an example of where two handles are needed, not where nesting is needed. Yes, you can use a single nested handle here, but the stats will not tell you which driver blocked suspend if there is a problem. -- Arve Hjønnevåg ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-17 0:30 ` Arve Hjønnevåg @ 2010-08-17 15:53 ` Alan Stern 2010-08-17 18:37 ` David Brownell ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Alan Stern @ 2010-08-17 15:53 UTC (permalink / raw) To: Arve Hjønnevåg; +Cc: Linux-pm mailing list On Mon, 16 Aug 2010, Arve Hjønnevåg wrote: > >> A name is still useful if we allocate multiple handles per device (see > >> below for why one handle per device is not enough). > > > > Are the input queues are the only place where this is needed? > > > > I don't know if it is the only place that is using more than one > wakelock with a timeout per device. There are other drivers that use > more than one wakelock, but it is possible they could use a single > nested lock instead (with some detail loss in the the stats). I'm not > sure every wakelock can be associated with a device though. For > instance the mmc core wakelock does not have a corresponding device. What other drivers need more than one wakelock per device? For that matter, why does the MMC core need wakelocks at all? I wasn't aware that memory cards could generate wakeup events. > > We'll figure out the best approach. You mentioned that when the > > timeouts and pm_relax do get mixed, the timeout value is around 1 or 2 > > seconds. For these cases, would it be so terrible just to use the > > timeout alone and forget about pm_relax? > > > > The timeout used when mixing timeouts and wake_unlock are generally > longer than when only using only timeouts since the timeouts are used > to cover edge cases were wake_unlock did not get called. Allowing > timeouts and pm_relax to be mixed may allow a gradual path from using > timeouts to block suspend only when needed (timeouts only -> timeouts > for edge cases -> no timeouts). In fact, I rather expected you would say that the combined cancellations/timeouts were useful for pinpointing applications that don't read their input events quickly enough. If the wakelocks were always allowed to time out instead of being cancelled when the data was read, you wouldn't be able to tell whether the data was read in a timely manner or not. > We have also had bugs, where sensors generate input events whenever > the system is not suspended. The 5 second timeout we have on input > events is not short enough to allow the system to suspend at all in > this case. This sounds like the sort of bug that would be found very quickly and easily if you didn't have active cancellation. It would be pretty obvious when the system didn't suspend at all. > I see the kernel as trying to suspend if user space asked it to > suspend (until it user space asks the kernel to stop suspending). > Wakeup events that don't reach user space should not need user space > support to reenter suspend. That's debatable. For example, consider a Wake-On-LAN magic packet. It generally doesn't go anywhere -- it gets dropped once the system is awake. > You want each suspend attempt to be initiated by user space. Suspend > is initiated from user space in both cases, the difference is that you > want the suspend request to be one-shot, which means that the kernel > may cancel the user space suspend request, where I want user space to > explicitly cancel the request. Yes, that's a fair way to describe the situation. > > Do you have any examples where two subsystems use the same device but > > with differing expectations? In the examples I can think of, the > > non-nesting call is used only when a wakeup request is received (which > > happens only when the device is at low power), so the nesting calls > > can't interfere. > > > > The wakeup request need to block suspend until the driver has seen the > event. If the driver and the wakeup code uses the same handle how can > you avoid interference? Let's say the interrupt handlers responsible for invoking pm_request_resume use non-nesting calls. This may happen several times, depending on the wakeup and interrupt delivery mechanism, but the end result is that the counter is incremented only once. When the pm_runtime_resume call ends, pm_relax does a single decrement. Meanwhile, the driver uses nesting calls. It can increment the counter as much as it wants, and it has to perform an equal number of decrements. The key point is that pm_request_resume never gets called while the driver has any outstanding pm_stay_awake requests. Hence the non-nesting calls never got "lost", since the first one is always made while the counter is 0. > > We may end up attaching these wakeup structures not just to devices but > > also to input queues (or other kernel->userspace communication > > channels). > > > > That is we do now (with wakelocks), but I think we need to pass this > handle to the pm_stay_wake... calls, and not the device, for this to > work. You may be right. Or we may need additional calls taking the handle as an argument rather than the device. The design requirements are still simmering in my mind, and I'm waiting to hear what Rafael thinks. > > Here's an example where nested pm_stay_awake calls are needed. > > Suppose a USB root hub is enabled for wakeup (in other words, plugging > > or unplugging a USB device into the computer should wake the system > > up). Suppose the root hub is at low power (autosuspended) when a > > wakeup event occurs. The interrupt handler calls pm_stay_awake and > > indirectly invokes the USB runtime-resume routine. This routine > > brings the root hub back to full power, informs the hub driver, and > > then calls pm_relax. > > > > But the hub driver carries out its main work in a separate kernel > > thread (i.e., khubd). So the hub_resume routine has to call > > pm_stay_awake, and then khubd calls pm_relax when it is finished > > processing the root hub's event. Thus we end up with: > > > > usb_runtime_resume: call pm_stay_awake (1) > > call hub_resume > > hub_resume: call pm_stay_awake (2) > > usb_runtime_resume: call pm_relax (balances 1) > > khubd: call pm_relax (balances 2) > > > > As you can see, this won't work right if the pm wakeup calls aren't > > cumulative. ("Nested" isn't quite the right word.) The system would > > be allowed to sleep before khubd could process the root hub's wakeup > > event. > > > > Too me this is an example of where two handles are needed, not where > nesting is needed. Yes, you can use a single nested handle here, but > the stats will not tell you which driver blocked suspend if there is a > problem. It sounds like you're suggesting that handles should be allocated per-driver rather than per-device. That would tell you which driver blocked suspend, but not which device. This raises a few other questions. With Android-based handsets, you're in a setting where you have a pretty good idea of what all the wakeup sources are and how they might get used. On a general system this isn't so. Many devices are potential wakeup sources, even though only a few of them end up getting enabled. In the example above, I said "Suppose a USB root hub is enabled for wakeup". But usually people won't want to do this; they don't want their suspended laptops to wake up when a USB mouse is unplugged or plugged in. Allocating these wakeup structures for all possible wakeup sources would be a big waste. On the other hand, a driver generally doesn't know until a suspend starts whether or not its devices should be enabled for wakeup. That makes things difficult, especially if you contemplate having multiple structures for each device. Have you really sat down and thought about what wakelocks would be needed on a general system such as a desktop or laptop machine? Also, I frankly have to wonder why you demand such a large amount of detailed debugging about the kernel's wakeup/wakelock subsystem (as opposed to userspace wakelocks). Yes, it's clear that if something goes wrong then you will drain your phone's battery in a few hours rather than a few days. But what if something goes wrong with the VM subsystem, or the RF subsystem, or the keypad driver? The device would crash immediately or at best become almost totally unusable. These are much worse scenarios than losing some battery life -- have you added comparable quantities of debugging to those other parts of the kernel? In addition, you retain all these debugging facilities in your production builds, not just in the debug kernels. Doesn't this seem excessive? Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-17 15:53 ` Alan Stern @ 2010-08-17 18:37 ` David Brownell 2010-08-17 19:19 ` Alan Stern 2010-08-17 19:21 ` Rafael J. Wysocki 2010-08-18 7:04 ` Wakeup-events implementation Florian Mickler 2 siblings, 1 reply; 18+ messages in thread From: David Brownell @ 2010-08-17 18:37 UTC (permalink / raw) To: Arve Hjønnevåg, Alan Stern; +Cc: Linux-pm mailing list --- On Tue, 8/17/10, Alan Stern <stern@rowland.harvard.edu> wrote: I wasn't > aware that memory cards could generate wakeup events. Almost all MMC/SD hardware can generate reliable insert/remove events, in my experience. So MMC/SD drivers I've modified tend to treat them as wake events ... userspace can respond without polling constantly "did it change yet, huh, did it???" Unfortunately Pierre spent a lot of time hard-wiring UNSAFE_RESUME assumptions into the MMC core, so lots of that stuff is currently (and needlessly) broken. And of more experimental interest, something I don't recall being discussed here: I/O completions as wake events. Start an I/O, enter low power state, and wake when it completes. Easily done on lots of embedded hardware.. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-17 18:37 ` David Brownell @ 2010-08-17 19:19 ` Alan Stern 0 siblings, 0 replies; 18+ messages in thread From: Alan Stern @ 2010-08-17 19:19 UTC (permalink / raw) To: David Brownell; +Cc: Linux-pm mailing list On Tue, 17 Aug 2010, David Brownell wrote: > --- On Tue, 8/17/10, Alan Stern <stern@rowland.harvard.edu> wrote: > > I wasn't > > aware that memory cards could generate wakeup events. > > Almost all MMC/SD hardware can generate reliable > insert/remove events, in my experience. So MMC/SD > drivers I've modified tend to treat them as wake > events ... userspace can respond without polling > constantly "did it change yet, huh, did it???" This is interesting in light of the fact that testing has shown many USB card readers don't track insert/remove events while in low-power mode. When they return to full power, if a card is present they always report a media change -- even if the card had been there all along. This makes autosuspend less useful than it might be. > Unfortunately Pierre spent a lot of time hard-wiring > UNSAFE_RESUME assumptions into the MMC core, so lots > of that stuff is currently (and needlessly) broken. There's a difference between generating wakeup events for insert/remove and keeping track of media-change events. I wouldn't want my sleeping laptop to wake up when I insert or remove a memory card, but I would like it to recognize when a card has been changed. If the hardware can do one but not the other then Pierre's choice is understandable (although not in accord with Linus's preference). Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-17 15:53 ` Alan Stern 2010-08-17 18:37 ` David Brownell @ 2010-08-17 19:21 ` Rafael J. Wysocki 2010-08-18 0:40 ` Arve Hjønnevåg 2010-08-18 7:04 ` Wakeup-events implementation Florian Mickler 2 siblings, 1 reply; 18+ messages in thread From: Rafael J. Wysocki @ 2010-08-17 19:21 UTC (permalink / raw) To: Alan Stern; +Cc: Linux-pm mailing list On Tuesday, August 17, 2010, Alan Stern wrote: > On Mon, 16 Aug 2010, Arve Hjønnevåg wrote: > > > >> A name is still useful if we allocate multiple handles per device (see > > >> below for why one handle per device is not enough). > > > > > > Are the input queues are the only place where this is needed? > > > > > > > I don't know if it is the only place that is using more than one > > wakelock with a timeout per device. There are other drivers that use > > more than one wakelock, but it is possible they could use a single > > nested lock instead (with some detail loss in the the stats). I'm not > > sure every wakelock can be associated with a device though. For > > instance the mmc core wakelock does not have a corresponding device. > > What other drivers need more than one wakelock per device? > > For that matter, why does the MMC core need wakelocks at all? I wasn't > aware that memory cards could generate wakeup events. Yes, that's interesting, although I _guess_ this is related to the events generated by inserting/removing a card. > > > We'll figure out the best approach. You mentioned that when the > > > timeouts and pm_relax do get mixed, the timeout value is around 1 or 2 > > > seconds. For these cases, would it be so terrible just to use the > > > timeout alone and forget about pm_relax? > > > > > > > The timeout used when mixing timeouts and wake_unlock are generally > > longer than when only using only timeouts since the timeouts are used > > to cover edge cases were wake_unlock did not get called. Allowing > > timeouts and pm_relax to be mixed may allow a gradual path from using > > timeouts to block suspend only when needed (timeouts only -> timeouts > > for edge cases -> no timeouts). > > In fact, I rather expected you would say that the combined > cancellations/timeouts were useful for pinpointing applications that > don't read their input events quickly enough. If the wakelocks were > always allowed to time out instead of being cancelled when the data was > read, you wouldn't be able to tell whether the data was read in a > timely manner or not. > > > We have also had bugs, where sensors generate input events whenever > > the system is not suspended. The 5 second timeout we have on input > > events is not short enough to allow the system to suspend at all in > > this case. > > This sounds like the sort of bug that would be found very quickly and > easily if you didn't have active cancellation. It would be pretty > obvious when the system didn't suspend at all. > > > I see the kernel as trying to suspend if user space asked it to > > suspend (until it user space asks the kernel to stop suspending). > > Wakeup events that don't reach user space should not need user space > > support to reenter suspend. > > That's debatable. For example, consider a Wake-On-LAN magic packet. > It generally doesn't go anywhere -- it gets dropped once the system is > awake. Yes. I think every event that would wake up a sleeping system should result in unfreezing user space. > > You want each suspend attempt to be initiated by user space. Suspend > > is initiated from user space in both cases, Not really. With wakelocks suspend may be initiated as a result of a driver dropping its wakelock, which doesn't qualify as "from user space" to me. > > the difference is that you want the suspend request to be one-shot, which > > means that the kernel may cancel the user space suspend request, where I > > want user space to explicitly cancel the request. > > Yes, that's a fair way to describe the situation. I somewhat agree. > > > Do you have any examples where two subsystems use the same device but > > > with differing expectations? In the examples I can think of, the > > > non-nesting call is used only when a wakeup request is received (which > > > happens only when the device is at low power), so the nesting calls > > > can't interfere. > > > > > > > The wakeup request need to block suspend until the driver has seen the > > event. If the driver and the wakeup code uses the same handle how can > > you avoid interference? > > Let's say the interrupt handlers responsible for invoking > pm_request_resume use non-nesting calls. This may happen several > times, depending on the wakeup and interrupt delivery mechanism, but > the end result is that the counter is incremented only once. When the > pm_runtime_resume call ends, pm_relax does a single decrement. > > Meanwhile, the driver uses nesting calls. It can increment the counter > as much as it wants, and it has to perform an equal number of > decrements. The key point is that pm_request_resume never gets called > while the driver has any outstanding pm_stay_awake requests. Hence the > non-nesting calls never got "lost", since the first one is always made > while the counter is 0. > > > > We may end up attaching these wakeup structures not just to devices but > > > also to input queues (or other kernel->userspace communication > > > channels). > > > > > > > That is we do now (with wakelocks), but I think we need to pass this > > handle to the pm_stay_wake... calls, and not the device, for this to > > work. > > You may be right. Or we may need additional calls taking the handle as > an argument rather than the device. The design requirements are still > simmering in my mind, and I'm waiting to hear what Rafael thinks. I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev) calling something like __pm_stay_awake(handle), where the handle is read from struct dev_pm_info. And analogously for pm_relax(). Then, the object associated with the handle might be created when the device is enabled to wake up and destroyed when it's disabled to do that. OTOH, if it's really necessary to use a "raw" handle (ie. without struct device), the caller of __pm_stay_awake() will allocate and free the "wakelock" object by itself as necessary. As far as the names are concerned, I'd like to avoid allocating space for them whenever it is not necessary (like when the "wakelock" is pointed to by struct dev_pm_info, in which case it is simply sufficient to use the device name to refer to it). > > > Here's an example where nested pm_stay_awake calls are needed. > > > Suppose a USB root hub is enabled for wakeup (in other words, plugging > > > or unplugging a USB device into the computer should wake the system > > > up). Suppose the root hub is at low power (autosuspended) when a > > > wakeup event occurs. The interrupt handler calls pm_stay_awake and > > > indirectly invokes the USB runtime-resume routine. This routine > > > brings the root hub back to full power, informs the hub driver, and > > > then calls pm_relax. > > > > > > But the hub driver carries out its main work in a separate kernel > > > thread (i.e., khubd). So the hub_resume routine has to call > > > pm_stay_awake, and then khubd calls pm_relax when it is finished > > > processing the root hub's event. Thus we end up with: > > > > > > usb_runtime_resume: call pm_stay_awake (1) > > > call hub_resume > > > hub_resume: call pm_stay_awake (2) > > > usb_runtime_resume: call pm_relax (balances 1) > > > khubd: call pm_relax (balances 2) > > > > > > As you can see, this won't work right if the pm wakeup calls aren't > > > cumulative. ("Nested" isn't quite the right word.) The system would > > > be allowed to sleep before khubd could process the root hub's wakeup > > > event. > > > > > > > Too me this is an example of where two handles are needed, not where > > nesting is needed. Yes, you can use a single nested handle here, but > > the stats will not tell you which driver blocked suspend if there is a > > problem. > > It sounds like you're suggesting that handles should be allocated > per-driver rather than per-device. That would tell you which driver > blocked suspend, but not which device. > > This raises a few other questions. > > With Android-based handsets, you're in a setting where you have a > pretty good idea of what all the wakeup sources are and how they might > get used. On a general system this isn't so. Many devices are > potential wakeup sources, even though only a few of them end up getting > enabled. In the example above, I said "Suppose a USB root hub is > enabled for wakeup". But usually people won't want to do this; they > don't want their suspended laptops to wake up when a USB mouse is > unplugged or plugged in. > > Allocating these wakeup structures for all possible wakeup sources > would be a big waste. Indeed. > On the other hand, a driver generally doesn't > know until a suspend starts whether or not its devices should be > enabled for wakeup. That makes things difficult, especially if you > contemplate having multiple structures for each device. Have you > really sat down and thought about what wakelocks would be needed on a > general system such as a desktop or laptop machine? > > Also, I frankly have to wonder why you demand such a large amount of > detailed debugging about the kernel's wakeup/wakelock subsystem (as > opposed to userspace wakelocks). That also is my question. > Yes, it's clear that if something > goes wrong then you will drain your phone's battery in a few hours > rather than a few days. But what if something goes wrong with the VM > subsystem, or the RF subsystem, or the keypad driver? The device would > crash immediately or at best become almost totally unusable. These are > much worse scenarios than losing some battery life -- have you added > comparable quantities of debugging to those other parts of the kernel? > > In addition, you retain all these debugging facilities in your > production builds, not just in the debug kernels. Doesn't this seem > excessive? It definitely seems so. Thanks, Rafael ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-17 19:21 ` Rafael J. Wysocki @ 2010-08-18 0:40 ` Arve Hjønnevåg 2010-08-18 10:32 ` Mark Brown 2010-08-18 15:20 ` Alan Stern 0 siblings, 2 replies; 18+ messages in thread From: Arve Hjønnevåg @ 2010-08-18 0:40 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linux-pm mailing list 2010/8/17 Rafael J. Wysocki <rjw@sisk.pl>: > On Tuesday, August 17, 2010, Alan Stern wrote: >> On Mon, 16 Aug 2010, Arve Hjønnevåg wrote: >> >> > >> A name is still useful if we allocate multiple handles per device (see >> > >> below for why one handle per device is not enough). >> > > >> > > Are the input queues are the only place where this is needed? >> > > >> > >> > I don't know if it is the only place that is using more than one >> > wakelock with a timeout per device. There are other drivers that use >> > more than one wakelock, but it is possible they could use a single >> > nested lock instead (with some detail loss in the the stats). I'm not >> > sure every wakelock can be associated with a device though. For >> > instance the mmc core wakelock does not have a corresponding device. >> >> What other drivers need more than one wakelock per device? >> >> For that matter, why does the MMC core need wakelocks at all? I wasn't >> aware that memory cards could generate wakeup events. > > Yes, that's interesting, although I _guess_ this is related to the events > generated by inserting/removing a card. > That is my guess too, but it is also possible sdio wakeup events could also hit this code path. >> > > We'll figure out the best approach. You mentioned that when the >> > > timeouts and pm_relax do get mixed, the timeout value is around 1 or 2 >> > > seconds. For these cases, would it be so terrible just to use the >> > > timeout alone and forget about pm_relax? >> > > >> > >> > The timeout used when mixing timeouts and wake_unlock are generally >> > longer than when only using only timeouts since the timeouts are used >> > to cover edge cases were wake_unlock did not get called. Allowing >> > timeouts and pm_relax to be mixed may allow a gradual path from using >> > timeouts to block suspend only when needed (timeouts only -> timeouts >> > for edge cases -> no timeouts). >> >> In fact, I rather expected you would say that the combined >> cancellations/timeouts were useful for pinpointing applications that >> don't read their input events quickly enough. If the wakelocks were >> always allowed to time out instead of being cancelled when the data was >> read, you wouldn't be able to tell whether the data was read in a >> timely manner or not. >> >> > We have also had bugs, where sensors generate input events whenever >> > the system is not suspended. The 5 second timeout we have on input >> > events is not short enough to allow the system to suspend at all in >> > this case. >> >> This sounds like the sort of bug that would be found very quickly and >> easily if you didn't have active cancellation. It would be pretty >> obvious when the system didn't suspend at all. >> We don't uncover all bugs during testing before we ship. For instance the bug could be triggered by a 3db party app, and there is value in making the system as robust as possible to both driver and user space bugs. >> > I see the kernel as trying to suspend if user space asked it to >> > suspend (until it user space asks the kernel to stop suspending). >> > Wakeup events that don't reach user space should not need user space >> > support to reenter suspend. >> >> That's debatable. For example, consider a Wake-On-LAN magic packet. >> It generally doesn't go anywhere -- it gets dropped once the system is >> awake. > > Yes. I think every event that would wake up a sleeping system should result > in unfreezing user space. > Returning all the way to user space for every wakeup event is a poor substitute for delivering the actual event to user space. If user space want to stay awake for 60 seconds after every wake-on-lan packet, how can it accomplish this if it does not get the event. If user space delays suspend for 60 seconds after every failed suspend attempt, a wake-on-lan packet that arrives just before it decided to suspend will be ignored. >> > You want each suspend attempt to be initiated by user space. Suspend >> > is initiated from user space in both cases, > > Not really. With wakelocks suspend may be initiated as a result of a driver > dropping its wakelock, which doesn't qualify as "from user space" to me. > Sure, but only if user-space had already requested suspend. >> > the difference is that you want the suspend request to be one-shot, which >> > means that the kernel may cancel the user space suspend request, where I >> > want user space to explicitly cancel the request. >> >> Yes, that's a fair way to describe the situation. > > I somewhat agree. > >> > > Do you have any examples where two subsystems use the same device but >> > > with differing expectations? In the examples I can think of, the >> > > non-nesting call is used only when a wakeup request is received (which >> > > happens only when the device is at low power), so the nesting calls >> > > can't interfere. >> > > >> > >> > The wakeup request need to block suspend until the driver has seen the >> > event. If the driver and the wakeup code uses the same handle how can >> > you avoid interference? >> >> Let's say the interrupt handlers responsible for invoking >> pm_request_resume use non-nesting calls. This may happen several >> times, depending on the wakeup and interrupt delivery mechanism, but >> the end result is that the counter is incremented only once. When the >> pm_runtime_resume call ends, pm_relax does a single decrement. >> >> Meanwhile, the driver uses nesting calls. It can increment the counter >> as much as it wants, and it has to perform an equal number of >> decrements. The key point is that pm_request_resume never gets called >> while the driver has any outstanding pm_stay_awake requests. Hence the >> non-nesting calls never got "lost", since the first one is always made >> while the counter is 0. >> It sounds as if you are saying that the two drivers don't interfere because one of them always gets there first. In your example if another interrupt occurs before the second driver still has released all its references on the counter, the interrupt will (incorrectly) not increment the count. >> > > We may end up attaching these wakeup structures not just to devices but >> > > also to input queues (or other kernel->userspace communication >> > > channels). >> > > >> > >> > That is we do now (with wakelocks), but I think we need to pass this >> > handle to the pm_stay_wake... calls, and not the device, for this to >> > work. >> >> You may be right. Or we may need additional calls taking the handle as >> an argument rather than the device. The design requirements are still >> simmering in my mind, and I'm waiting to hear what Rafael thinks. > > I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev) > calling something like __pm_stay_awake(handle), where the handle is read from > struct dev_pm_info. And analogously for pm_relax(). > > Then, the object associated with the handle might be created when the device > is enabled to wake up and destroyed when it's disabled to do that. > When multiple drivers use the same device, who creates and destroys the handle? > OTOH, if it's really necessary to use a "raw" handle (ie. without struct > device), the caller of __pm_stay_awake() will allocate and free the > "wakelock" object by itself as necessary. > > As far as the names are concerned, I'd like to avoid allocating space for them > whenever it is not necessary (like when the "wakelock" is pointed to by struct > dev_pm_info, in which case it is simply sufficient to use the device name to > refer to it). > >> > > Here's an example where nested pm_stay_awake calls are needed. >> > > Suppose a USB root hub is enabled for wakeup (in other words, plugging >> > > or unplugging a USB device into the computer should wake the system >> > > up). Suppose the root hub is at low power (autosuspended) when a >> > > wakeup event occurs. The interrupt handler calls pm_stay_awake and >> > > indirectly invokes the USB runtime-resume routine. This routine >> > > brings the root hub back to full power, informs the hub driver, and >> > > then calls pm_relax. >> > > >> > > But the hub driver carries out its main work in a separate kernel >> > > thread (i.e., khubd). So the hub_resume routine has to call >> > > pm_stay_awake, and then khubd calls pm_relax when it is finished >> > > processing the root hub's event. Thus we end up with: >> > > >> > > usb_runtime_resume: call pm_stay_awake (1) >> > > call hub_resume >> > > hub_resume: call pm_stay_awake (2) >> > > usb_runtime_resume: call pm_relax (balances 1) >> > > khubd: call pm_relax (balances 2) >> > > >> > > As you can see, this won't work right if the pm wakeup calls aren't >> > > cumulative. ("Nested" isn't quite the right word.) The system would >> > > be allowed to sleep before khubd could process the root hub's wakeup >> > > event. >> > > >> > >> > Too me this is an example of where two handles are needed, not where >> > nesting is needed. Yes, you can use a single nested handle here, but >> > the stats will not tell you which driver blocked suspend if there is a >> > problem. >> >> It sounds like you're suggesting that handles should be allocated >> per-driver rather than per-device. That would tell you which driver >> blocked suspend, but not which device. >> Typically a driver puts a wakelock in the state that it already allocates for each device, so you have a handle for each device/driver combo that needs to block suspend. >> This raises a few other questions. >> >> With Android-based handsets, you're in a setting where you have a >> pretty good idea of what all the wakeup sources are and how they might >> get used. On a general system this isn't so. Many devices are >> potential wakeup sources, even though only a few of them end up getting >> enabled. In the example above, I said "Suppose a USB root hub is >> enabled for wakeup". But usually people won't want to do this; they >> don't want their suspended laptops to wake up when a USB mouse is >> unplugged or plugged in. >> >> Allocating these wakeup structures for all possible wakeup sources >> would be a big waste. > > Indeed. > >> On the other hand, a driver generally doesn't >> know until a suspend starts whether or not its devices should be >> enabled for wakeup. Why? How is the driver notified on suspend that it needs to be enabled for wakeup. >> That makes things difficult, especially if you >> contemplate having multiple structures for each device. Have you >> really sat down and thought about what wakelocks would be needed on a >> general system such as a desktop or laptop machine? No, I don't know what wakelocks are needed on a general system, but you need code to turn the wakeup events on and off events anyway, so I assume allocating a handle for the wakeup event is possible. >> >> Also, I frankly have to wonder why you demand such a large amount of >> detailed debugging about the kernel's wakeup/wakelock subsystem (as >> opposed to userspace wakelocks). > > That also is my question. > It is not that different from what you get from /proc/timer_list, /proc/meminfo etc. When the system does not suspend I want to know why. >> Yes, it's clear that if something >> goes wrong then you will drain your phone's battery in a few hours >> rather than a few days. But what if something goes wrong with the VM >> subsystem, or the RF subsystem, or the keypad driver? The device would >> crash immediately or at best become almost totally unusable. These are >> much worse scenarios than losing some battery life -- have you added >> comparable quantities of debugging to those other parts of the kernel? >> We do track crashes and the kernel already dumps detailed debug information in that case. Also, I don't agree that a kernel crash is always worse than draining the battery. If you are away from a power source and the kernel crashes, the device reboots and you have a usable device again. If the battery drained, you cannot use the device again until you get to a power source. >> In addition, you retain all these debugging facilities in your >> production builds, not just in the debug kernels. Doesn't this seem >> excessive? > > It definitely seems so. > There are two reasons for keeping the debugging facilities in production builds. 1. The debugging information is useful for debugging problems in the field. 2. We want to test with the same code we ship. -- Arve Hjønnevåg ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-18 0:40 ` Arve Hjønnevåg @ 2010-08-18 10:32 ` Mark Brown 2010-08-18 15:20 ` Alan Stern 1 sibling, 0 replies; 18+ messages in thread From: Mark Brown @ 2010-08-18 10:32 UTC (permalink / raw) To: Arve Hj?nnev?g; +Cc: Linux-pm mailing list On Tue, Aug 17, 2010 at 05:40:37PM -0700, Arve Hj?nnev?g wrote: > 2010/8/17 Rafael J. Wysocki <rjw@sisk.pl>: > >> On the other hand, a driver generally doesn't > >> know until a suspend starts whether or not its devices should be > >> enabled for wakeup. > Why? How is the driver notified on suspend that it needs to be enabled > for wakeup. There's the wakeup API in pm_wakeup.h, plus subsystem specific knowledge about what sensible defaults for the hardware in much the same way as we handle what to do when we suspend. > >> In addition, you retain all these debugging facilities in your > >> production builds, not just in the debug kernels. ?Doesn't this seem > >> excessive? > > It definitely seems so. > There are two reasons for keeping the debugging facilities in production builds. > 1. The debugging information is useful for debugging problems in the field. > 2. We want to test with the same code we ship. Plus the fact that power usage tends to be very dynamic often with strong dependencies on usage patterns. Power usage is something that end users will want to debug as well as developers, though most likely not in the same detail. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-18 0:40 ` Arve Hjønnevåg 2010-08-18 10:32 ` Mark Brown @ 2010-08-18 15:20 ` Alan Stern 2010-08-18 19:17 ` Rafael J. Wysocki 1 sibling, 1 reply; 18+ messages in thread From: Alan Stern @ 2010-08-18 15:20 UTC (permalink / raw) To: Arve Hjønnevåg; +Cc: Linux-pm mailing list On Tue, 17 Aug 2010, Arve Hjønnevåg wrote: > We don't uncover all bugs during testing before we ship. For instance > the bug could be triggered by a 3db party app, and there is value in > making the system as robust as possible to both driver and user space > bugs. By that argument, all in-kernel wakelocks should always have a timeout. What happens if somebody puts down their phone and then puts something on top of it, in such a way that one of the keypad buttons is depressed? If the keypad driver's wakelock doesn't have a timeout then the battery would drain quickly. > >> > Wakeup events that don't reach user space should not need user space > >> > support to reenter suspend. > >> > >> That's debatable. For example, consider a Wake-On-LAN magic packet. > >> It generally doesn't go anywhere -- it gets dropped once the system is > >> awake. > > > > Yes. I think every event that would wake up a sleeping system should result > > in unfreezing user space. In fact they do -- there's almost no way to avoid it. However, unfreezing userspace is different from sending a wakeup event to userspace. The question is whether the kernel should automatically try to re-suspend following a wakeup for which no information was sent to userspace (e.g., WOL). The alternative, of course, is that the kernel does not try to re-suspend until a power-manager program tells it to. To me this seems more flexible and general (the PM program can always start a new suspend, whereas there's no way to prevent an automatic in-kernel re-suspend). > >> Let's say the interrupt handlers responsible for invoking > >> pm_request_resume use non-nesting calls. This may happen several > >> times, depending on the wakeup and interrupt delivery mechanism, but > >> the end result is that the counter is incremented only once. When the > >> pm_runtime_resume call ends, pm_relax does a single decrement. > >> > >> Meanwhile, the driver uses nesting calls. It can increment the counter > >> as much as it wants, and it has to perform an equal number of > >> decrements. The key point is that pm_request_resume never gets called > >> while the driver has any outstanding pm_stay_awake requests. Hence the > >> non-nesting calls never got "lost", since the first one is always made > >> while the counter is 0. > >> > > It sounds as if you are saying that the two drivers don't interfere > because one of them always gets there first. In your example if > another interrupt occurs before the second driver still has released > all its references on the counter, the interrupt will (incorrectly) > not increment the count. That's right -- one of them always gets there first. You're also right to suspect that this is an unusual situation. It arises with USB root hubs because a root hub is the only child of its parent controller device, and the only reason a controller would generate a wakeup request is because the root hub needs to wake up. This means that a root-hub wakeup can be signalled in two different ways: If the controller is in low-power then the controller sends its wakeup request. The controller's driver sets the controller back to full power, sees that the root hub has needs attention, and calls pm_request_resume for the root hub. If the controller isn't in low-power then the root hub simply generates an IRQ. The interrupt handler calls pm_request_resume for the root hub. Depending on how the timing works out, both can occur: The controller gets set to full power and then the IRQ line is signalled before the PM workqueue gets around to resuming the root hub. I have seen this happen while testing my computer. With a little care, I could avoid calling pm_stay_awake twice. The code already sets a bitflag before queuing the resume request, and I could use test_and_set_bit instead. If I did then the non-nesting calls wouldn't be needed at all. And that was the only case I could think of where the two types of calls might need to be mixed. > > I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev) > > calling something like __pm_stay_awake(handle), where the handle is read from > > struct dev_pm_info. And analogously for pm_relax(). > > > > Then, the object associated with the handle might be created when the device > > is enabled to wake up and destroyed when it's disabled to do that. > > > > When multiple drivers use the same device, who creates and destroys the handle? The primary driver (i.e., the one bound to the device) would use the device's handle. Other drivers would create and use their own separate wakeup structures. But if all the calls are of the "nesting" sort then there's not really any need for multiple structures, other than to help identify which driver is misbehaving when a problem occurs. > Typically a driver puts a wakelock in the state that it already > allocates for each device, so you have a handle for each device/driver > combo that needs to block suspend. You didn't answer my question about which devices have more than one associated wakelock in Android. Those would be the only cases where the device/driver combo matters. Also, why does Android's MMC core have its own wakelock instead of using a per-device wakelock? > >> Allocating these wakeup structures for all possible wakeup sources > >> would be a big waste. > > > > Indeed. > > > >> On the other hand, a driver generally doesn't > >> know until a suspend starts whether or not its devices should be > >> enabled for wakeup. > > Why? How is the driver notified on suspend that it needs to be enabled > for wakeup. The driver's (or bus subsystem's) suspend routine calls device_may_wakeup(dev) to see whether it should enable dev's wakeup mechanism. The value returned by device_may_wakeup can change at almost any time, since it is set by userspace. > >> That makes things difficult, especially if you > >> contemplate having multiple structures for each device. Have you > >> really sat down and thought about what wakelocks would be needed on a > >> general system such as a desktop or laptop machine? > > No, I don't know what wakelocks are needed on a general system, but > you need code to turn the wakeup events on and off events anyway, so I > assume allocating a handle for the wakeup event is possible. Yes. However system suspend is generally not a good time to go around allocating new structures; memory tends to be tight then. I suppose it might be acceptable to do this during the "prepare" phase of system suspend. Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-18 15:20 ` Alan Stern @ 2010-08-18 19:17 ` Rafael J. Wysocki 2010-08-19 0:18 ` Arve Hjønnevåg 0 siblings, 1 reply; 18+ messages in thread From: Rafael J. Wysocki @ 2010-08-18 19:17 UTC (permalink / raw) To: Alan Stern; +Cc: Linux-pm mailing list On Wednesday, August 18, 2010, Alan Stern wrote: > On Tue, 17 Aug 2010, Arve Hjønnevåg wrote: > > > We don't uncover all bugs during testing before we ship. For instance > > the bug could be triggered by a 3db party app, and there is value in > > making the system as robust as possible to both driver and user space > > bugs. > > By that argument, all in-kernel wakelocks should always have a timeout. > > What happens if somebody puts down their phone and then puts something > on top of it, in such a way that one of the keypad buttons is > depressed? If the keypad driver's wakelock doesn't have a timeout then > the battery would drain quickly. > > > >> > Wakeup events that don't reach user space should not need user space > > >> > support to reenter suspend. > > >> > > >> That's debatable. For example, consider a Wake-On-LAN magic packet. > > >> It generally doesn't go anywhere -- it gets dropped once the system is > > >> awake. > > > > > > Yes. I think every event that would wake up a sleeping system should result > > > in unfreezing user space. > > In fact they do -- there's almost no way to avoid it. However, > unfreezing userspace is different from sending a wakeup event to > userspace. > > The question is whether the kernel should automatically try to > re-suspend following a wakeup for which no information was sent to > userspace (e.g., WOL). The alternative, of course, is that the kernel > does not try to re-suspend until a power-manager program tells it to. > To me this seems more flexible and general (the PM program can always > start a new suspend, whereas there's no way to prevent an automatic > in-kernel re-suspend). I agree. > > >> Let's say the interrupt handlers responsible for invoking > > >> pm_request_resume use non-nesting calls. This may happen several > > >> times, depending on the wakeup and interrupt delivery mechanism, but > > >> the end result is that the counter is incremented only once. When the > > >> pm_runtime_resume call ends, pm_relax does a single decrement. > > >> > > >> Meanwhile, the driver uses nesting calls. It can increment the counter > > >> as much as it wants, and it has to perform an equal number of > > >> decrements. The key point is that pm_request_resume never gets called > > >> while the driver has any outstanding pm_stay_awake requests. Hence the > > >> non-nesting calls never got "lost", since the first one is always made > > >> while the counter is 0. > > >> > > > > It sounds as if you are saying that the two drivers don't interfere > > because one of them always gets there first. In your example if > > another interrupt occurs before the second driver still has released > > all its references on the counter, the interrupt will (incorrectly) > > not increment the count. > > That's right -- one of them always gets there first. You're also right > to suspect that this is an unusual situation. It arises with USB root > hubs because a root hub is the only child of its parent controller > device, and the only reason a controller would generate a wakeup > request is because the root hub needs to wake up. > > This means that a root-hub wakeup can be signalled in two different > ways: > > If the controller is in low-power then the controller sends its > wakeup request. The controller's driver sets the controller > back to full power, sees that the root hub has needs attention, > and calls pm_request_resume for the root hub. > > If the controller isn't in low-power then the root hub simply > generates an IRQ. The interrupt handler calls > pm_request_resume for the root hub. > > Depending on how the timing works out, both can occur: The controller > gets set to full power and then the IRQ line is signalled before the PM > workqueue gets around to resuming the root hub. I have seen this > happen while testing my computer. > > With a little care, I could avoid calling pm_stay_awake twice. The > code already sets a bitflag before queuing the resume request, and I > could use test_and_set_bit instead. If I did then the non-nesting > calls wouldn't be needed at all. And that was the only case I could > think of where the two types of calls might need to be mixed. That means one corner case less to worry about, good. :-) > > > I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev) > > > calling something like __pm_stay_awake(handle), where the handle is read from > > > struct dev_pm_info. And analogously for pm_relax(). > > > > > > Then, the object associated with the handle might be created when the device > > > is enabled to wake up and destroyed when it's disabled to do that. > > > > > > > When multiple drivers use the same device, who creates and destroys the handle? > > The primary driver (i.e., the one bound to the device) would use the > device's handle. Other drivers would create and use their own separate > wakeup structures. > > But if all the calls are of the "nesting" sort then there's not really > any need for multiple structures, other than to help identify which > driver is misbehaving when a problem occurs. Still, IMO we can do that if it would help to narrow the gap between the Android kernel and the mainline. > > Typically a driver puts a wakelock in the state that it already > > allocates for each device, so you have a handle for each device/driver > > combo that needs to block suspend. > > You didn't answer my question about which devices have more than one > associated wakelock in Android. Those would be the only cases where > the device/driver combo matters. > > Also, why does Android's MMC core have its own wakelock instead of > using a per-device wakelock? > > > >> Allocating these wakeup structures for all possible wakeup sources > > >> would be a big waste. > > > > > > Indeed. > > > > > >> On the other hand, a driver generally doesn't > > >> know until a suspend starts whether or not its devices should be > > >> enabled for wakeup. > > > > Why? How is the driver notified on suspend that it needs to be enabled > > for wakeup. > > The driver's (or bus subsystem's) suspend routine calls > device_may_wakeup(dev) to see whether it should enable dev's wakeup > mechanism. The value returned by device_may_wakeup can change at > almost any time, since it is set by userspace. > > > >> That makes things difficult, especially if you > > >> contemplate having multiple structures for each device. Have you > > >> really sat down and thought about what wakelocks would be needed on a > > >> general system such as a desktop or laptop machine? > > > > No, I don't know what wakelocks are needed on a general system, but > > you need code to turn the wakeup events on and off events anyway, so I > > assume allocating a handle for the wakeup event is possible. > > Yes. However system suspend is generally not a good time to go around > allocating new structures; memory tends to be tight then. Well, that's what I wanted to do originally in pm_stay_awake(), but you convinced me it was avoidable. :-) > I suppose it might be acceptable to do this during the "prepare" phase of > system suspend. I generally think it would make sense to allocate a structure, call it struct wakeup_source for now, for each device or whatever part of the system that we want to treat as a wakeup source and put all of the statistics in there. For devices it would be pretty straightforward, the structure is allocated as soon as the device is enabled to wakeup and freed when it is disabled. In the other cases it would have to depend on the usage scenario, I guess. Rafael ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-18 19:17 ` Rafael J. Wysocki @ 2010-08-19 0:18 ` Arve Hjønnevåg 2010-08-19 21:09 ` Alan Stern 0 siblings, 1 reply; 18+ messages in thread From: Arve Hjønnevåg @ 2010-08-19 0:18 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linux-pm mailing list 2010/8/18 Rafael J. Wysocki <rjw@sisk.pl>: > On Wednesday, August 18, 2010, Alan Stern wrote: >> On Tue, 17 Aug 2010, Arve Hjønnevåg wrote: >> >> > We don't uncover all bugs during testing before we ship. For instance >> > the bug could be triggered by a 3db party app, and there is value in >> > making the system as robust as possible to both driver and user space >> > bugs. >> >> By that argument, all in-kernel wakelocks should always have a timeout. >> I don't think that is the only conclusion. >> What happens if somebody puts down their phone and then puts something >> on top of it, in such a way that one of the keypad buttons is >> depressed? If the keypad driver's wakelock doesn't have a timeout then >> the battery would drain quickly. This is a real problem that might be worth solving, but just adding a wakelock timeout does not solve it since the keyboard driver would either renew the timeout after every scan, or it would leave the keyboard in a state where you could not use it to wake up the device anymore. >> >> > >> > Wakeup events that don't reach user space should not need user space >> > >> > support to reenter suspend. >> > >> >> > >> That's debatable. For example, consider a Wake-On-LAN magic packet. >> > >> It generally doesn't go anywhere -- it gets dropped once the system is >> > >> awake. >> > > >> > > Yes. I think every event that would wake up a sleeping system should result >> > > in unfreezing user space. >> >> In fact they do -- there's almost no way to avoid it. However, >> unfreezing userspace is different from sending a wakeup event to >> userspace. >> >> The question is whether the kernel should automatically try to >> re-suspend following a wakeup for which no information was sent to >> userspace (e.g., WOL). The alternative, of course, is that the kernel >> does not try to re-suspend until a power-manager program tells it to. >> To me this seems more flexible and general (the PM program can always >> start a new suspend, whereas there's no way to prevent an automatic >> in-kernel re-suspend). > > I agree. > Having user space initiate every suspend operation is more flexible out of the box, but you can get the same flexibility with the other approach by adding an interface to report all wakeup events and failed suspend attempts. However, I have not seen a use case for this flexibility other than as a partial workaround for wakeup events that user space needs to know about but are not reported to user space any other way (it is only a partial workaround since it still has the same race condition we are fixing for other wakeup events). >> > >> Let's say the interrupt handlers responsible for invoking >> > >> pm_request_resume use non-nesting calls. This may happen several >> > >> times, depending on the wakeup and interrupt delivery mechanism, but >> > >> the end result is that the counter is incremented only once. When the >> > >> pm_runtime_resume call ends, pm_relax does a single decrement. >> > >> >> > >> Meanwhile, the driver uses nesting calls. It can increment the counter >> > >> as much as it wants, and it has to perform an equal number of >> > >> decrements. The key point is that pm_request_resume never gets called >> > >> while the driver has any outstanding pm_stay_awake requests. Hence the >> > >> non-nesting calls never got "lost", since the first one is always made >> > >> while the counter is 0. >> > >> >> > >> > It sounds as if you are saying that the two drivers don't interfere >> > because one of them always gets there first. In your example if >> > another interrupt occurs before the second driver still has released >> > all its references on the counter, the interrupt will (incorrectly) >> > not increment the count. >> >> That's right -- one of them always gets there first. You're also right >> to suspect that this is an unusual situation. It arises with USB root >> hubs because a root hub is the only child of its parent controller >> device, and the only reason a controller would generate a wakeup >> request is because the root hub needs to wake up. >> >> This means that a root-hub wakeup can be signalled in two different >> ways: >> >> If the controller is in low-power then the controller sends its >> wakeup request. The controller's driver sets the controller >> back to full power, sees that the root hub has needs attention, >> and calls pm_request_resume for the root hub. >> >> If the controller isn't in low-power then the root hub simply >> generates an IRQ. The interrupt handler calls >> pm_request_resume for the root hub. >> >> Depending on how the timing works out, both can occur: The controller >> gets set to full power and then the IRQ line is signalled before the PM >> workqueue gets around to resuming the root hub. I have seen this >> happen while testing my computer. >> >> With a little care, I could avoid calling pm_stay_awake twice. The >> code already sets a bitflag before queuing the resume request, and I >> could use test_and_set_bit instead. If I did then the non-nesting >> calls wouldn't be needed at all. And that was the only case I could >> think of where the two types of calls might need to be mixed. > > That means one corner case less to worry about, good. :-) > >> > > I actually like the idea of having two levels of calls, ie. pm_stay_awake(dev) >> > > calling something like __pm_stay_awake(handle), where the handle is read from >> > > struct dev_pm_info. And analogously for pm_relax(). >> > > >> > > Then, the object associated with the handle might be created when the device >> > > is enabled to wake up and destroyed when it's disabled to do that. >> > > >> > >> > When multiple drivers use the same device, who creates and destroys the handle? >> >> The primary driver (i.e., the one bound to the device) would use the >> device's handle. Other drivers would create and use their own separate >> wakeup structures. >> >> But if all the calls are of the "nesting" sort then there's not really >> any need for multiple structures, other than to help identify which >> driver is misbehaving when a problem occurs. > > Still, IMO we can do that if it would help to narrow the gap between the > Android kernel and the mainline. > Some android drivers currently need the calls to be not nested so that they can mix timeouts and pm_relax. With only nesting apis we cannot get the same functionality. Other drivers rely on the non nesting property of the existing wakelock api but do not use timeouts. I expect these drivers can be converted have the same behavior with a nested api, but there could be difficult edge cases. Non-android driver may need a nested api so they can use a handle that they did not allocate. It is not clear to me if this is really needed. If it is not needed we can make all the calls non nested and require that the driver allocates a handle (or more commonly put the handle in the private data it already allocates). If we do need to keep the nested behavior, we a way to select which variant to use. We can have separate calls that nest and require drivers that use the non-nested variant have their own handle, or we could set a flag when creating a handle that makes calls on that handle not nest. >> > Typically a driver puts a wakelock in the state that it already >> > allocates for each device, so you have a handle for each device/driver >> > combo that needs to block suspend. >> >> You didn't answer my question about which devices have more than one >> associated wakelock in Android. Those would be the only cases where >> the device/driver combo matters. >> Wakelocks are not associated with devices at all right now, so I don't have a list for you. However, the input queues need more than one wakelock per device if they use timeouts. In this case there is only one driver though, but the driver allocates a separate queue for each client that opens a device. Our keypad/gpio-keys driver may also end up using multiple wakelocks. Many phones use different keys types in the same logical keyboard, so you could end up with one wakelock for the matrix scan and another one for debouncing keys connected directly to a gpio. The driver could be changed to share a nested handle, but I prefer to keep them separate. >> Also, why does Android's MMC core have its own wakelock instead of >> using a per-device wakelock? >> There is a global work queue. >> > >> Allocating these wakeup structures for all possible wakeup sources >> > >> would be a big waste. >> > > >> > > Indeed. >> > > >> > >> On the other hand, a driver generally doesn't >> > >> know until a suspend starts whether or not its devices should be >> > >> enabled for wakeup. >> > >> > Why? How is the driver notified on suspend that it needs to be enabled >> > for wakeup. >> >> The driver's (or bus subsystem's) suspend routine calls >> device_may_wakeup(dev) to see whether it should enable dev's wakeup >> mechanism. The value returned by device_may_wakeup can change at >> almost any time, since it is set by userspace. >> It would be better to allocate the structure when user space changes the setting. If you wait until suspend before blocking suspend you may have passed the event on already. >> > >> That makes things difficult, especially if you >> > >> contemplate having multiple structures for each device. Have you >> > >> really sat down and thought about what wakelocks would be needed on a >> > >> general system such as a desktop or laptop machine? >> > >> > No, I don't know what wakelocks are needed on a general system, but >> > you need code to turn the wakeup events on and off events anyway, so I >> > assume allocating a handle for the wakeup event is possible. >> >> Yes. However system suspend is generally not a good time to go around >> allocating new structures; memory tends to be tight then. > I mean the code that lets user space turn wakeup events on and off, not the driver transferring a wakeup source from an interrupt to a wakeup controller. > Well, that's what I wanted to do originally in pm_stay_awake(), but you > convinced me it was avoidable. :-) > >> I suppose it might be acceptable to do this during the "prepare" phase of >> system suspend. > > I generally think it would make sense to allocate a structure, call it struct > wakeup_source for now, for each device or whatever part of the system that we > want to treat as a wakeup source and put all of the statistics in there. > > For devices it would be pretty straightforward, the structure is allocated as > soon as the device is enabled to wakeup and freed when it is disabled. In the > other cases it would have to depend on the usage scenario, I guess. > > Rafael > -- Arve Hjønnevåg ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-19 0:18 ` Arve Hjønnevåg @ 2010-08-19 21:09 ` Alan Stern 2010-08-19 23:12 ` Arve Hjønnevåg 0 siblings, 1 reply; 18+ messages in thread From: Alan Stern @ 2010-08-19 21:09 UTC (permalink / raw) To: Arve Hjønnevåg; +Cc: Linux-pm mailing list On Wed, 18 Aug 2010, Arve Hjønnevåg wrote: > >> By that argument, all in-kernel wakelocks should always have a timeout. > >> > I don't think that is the only conclusion. > > >> What happens if somebody puts down their phone and then puts something > >> on top of it, in such a way that one of the keypad buttons is > >> depressed? If the keypad driver's wakelock doesn't have a timeout then > >> the battery would drain quickly. > > This is a real problem that might be worth solving, but just adding a > wakelock timeout does not solve it since the keyboard driver would > either renew the timeout after every scan, or it would leave the > keyboard in a state where you could not use it to wake up the device > anymore. Okay. Nevertheless, you maintain debugging info in case a driver bug causes a wakelock not to be released. If the wakelocks all had timeouts then a bug of that sort wouldn't be able to affect battery life very much. So why not add the timeouts? Or conversely, if you think the likelihood of such bugs is too low to worry about, then why have the debugging info? > >> The question is whether the kernel should automatically try to > >> re-suspend following a wakeup for which no information was sent to > >> userspace (e.g., WOL). The alternative, of course, is that the kernel > >> does not try to re-suspend until a power-manager program tells it to. > >> To me this seems more flexible and general (the PM program can always > >> start a new suspend, whereas there's no way to prevent an automatic > >> in-kernel re-suspend). > > > > I agree. > > > > Having user space initiate every suspend operation is more flexible > out of the box, but you can get the same flexibility with the other > approach by adding an interface to report all wakeup events and failed > suspend attempts. However, I have not seen a use case for this > flexibility other than as a partial workaround for wakeup events that > user space needs to know about but are not reported to user space any > other way (it is only a partial workaround since it still has the same > race condition we are fixing for other wakeup events). In any case, this is something that could easily be changed with a small Android-specific kernel patch. At the moment it doesn't seem to be a major issue. > >> > When multiple drivers use the same device, who creates and destroys the handle? > >> > >> The primary driver (i.e., the one bound to the device) would use the > >> device's handle. Other drivers would create and use their own separate > >> wakeup structures. > >> > >> But if all the calls are of the "nesting" sort then there's not really > >> any need for multiple structures, other than to help identify which > >> driver is misbehaving when a problem occurs. > > > > Still, IMO we can do that if it would help to narrow the gap between the > > Android kernel and the mainline. > > > > Some android drivers currently need the calls to be not nested so that > they can mix timeouts and pm_relax. With only nesting apis we cannot > get the same functionality. > > Other drivers rely on the non nesting property of the existing > wakelock api but do not use timeouts. I expect these drivers can be > converted have the same behavior with a nested api, but there could be > difficult edge cases. > > Non-android driver may need a nested api so they can use a handle that > they did not allocate. It is not clear to me if this is really needed. > If it is not needed we can make all the calls non nested and require > that the driver allocates a handle (or more commonly put the handle in > the private data it already allocates). If we do need to keep the > nested behavior, we a way to select which variant to use. We can have > separate calls that nest and require drivers that use the non-nested > variant have their own handle, or we could set a flag when creating a > handle that makes calls on that handle not nest. I agree, there are several possible designs. It's hard to tell which will be best without more experience. How much degradation do you think you would observe if in-kernel wakelocks _always_ used timeouts alone and _never_ were cancelled? > >> You didn't answer my question about which devices have more than one > >> associated wakelock in Android. Those would be the only cases where > >> the device/driver combo matters. > >> > > Wakelocks are not associated with devices at all right now, so I don't > have a list for you. However, the input queues need more than one > wakelock per device if they use timeouts. In this case there is only > one driver though, but the driver allocates a separate queue for each > client that opens a device. Our keypad/gpio-keys driver may also end > up using multiple wakelocks. Many phones use different keys types in > the same logical keyboard, so you could end up with one wakelock for > the matrix scan and another one for debouncing keys connected directly > to a gpio. The driver could be changed to share a nested handle, but I > prefer to keep them separate. Okay. In addition, we've already seen that devices for which you haven't yet implemented any wakelocks may end up needing more than one. > >> Also, why does Android's MMC core have its own wakelock instead of > >> using a per-device wakelock? > >> > > There is a global work queue. So you want to be able to cancel the wakelock whenever the queue is empty, right? And not have to keep track of whether each MMC device has any work items in the queue. In theory this is insufficient -- it doesn't handle the case where one MMC device is enabled for wakeup and another isn't. I assume you don't really care about this (especially since Android phones aren't likely to have more than one MMC device capable of generating wakeup events). > >> The driver's (or bus subsystem's) suspend routine calls > >> device_may_wakeup(dev) to see whether it should enable dev's wakeup > >> mechanism. The value returned by device_may_wakeup can change at > >> almost any time, since it is set by userspace. > >> > > It would be better to allocate the structure when user space changes > the setting. If you wait until suspend before blocking suspend you may > have passed the event on already. That will work for the structure directly associated with the device. Additional structures allocated by the drivers have to be handled differently, because the drivers aren't notified when the wakeup-enable setting is changed. Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-19 21:09 ` Alan Stern @ 2010-08-19 23:12 ` Arve Hjønnevåg 2010-08-20 15:04 ` Alan Stern 0 siblings, 1 reply; 18+ messages in thread From: Arve Hjønnevåg @ 2010-08-19 23:12 UTC (permalink / raw) To: Alan Stern; +Cc: Linux-pm mailing list 2010/8/19 Alan Stern <stern@rowland.harvard.edu>: > On Wed, 18 Aug 2010, Arve Hjønnevåg wrote: > >> >> By that argument, all in-kernel wakelocks should always have a timeout. >> >> >> I don't think that is the only conclusion. >> >> >> What happens if somebody puts down their phone and then puts something >> >> on top of it, in such a way that one of the keypad buttons is >> >> depressed? If the keypad driver's wakelock doesn't have a timeout then >> >> the battery would drain quickly. >> >> This is a real problem that might be worth solving, but just adding a >> wakelock timeout does not solve it since the keyboard driver would >> either renew the timeout after every scan, or it would leave the >> keyboard in a state where you could not use it to wake up the device >> anymore. > > Okay. Nevertheless, you maintain debugging info in case a driver bug > causes a wakelock not to be released. If the wakelocks all had > timeouts then a bug of that sort wouldn't be able to affect battery > life very much. So why not add the timeouts? > Because we can't. I just described how the keypad driver would break we just add a timeout to its wakelock. Other drivers, like the usb gadget driver may need to block suspend for an extended time period. > Or conversely, if you think the likelihood of such bugs is too low to > worry about, then why have the debugging info? > That statement assumes all the bugs can be fixed by adding timeouts, which I don't think is true. Also, you don't need to have bugs for the stats to be useful. The gps could for instance prevent suspend for a long time because the user ran a gps tracking app. >> >> The question is whether the kernel should automatically try to >> >> re-suspend following a wakeup for which no information was sent to >> >> userspace (e.g., WOL). The alternative, of course, is that the kernel >> >> does not try to re-suspend until a power-manager program tells it to. >> >> To me this seems more flexible and general (the PM program can always >> >> start a new suspend, whereas there's no way to prevent an automatic >> >> in-kernel re-suspend). >> > >> > I agree. >> > >> >> Having user space initiate every suspend operation is more flexible >> out of the box, but you can get the same flexibility with the other >> approach by adding an interface to report all wakeup events and failed >> suspend attempts. However, I have not seen a use case for this >> flexibility other than as a partial workaround for wakeup events that >> user space needs to know about but are not reported to user space any >> other way (it is only a partial workaround since it still has the same >> race condition we are fixing for other wakeup events). > > In any case, this is something that could easily be changed with a > small Android-specific kernel patch. At the moment it doesn't seem to > be a major issue. > Yes, that seems to be a likely outcome. >> >> > When multiple drivers use the same device, who creates and destroys the handle? >> >> >> >> The primary driver (i.e., the one bound to the device) would use the >> >> device's handle. Other drivers would create and use their own separate >> >> wakeup structures. >> >> >> >> But if all the calls are of the "nesting" sort then there's not really >> >> any need for multiple structures, other than to help identify which >> >> driver is misbehaving when a problem occurs. >> > >> > Still, IMO we can do that if it would help to narrow the gap between the >> > Android kernel and the mainline. >> > >> >> Some android drivers currently need the calls to be not nested so that >> they can mix timeouts and pm_relax. With only nesting apis we cannot >> get the same functionality. >> >> Other drivers rely on the non nesting property of the existing >> wakelock api but do not use timeouts. I expect these drivers can be >> converted have the same behavior with a nested api, but there could be >> difficult edge cases. >> >> Non-android driver may need a nested api so they can use a handle that >> they did not allocate. It is not clear to me if this is really needed. >> If it is not needed we can make all the calls non nested and require >> that the driver allocates a handle (or more commonly put the handle in >> the private data it already allocates). If we do need to keep the >> nested behavior, we a way to select which variant to use. We can have >> separate calls that nest and require drivers that use the non-nested >> variant have their own handle, or we could set a flag when creating a >> handle that makes calls on that handle not nest. > > I agree, there are several possible designs. It's hard to tell which > will be best without more experience. > > How much degradation do you think you would observe if in-kernel > wakelocks _always_ used timeouts alone and _never_ were cancelled? > I don't think that is an option. We have kernel wakelocks that need to be held while the device is in a specific state. >> >> You didn't answer my question about which devices have more than one >> >> associated wakelock in Android. Those would be the only cases where >> >> the device/driver combo matters. >> >> >> >> Wakelocks are not associated with devices at all right now, so I don't >> have a list for you. However, the input queues need more than one >> wakelock per device if they use timeouts. In this case there is only >> one driver though, but the driver allocates a separate queue for each >> client that opens a device. Our keypad/gpio-keys driver may also end >> up using multiple wakelocks. Many phones use different keys types in >> the same logical keyboard, so you could end up with one wakelock for >> the matrix scan and another one for debouncing keys connected directly >> to a gpio. The driver could be changed to share a nested handle, but I >> prefer to keep them separate. > > Okay. In addition, we've already seen that devices for which you > haven't yet implemented any wakelocks may end up needing more than one. > >> >> Also, why does Android's MMC core have its own wakelock instead of >> >> using a per-device wakelock? >> >> >> >> There is a global work queue. > > So you want to be able to cancel the wakelock whenever the queue is > empty, right? And not have to keep track of whether each MMC device > has any work items in the queue. > I have not reviewed that code, so I don't know the details of how it works or even if it is correct, but I see that it uses both timeouts and wake_unlock calls on the same wakelock. > In theory this is insufficient -- it doesn't handle the case where one > MMC device is enabled for wakeup and another isn't. I assume you don't > really care about this (especially since Android phones aren't likely > to have more than one MMC device capable of generating wakeup events). > I'm not sure what you mean by insufficient here. If it not optimal if you also have work that does not need to prevent suspend, but it should still work correctly. >> >> The driver's (or bus subsystem's) suspend routine calls >> >> device_may_wakeup(dev) to see whether it should enable dev's wakeup >> >> mechanism. The value returned by device_may_wakeup can change at >> >> almost any time, since it is set by userspace. >> >> >> >> It would be better to allocate the structure when user space changes >> the setting. If you wait until suspend before blocking suspend you may >> have passed the event on already. > > That will work for the structure directly associated with the device. > Additional structures allocated by the drivers have to be handled > differently, because the drivers aren't notified when the wakeup-enable > setting is changed. > I'm not convinced that is it sufficient to detect that a device is wakeup-enabled on suspend. However it is not a problem I have tried to solve since we have only used static wakeup sources. -- Arve Hjønnevåg ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-19 23:12 ` Arve Hjønnevåg @ 2010-08-20 15:04 ` Alan Stern 2010-09-07 23:51 ` [RFC][PATCH] PM / Wakeup: Introduce wakeup source objects and event statistics (was: Re: Wakeup-events implementation) Rafael J. Wysocki 0 siblings, 1 reply; 18+ messages in thread From: Alan Stern @ 2010-08-20 15:04 UTC (permalink / raw) To: Arve Hjønnevåg; +Cc: Linux-pm mailing list On Thu, 19 Aug 2010, Arve Hjønnevåg wrote: > > Okay. Nevertheless, you maintain debugging info in case a driver bug > > causes a wakelock not to be released. If the wakelocks all had > > timeouts then a bug of that sort wouldn't be able to affect battery > > life very much. So why not add the timeouts? > > > > Because we can't. I just described how the keypad driver would break > we just add a timeout to its wakelock. Other drivers, like the usb > gadget driver may need to block suspend for an extended time period. Some wakelocks can't use timeouts because they may have to be held for extended periods. But the others could have timeouts. > > How much degradation do you think you would observe if in-kernel > > wakelocks _always_ used timeouts alone and _never_ were cancelled? > > > > I don't think that is an option. We have kernel wakelocks that need to > be held while the device is in a specific state. Yes, but what about all the others? What I'm getting at is whether it's feasible for each wakelock to be exclusively used either with timeouts or cancellations, but never both. > > In theory this is insufficient -- it doesn't handle the case where one > > MMC device is enabled for wakeup and another isn't. I assume you don't > > really care about this (especially since Android phones aren't likely > > to have more than one MMC device capable of generating wakeup events). > > > > I'm not sure what you mean by insufficient here. If it not optimal if > you also have work that does not need to prevent suspend, but it > should still work correctly. Yeah, that's what I meant. You would end up blocking suspends unnecessarily. It's okay if it doesn't happen very often. > >> It would be better to allocate the structure when user space changes > >> the setting. If you wait until suspend before blocking suspend you may > >> have passed the event on already. > > > > That will work for the structure directly associated with the device. > > Additional structures allocated by the drivers have to be handled > > differently, because the drivers aren't notified when the wakeup-enable > > setting is changed. > > > > I'm not convinced that is it sufficient to detect that a device is > wakeup-enabled on suspend. It isn't. > However it is not a problem I have tried to > solve since we have only used static wakeup sources. Well, we may need to solve it eventually. One possible solution is to imitate what you're doing with MMC: Use a single per-driver wakelock instead of separate wakelocks for each driver/device combination. I don't know if that would work for all the cases, though. Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* [RFC][PATCH] PM / Wakeup: Introduce wakeup source objects and event statistics (was: Re: Wakeup-events implementation) 2010-08-20 15:04 ` Alan Stern @ 2010-09-07 23:51 ` Rafael J. Wysocki 0 siblings, 0 replies; 18+ messages in thread From: Rafael J. Wysocki @ 2010-09-07 23:51 UTC (permalink / raw) To: Alan Stern; +Cc: Linux-pm mailing list, LKML Hi, Below is a patch that adds some statistics to the previously merged pm_wakeup_event()/pm_stay_awake()/pm_relax() code. It also makes it possible to use wakeup sources that are not directly associated with devices. It adds functions for manipulating wakeup source objects and reworks the device wakeup enabling/disabling to use the new functions. The list of wakeup sources is only used for updating the "hit count" statistics for now (this is the number of times the wakeup source was active when the PM core checked), but I'm planning to add a /proc file listing all wakeup sources, including the ones that are not attached to device objects. It appears to work with the PCI wakeup code added previously, but that's only one case. I'm also not sure if it builds withoug CONFIG_PM_SLEEP. [BTW, I'm not sure it atomic_inc() and atomic_dec() imply a memory barrier in general. That seems to be the case on x86, but I don't know about other architectures.] Please have a look and let me know what you think. Thanks, Rafael --- drivers/base/power/main.c | 3 drivers/base/power/power.h | 1 drivers/base/power/runtime.c | 2 drivers/base/power/sysfs.c | 121 +++++++++ drivers/base/power/wakeup.c | 532 ++++++++++++++++++++++++++++++++++++------- include/linux/pm.h | 16 - include/linux/pm_wakeup.h | 114 +++++++-- include/linux/suspend.h | 4 kernel/power/main.c | 8 9 files changed, 673 insertions(+), 128 deletions(-) Index: linux-2.6/include/linux/pm_wakeup.h =================================================================== --- linux-2.6.orig/include/linux/pm_wakeup.h +++ linux-2.6/include/linux/pm_wakeup.h @@ -2,6 +2,7 @@ * pm_wakeup.h - Power management wakeup interface * * Copyright (C) 2008 Alan Stern + * Copyright (C) 2010 Rafael J. Wysocki, Novell Inc. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -27,18 +28,27 @@ #include <linux/types.h> -#ifdef CONFIG_PM +struct wakeup_source { + char *name; + struct list_head entry; + spinlock_t lock; + struct timer_list timer; + unsigned long timer_expires; + ktime_t total_time; + ktime_t max_time; + ktime_t last_time; + unsigned long event_count; + unsigned long active_count; + unsigned long relax_count; + unsigned long hit_count; + unsigned int active:1; +}; -/* Changes to device_may_wakeup take effect on the next pm state change. - * - * By default, most devices should leave wakeup disabled. The exceptions - * are devices that everyone expects to be wakeup sources: keyboards, - * power buttons, possibly network interfaces, etc. +#ifdef CONFIG_PM_SLEEP + +/* + * Changes to device_may_wakeup take effect on the next pm state change. */ -static inline void device_init_wakeup(struct device *dev, bool val) -{ - dev->power.can_wakeup = dev->power.should_wakeup = val; -} static inline void device_set_wakeup_capable(struct device *dev, bool capable) { @@ -50,23 +60,32 @@ static inline bool device_can_wakeup(str return dev->power.can_wakeup; } -static inline void device_set_wakeup_enable(struct device *dev, bool enable) -{ - dev->power.should_wakeup = enable; -} + static inline bool device_may_wakeup(struct device *dev) { - return dev->power.can_wakeup && dev->power.should_wakeup; + return dev->power.can_wakeup && !!dev->power.wakeup; } -#else /* !CONFIG_PM */ +/* drivers/base/power/wakeup.c */ +extern struct wakeup_source *wakeup_source_create(const char *name); +extern void wakeup_source_destroy(struct wakeup_source *ws); +extern void wakeup_source_register(struct wakeup_source *ws); +extern void wakeup_source_unregister(struct wakeup_source *ws); +extern struct wakeup_source *wakeup_source_add(const char *name); +extern void wakeup_source_remove(struct wakeup_source *ws); +extern int device_wakeup_enable(struct device *dev); +extern int device_wakeup_disable(struct device *dev); +extern int device_init_wakeup(struct device *dev, bool val); +extern int device_set_wakeup_enable(struct device *dev, bool enable); +extern void __pm_stay_awake(struct wakeup_source *ws); +extern void pm_stay_awake(struct device *dev); +extern void __pm_relax(struct wakeup_source *ws); +extern void pm_relax(struct device *dev); +extern void __pm_wakeup_event(struct wakeup_source *ws, unsigned int msec); +extern void pm_wakeup_event(struct device *dev, unsigned int msec); -/* For some reason the following routines work even without CONFIG_PM */ -static inline void device_init_wakeup(struct device *dev, bool val) -{ - dev->power.can_wakeup = val; -} +#else /* !CONFIG_PM_SLEEP */ static inline void device_set_wakeup_capable(struct device *dev, bool capable) { @@ -78,15 +97,60 @@ static inline bool device_can_wakeup(str return dev->power.can_wakeup; } -static inline void device_set_wakeup_enable(struct device *dev, bool enable) +static inline bool device_may_wakeup(struct device *dev) { + return false; } -static inline bool device_may_wakeup(struct device *dev) +static inline struct wakeup_source *wakeup_source_create(char *name) { - return false; + return NULL; +} + +static inline void wakeup_source_destroy(struct wakeup_source *ws) {} + +static inline void wakeup_source_register(struct wakeup_source *ws) {} + +static inline void wakeup_source_unregister(struct wakeup_source *ws) {} + +static inline struct wakeup_source *wakeup_source_add(char *name) +{ + return NULL; } -#endif /* !CONFIG_PM */ +static inline void wakeup_source_remove(struct wakeup_source *ws) {} + +static inline int device_wakeup_enable(struct device *dev) +{ + return -EINVAL; +} + +static inline int device_wakeup_disable(struct device *dev) {} + +static inline int device_init_wakeup(struct device *dev, bool val) +{ + dev->power.can_wakeup = val; + return val ? -EINVAL : 0; +} + + +static inline int device_set_wakeup_enable(struct device *dev, bool enable) +{ + return -EINVAL; +} + +static inline void __pm_stay_awake(struct wakeup_source *ws) {} + +static inline void pm_stay_awake(struct device *dev) {} + +static inline void __pm_relax(struct wakeup_source *ws) {} + +static inline void pm_relax(struct device *dev) {} + +static inline void __pm_wakeup_event(struct wakeup_source *ws, unsigned int msec) {} + +static inline void pm_wakeup_event(struct device *dev, unsigned int msec) {} + +#endif /* !CONFIG_PM_SLEEP */ #endif /* _LINUX_PM_WAKEUP_H */ Index: linux-2.6/include/linux/pm.h =================================================================== --- linux-2.6.orig/include/linux/pm.h +++ linux-2.6/include/linux/pm.h @@ -448,23 +448,24 @@ enum rpm_request { RPM_REQ_RESUME, }; +struct wakeup_source; + struct dev_pm_info { pm_message_t power_state; unsigned int can_wakeup:1; - unsigned int should_wakeup:1; unsigned async_suspend:1; enum dpm_state status; /* Owned by the PM core */ + spinlock_t lock; #ifdef CONFIG_PM_SLEEP struct list_head entry; struct completion completion; - unsigned long wakeup_count; + struct wakeup_source *wakeup; #endif #ifdef CONFIG_PM_RUNTIME struct timer_list suspend_timer; unsigned long timer_expires; struct work_struct work; wait_queue_head_t wait_queue; - spinlock_t lock; atomic_t usage_count; atomic_t child_count; unsigned int disable_depth:3; @@ -559,11 +560,6 @@ extern void __suspend_report_result(cons } while (0) extern void device_pm_wait_for_dev(struct device *sub, struct device *dev); - -/* drivers/base/power/wakeup.c */ -extern void pm_wakeup_event(struct device *dev, unsigned int msec); -extern void pm_stay_awake(struct device *dev); -extern void pm_relax(void); #else /* !CONFIG_PM_SLEEP */ #define device_pm_lock() do {} while (0) @@ -577,10 +573,6 @@ static inline int dpm_suspend_start(pm_m #define suspend_report_result(fn, ret) do {} while (0) static inline void device_pm_wait_for_dev(struct device *a, struct device *b) {} - -static inline void pm_wakeup_event(struct device *dev, unsigned int msec) {} -static inline void pm_stay_awake(struct device *dev) {} -static inline void pm_relax(void) {} #endif /* !CONFIG_PM_SLEEP */ /* How to reorder dpm_list after device_move() */ Index: linux-2.6/drivers/base/power/wakeup.c =================================================================== --- linux-2.6.orig/drivers/base/power/wakeup.c +++ linux-2.6/drivers/base/power/wakeup.c @@ -11,7 +11,10 @@ #include <linux/sched.h> #include <linux/capability.h> #include <linux/suspend.h> -#include <linux/pm.h> + +#include "power.h" + +#define TIMEOUT 100 /* * If set, the suspend/hibernate code will abort transitions to a sleep state @@ -20,18 +23,251 @@ bool events_check_enabled; /* The counter of registered wakeup events. */ -static unsigned long event_count; +static atomic_t event_count = ATOMIC_INIT(0); /* A preserved old value of event_count. */ -static unsigned long saved_event_count; +static unsigned int saved_count; /* The counter of wakeup events being processed. */ -static unsigned long events_in_progress; +static atomic_t events_in_progress = ATOMIC_INIT(0); static DEFINE_SPINLOCK(events_lock); static void pm_wakeup_timer_fn(unsigned long data); -static DEFINE_TIMER(events_timer, pm_wakeup_timer_fn, 0, 0); -static unsigned long events_timer_expires; +static LIST_HEAD(wakeup_sources); + +/** + * wakeup_source_create - Create a struct wakeup_source object. + * @name: Name of the new wakeup source. + */ +struct wakeup_source *wakeup_source_create(const char *name) +{ + struct wakeup_source *ws; + + ws = kzalloc(sizeof(*ws), GFP_KERNEL); + if (!ws) + return NULL; + + if (name) { + int len = strlen(name); + char *s = kzalloc(len + 1, GFP_KERNEL); + if (s) { + strncpy(s, name, len); + ws->name = s; + } + } + + return ws; +} +EXPORT_SYMBOL_GPL(wakeup_source_create); + +/** + * wakeup_source_destroy - Destroy a struct wakeup_source object. + * @ws: Wakeup source to destroy. + */ +void wakeup_source_destroy(struct wakeup_source *ws) +{ + if (!ws) + return; + + spin_lock_irq(&ws->lock); + while (ws->active) { + spin_unlock_irq(&ws->lock); + + schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT)); + + spin_lock_irq(&ws->lock); + } + spin_unlock_irq(&ws->lock); + + if (ws->name) + kfree(ws->name); + kfree(ws); +} +EXPORT_SYMBOL_GPL(wakeup_source_destroy); + +/** + * wakeup_source_register - Add given object to the list of wakeup sources. + * @ws: Wakeup source object to register. + */ +void wakeup_source_register(struct wakeup_source *ws) +{ + if (WARN_ON(!ws)) + return; + + spin_lock_init(&ws->lock); + setup_timer(&ws->timer, pm_wakeup_timer_fn, (unsigned long)ws); + ws->active = false; + + spin_lock_irq(&events_lock); + list_add_rcu(&ws->entry, &wakeup_sources); + spin_unlock_irq(&events_lock); + synchronize_rcu(); +} +EXPORT_SYMBOL_GPL(wakeup_source_register); + +/** + * wakeup_source_unregister - Remove given object from the wakeup sources list. + * @ws: Wakeup source object to unregister. + */ +void wakeup_source_unregister(struct wakeup_source *ws) +{ + if (WARN_ON(!ws)) + return; + + spin_lock_irq(&events_lock); + list_del_rcu(&ws->entry); + spin_unlock_irq(&events_lock); + synchronize_rcu(); +} +EXPORT_SYMBOL_GPL(wakeup_source_unregister); + +/** + * wakeup_source_add - Create and register a wakeup source object. + * @name: Name of the wakeup source to create. + */ +struct wakeup_source *wakeup_source_add(const char *name) +{ + struct wakeup_source *ws; + + ws = wakeup_source_create(name); + if (ws) + wakeup_source_register(ws); + + return ws; +} +EXPORT_SYMBOL_GPL(wakeup_source_add); + +/** + * wakeup_source_remove - Unregister and destroy a wakeup source object. + * @ws: Wakeup source object to handle. + */ +void wakeup_source_remove(struct wakeup_source *ws) +{ + wakeup_source_unregister(ws); + wakeup_source_destroy(ws); +} +EXPORT_SYMBOL_GPL(wakeup_source_remove); + +/** + * device_wakeup_attach - Attach a wakeup source object to a device object. + * @dev: Device to handle. + * @ws: Wakeup source object to attach to @dev. + * + * This causes @dev to be treated as a wakeup device. + */ +static int device_wakeup_attach(struct device *dev, struct wakeup_source *ws) +{ + spin_lock_irq(&dev->power.lock); + if (dev->power.wakeup) { + spin_unlock_irq(&dev->power.lock); + return -EEXIST; + } + dev->power.wakeup = ws; + spin_unlock_irq(&dev->power.lock); + return 0; +} + +/** + * device_wakeup_enable - Enable given device to be a wakeup source. + * @dev: Device to handle. + * + * Create a wakeup source object, register it and attach it to @dev. + */ +int device_wakeup_enable(struct device *dev) +{ + struct wakeup_source *ws; + int ret; + + if (!dev || !dev->power.can_wakeup) + return -EINVAL; + + ws = wakeup_source_add(dev_name(dev)); + if (!ws) + return -ENOMEM; + + ret = device_wakeup_attach(dev, ws); + if (ret) + wakeup_source_remove(ws); + + return ret; +} +EXPORT_SYMBOL_GPL(device_wakeup_enable); + +/** + * device_wakeup_detach - Detach a device's wakeup source object from it. + * @dev: Device to detach the wakeup source object from. + * + * After it returns, @dev will not be treated as a wakeup device any more. + */ +static struct wakeup_source *device_wakeup_detach(struct device *dev) +{ + struct wakeup_source *ws; + + spin_lock_irq(&dev->power.lock); + ws = dev->power.wakeup; + dev->power.wakeup = NULL; + spin_unlock_irq(&dev->power.lock); + return ws; +} + +/** + * device_wakeup_disable - Do not regard a device as a wakeup source any more. + * @dev: Device to handle. + * + * Detach the @dev's wakeup source object from it, unregister this wakeup source + * object and destroy it. + */ +int device_wakeup_disable(struct device *dev) +{ + struct wakeup_source *ws; + + if (!dev || !dev->power.can_wakeup) + return -EINVAL; + + ws = device_wakeup_detach(dev); + if (ws) + wakeup_source_remove(ws); + + return 0; +} +EXPORT_SYMBOL_GPL(device_wakeup_disable); + +/** + * device_init_wakeup - Device wakeup initialization. + * @dev: Device to handle. + * @enable: Whether or not to enable @dev as a wakeup device. + * + * By default, most devices should leave wakeup disabled. The exceptions are + * devices that everyone expects to be wakeup sources: keyboards, power buttons, + * possibly network interfaces, etc. + */ +int device_init_wakeup(struct device *dev, bool enable) +{ + int ret = 0; + + if (enable) { + device_set_wakeup_capable(dev, true); + ret = device_wakeup_enable(dev); + } else { + device_set_wakeup_capable(dev, false); + } + + return ret; +} +EXPORT_SYMBOL_GPL(device_init_wakeup); + +/** + * device_set_wakeup_enable - Enable or disable a device to wake up the system. + * @dev: Device to handle. + */ +int device_set_wakeup_enable(struct device *dev, bool enable) +{ + if (!dev || !dev->power.can_wakeup) + return -EINVAL; + + return enable ? device_wakeup_enable(dev) : device_wakeup_disable(dev); +} +EXPORT_SYMBOL_GPL(device_set_wakeup_enable); /* * The functions below use the observation that each wakeup event starts a @@ -55,118 +291,252 @@ static unsigned long events_timer_expire * knowledge, however, may not be available to it, so it can simply specify time * to wait before the system can be suspended and pass it as the second * argument of pm_wakeup_event(). + * + * It is valid to call pm_relax() after pm_wakeup_event(), in which case the + * "no suspend" period will be ended either by the pm_relax(), or by the timer + * function executed when the timer expires, whichever comes first. */ /** + * wakup_source_activate - Mark given wakeup source as active. + * @ws: Wakeup source to handle. + * + * Update the @ws' statistics and, if @ws has just been activated, notify the PM + * core of the event by incrementing the counter of of wakeup events being + * processed. + */ +static void wakeup_source_activate(struct wakeup_source *ws) +{ + ws->active = true; + ws->active_count++; + ws->timer_expires = jiffies; + ws->last_time = ktime_get(); + + atomic_inc(&events_in_progress); +} + +/** + * __pm_stay_awake - Notify the PM core of a wakeup event. + * @ws: Wakeup source object associated with the source of the event. + * + * It is safe to call this function from interrupt context. + */ +void __pm_stay_awake(struct wakeup_source *ws) +{ + unsigned long flags; + + if (!ws) + return; + + spin_lock_irqsave(&ws->lock, flags); + ws->event_count++; + if (!ws->active) + wakeup_source_activate(ws); + spin_unlock_irqrestore(&ws->lock, flags); +} +EXPORT_SYMBOL_GPL(__pm_stay_awake); + +/** * pm_stay_awake - Notify the PM core that a wakeup event is being processed. * @dev: Device the wakeup event is related to. * - * Notify the PM core of a wakeup event (signaled by @dev) by incrementing the - * counter of wakeup events being processed. If @dev is not NULL, the counter - * of wakeup events related to @dev is incremented too. + * Notify the PM core of a wakeup event (signaled by @dev) by calling + * __pm_stay_awake for the @dev's wakeup source object. * * Call this function after detecting of a wakeup event if pm_relax() is going * to be called directly after processing the event (and possibly passing it to * user space for further processing). - * - * It is safe to call this function from interrupt context. */ void pm_stay_awake(struct device *dev) { unsigned long flags; - spin_lock_irqsave(&events_lock, flags); - if (dev) - dev->power.wakeup_count++; + if (!dev) + return; - events_in_progress++; - spin_unlock_irqrestore(&events_lock, flags); + spin_lock_irqsave(&dev->power.lock, flags); + __pm_stay_awake(dev->power.wakeup); + spin_unlock_irqrestore(&dev->power.lock, flags); } +EXPORT_SYMBOL_GPL(pm_stay_awake); /** - * pm_relax - Notify the PM core that processing of a wakeup event has ended. + * wakup_source_deactivate - Mark given wakeup source as inactive. + * @ws: Wakeup source to handle. * - * Notify the PM core that a wakeup event has been processed by decrementing - * the counter of wakeup events being processed and incrementing the counter - * of registered wakeup events. + * Update the @ws' statistics and notify the PM core that the wakeup source has + * become inactive by decrementing the counter of wakeup events being processed + * and incrementing the counter of registered wakeup events. + */ +static void wakeup_source_deactivate(struct wakeup_source *ws) +{ + ktime_t duration; + ktime_t now; + + ws->relax_count++; + /* + * __pm_relax() may be called directly or from a timer function. + * If it is called directly right after the timer function has been + * started, but before the timer function calls __pm_relax(), it is + * possible that __pm_stay_awake() will be called in the meantime and + * will set ws->active. Then, ws->active may be cleared immediately + * by the __pm_relax() called from the timer function, but in such a + * case ws->relax_count will be different from ws->active_count. + */ + if (ws->relax_count != ws->active_count) { + ws->relax_count--; + return; + } + + ws->active = false; + + now = ktime_get(); + duration = ktime_sub(now, ws->last_time); + ws->total_time = ktime_add(ws->total_time, duration); + if (ktime_to_ns(duration) > ktime_to_ns(ws->max_time)) + ws->max_time = duration; + + del_timer(&ws->timer); + + atomic_inc(&event_count); + atomic_dec(&events_in_progress); +} + +/** + * __pm_relax - Notify the PM core that processing of a wakeup event has ended. + * @ws: Wakeup source object associated with the source of the event. * * Call this function for wakeup events whose processing started with calling - * pm_stay_awake(). + * __pm_stay_awake(). * * It is safe to call it from interrupt context. */ -void pm_relax(void) +void __pm_relax(struct wakeup_source *ws) { unsigned long flags; - spin_lock_irqsave(&events_lock, flags); - if (events_in_progress) { - events_in_progress--; - event_count++; - } - spin_unlock_irqrestore(&events_lock, flags); + if (!ws) + return; + + spin_lock_irqsave(&ws->lock, flags); + if (ws->active) + wakeup_source_deactivate(ws); + spin_unlock_irqrestore(&ws->lock, flags); } +EXPORT_SYMBOL_GPL(__pm_relax); + +/** + * pm_relax - Notify the PM core that processing of a wakeup event has ended. + * @dev: Device that signaled the event. + * + * Execute __pm_relax() for the @dev's wakeup source object. + */ +void pm_relax(struct device *dev) +{ + unsigned long flags; + + if (!dev) + return; + + spin_lock_irqsave(&dev->power.lock, flags); + __pm_relax(dev->power.wakeup); + spin_unlock_irqrestore(&dev->power.lock, flags); +} +EXPORT_SYMBOL_GPL(pm_relax); /** * pm_wakeup_timer_fn - Delayed finalization of a wakeup event. + * @data: Address of the wakeup source object associated with the event source. * - * Decrease the counter of wakeup events being processed after it was increased - * by pm_wakeup_event(). + * Call __pm_relax() for the wakeup source whose address is stored in @data. */ static void pm_wakeup_timer_fn(unsigned long data) { + __pm_relax((struct wakeup_source *)data); +} + +/** + * __pm_wakeup_event - Notify the PM core of a wakeup event. + * @ws: Wakeup source object associated with the event source. + * @msec: Anticipated event processing time (in milliseconds). + * + * Notify the PM core of a wakeup event whose source is @ws that will take + * approximately @msec milliseconds to be processed by the kernel. If @ws is + * not active, activate it. If @msec is nonzero, set up the @ws' timer to + * execute pm_wakeup_timer_fn() in future. + * + * It is safe to call this function from interrupt context. + */ +void __pm_wakeup_event(struct wakeup_source *ws, unsigned int msec) +{ unsigned long flags; + unsigned long expires; - spin_lock_irqsave(&events_lock, flags); - if (events_timer_expires - && time_before_eq(events_timer_expires, jiffies)) { - events_in_progress--; - events_timer_expires = 0; + if (!ws) + return; + + spin_lock_irqsave(&ws->lock, flags); + + ws->event_count++; + if (!ws->active) + wakeup_source_activate(ws); + + if (!msec) { + wakeup_source_deactivate(ws); + goto unlock; } - spin_unlock_irqrestore(&events_lock, flags); + + expires = jiffies + msecs_to_jiffies(msec); + if (!expires) + expires = 1; + + if (time_after(expires, ws->timer_expires)) { + mod_timer(&ws->timer, expires); + ws->timer_expires = expires; + } + + unlock: + spin_unlock_irqrestore(&ws->lock, flags); } +EXPORT_SYMBOL_GPL(__pm_wakeup_event); + /** * pm_wakeup_event - Notify the PM core of a wakeup event. * @dev: Device the wakeup event is related to. * @msec: Anticipated event processing time (in milliseconds). * - * Notify the PM core of a wakeup event (signaled by @dev) that will take - * approximately @msec milliseconds to be processed by the kernel. Increment - * the counter of registered wakeup events and (if @msec is nonzero) set up - * the wakeup events timer to execute pm_wakeup_timer_fn() in future (if the - * timer has not been set up already, increment the counter of wakeup events - * being processed). If @dev is not NULL, the counter of wakeup events related - * to @dev is incremented too. - * - * It is safe to call this function from interrupt context. + * Call __pm_wakeup_event() for the @dev's wakeup source object. */ void pm_wakeup_event(struct device *dev, unsigned int msec) { unsigned long flags; - spin_lock_irqsave(&events_lock, flags); - event_count++; - if (dev) - dev->power.wakeup_count++; - - if (msec) { - unsigned long expires; - - expires = jiffies + msecs_to_jiffies(msec); - if (!expires) - expires = 1; - - if (!events_timer_expires - || time_after(expires, events_timer_expires)) { - if (!events_timer_expires) - events_in_progress++; + if (!dev) + return; - mod_timer(&events_timer, expires); - events_timer_expires = expires; - } + spin_lock_irqsave(&dev->power.lock, flags); + __pm_wakeup_event(dev->power.wakeup, msec); + spin_unlock_irqrestore(&dev->power.lock, flags); +} +EXPORT_SYMBOL_GPL(pm_wakeup_event); + +/** + * pm_wakeup_update_hit_counts - Update hit counts of all active wakeup sources. + */ +static void pm_wakeup_update_hit_counts(void) +{ + unsigned long flags; + struct wakeup_source *ws; + + rcu_read_lock(); + list_for_each_entry_rcu(ws, &wakeup_sources, entry) { + spin_lock_irqsave(&ws->lock, flags); + if (ws->active) + ws->hit_count++; + spin_unlock_irqrestore(&ws->lock, flags); } - spin_unlock_irqrestore(&events_lock, flags); + rcu_read_unlock(); } /** @@ -184,10 +554,13 @@ bool pm_check_wakeup_events(void) spin_lock_irqsave(&events_lock, flags); if (events_check_enabled) { - ret = (event_count == saved_event_count) && !events_in_progress; + ret = ((unsigned int)atomic_read(&event_count) == saved_count) + && !atomic_read(&events_in_progress); events_check_enabled = ret; } spin_unlock_irqrestore(&events_lock, flags); + if (!ret) + pm_wakeup_update_hit_counts(); return ret; } @@ -202,24 +575,20 @@ bool pm_check_wakeup_events(void) * drop down to zero has been interrupted by a signal (and the current number * of wakeup events being processed is still nonzero). Otherwise return true. */ -bool pm_get_wakeup_count(unsigned long *count) +bool pm_get_wakeup_count(unsigned int *count) { bool ret; - spin_lock_irq(&events_lock); if (capable(CAP_SYS_ADMIN)) events_check_enabled = false; - while (events_in_progress && !signal_pending(current)) { - spin_unlock_irq(&events_lock); - - schedule_timeout_interruptible(msecs_to_jiffies(100)); - - spin_lock_irq(&events_lock); + while (atomic_read(&events_in_progress) && !signal_pending(current)) { + pm_wakeup_update_hit_counts(); + schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT)); } - *count = event_count; - ret = !events_in_progress; - spin_unlock_irq(&events_lock); + + ret = !atomic_read(&events_in_progress); + *count = atomic_read(&event_count); return ret; } @@ -232,16 +601,19 @@ bool pm_get_wakeup_count(unsigned long * * old number of registered wakeup events to be used by pm_check_wakeup_events() * and return true. Otherwise return false. */ -bool pm_save_wakeup_count(unsigned long count) +bool pm_save_wakeup_count(unsigned int count) { bool ret = false; spin_lock_irq(&events_lock); - if (count == event_count && !events_in_progress) { - saved_event_count = count; + if (count == (unsigned int)atomic_read(&event_count) + && !atomic_read(&events_in_progress)) { + saved_count = count; events_check_enabled = true; ret = true; } spin_unlock_irq(&events_lock); + if (!ret) + pm_wakeup_update_hit_counts(); return ret; } Index: linux-2.6/drivers/base/power/main.c =================================================================== --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -60,7 +60,8 @@ void device_pm_init(struct device *dev) dev->power.status = DPM_ON; init_completion(&dev->power.completion); complete_all(&dev->power.completion); - dev->power.wakeup_count = 0; + dev->power.wakeup = NULL; + spin_lock_init(&dev->power.lock); pm_runtime_init(dev); } Index: linux-2.6/drivers/base/power/power.h =================================================================== --- linux-2.6.orig/drivers/base/power/power.h +++ linux-2.6/drivers/base/power/power.h @@ -34,6 +34,7 @@ extern void device_pm_move_last(struct d static inline void device_pm_init(struct device *dev) { + spin_lock_init(&dev->power.lock); pm_runtime_init(dev); } Index: linux-2.6/drivers/base/power/runtime.c =================================================================== --- linux-2.6.orig/drivers/base/power/runtime.c +++ linux-2.6/drivers/base/power/runtime.c @@ -1099,8 +1099,6 @@ EXPORT_SYMBOL_GPL(pm_runtime_allow); */ void pm_runtime_init(struct device *dev) { - spin_lock_init(&dev->power.lock); - dev->power.runtime_status = RPM_SUSPENDED; dev->power.idle_notification = false; Index: linux-2.6/include/linux/suspend.h =================================================================== --- linux-2.6.orig/include/linux/suspend.h +++ linux-2.6/include/linux/suspend.h @@ -302,8 +302,8 @@ extern int unregister_pm_notifier(struct extern bool events_check_enabled; extern bool pm_check_wakeup_events(void); -extern bool pm_get_wakeup_count(unsigned long *count); -extern bool pm_save_wakeup_count(unsigned long count); +extern bool pm_get_wakeup_count(unsigned int *count); +extern bool pm_save_wakeup_count(unsigned int count); #else /* !CONFIG_PM_SLEEP */ static inline int register_pm_notifier(struct notifier_block *nb) Index: linux-2.6/kernel/power/main.c =================================================================== --- linux-2.6.orig/kernel/power/main.c +++ linux-2.6/kernel/power/main.c @@ -237,18 +237,18 @@ static ssize_t wakeup_count_show(struct struct kobj_attribute *attr, char *buf) { - unsigned long val; + unsigned int val; - return pm_get_wakeup_count(&val) ? sprintf(buf, "%lu\n", val) : -EINTR; + return pm_get_wakeup_count(&val) ? sprintf(buf, "%u\n", val) : -EINTR; } static ssize_t wakeup_count_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t n) { - unsigned long val; + unsigned int val; - if (sscanf(buf, "%lu", &val) == 1) { + if (sscanf(buf, "%u", &val) == 1) { if (pm_save_wakeup_count(val)) return n; } Index: linux-2.6/drivers/base/power/sysfs.c =================================================================== --- linux-2.6.orig/drivers/base/power/sysfs.c +++ linux-2.6/drivers/base/power/sysfs.c @@ -210,11 +210,122 @@ static DEVICE_ATTR(wakeup, 0644, wake_sh static ssize_t wakeup_count_show(struct device *dev, struct device_attribute *attr, char *buf) { - return sprintf(buf, "%lu\n", dev->power.wakeup_count); + unsigned long count = 0; + bool enabled = false; + + spin_lock_irq(&dev->power.lock); + if (dev->power.wakeup) { + count = dev->power.wakeup->event_count; + enabled = true; + } + spin_unlock_irq(&dev->power.lock); + return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n"); } static DEVICE_ATTR(wakeup_count, 0444, wakeup_count_show, NULL); -#endif + +static ssize_t wakeup_active_count_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + unsigned long count = 0; + bool enabled = false; + + spin_lock_irq(&dev->power.lock); + if (dev->power.wakeup) { + count = dev->power.wakeup->active_count; + enabled = true; + } + spin_unlock_irq(&dev->power.lock); + return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n"); +} + +static DEVICE_ATTR(wakeup_active_count, 0444, wakeup_active_count_show, NULL); + +static ssize_t wakeup_hit_count_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + unsigned long count = 0; + bool enabled = false; + + spin_lock_irq(&dev->power.lock); + if (dev->power.wakeup) { + count = dev->power.wakeup->hit_count; + enabled = true; + } + spin_unlock_irq(&dev->power.lock); + return enabled ? sprintf(buf, "%lu\n", count) : sprintf(buf, "\n"); +} + +static DEVICE_ATTR(wakeup_hit_count, 0444, wakeup_hit_count_show, NULL); + +static ssize_t wakeup_active_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + unsigned int active = 0; + bool enabled = false; + + spin_lock_irq(&dev->power.lock); + if (dev->power.wakeup) { + active = dev->power.wakeup->active; + enabled = true; + } + spin_unlock_irq(&dev->power.lock); + return enabled ? sprintf(buf, "%u\n", active) : sprintf(buf, "\n"); +} + +static DEVICE_ATTR(wakeup_active, 0444, wakeup_active_show, NULL); + +static ssize_t wakeup_total_time_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + s64 msec = 0; + bool enabled = false; + + spin_lock_irq(&dev->power.lock); + if (dev->power.wakeup) { + msec = ktime_to_ms(dev->power.wakeup->total_time); + enabled = true; + } + spin_unlock_irq(&dev->power.lock); + return enabled ? sprintf(buf, "%lld\n", msec) : sprintf(buf, "\n"); +} + +static DEVICE_ATTR(wakeup_total_time_ms, 0444, wakeup_total_time_show, NULL); + +static ssize_t wakeup_max_time_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + s64 msec = 0; + bool enabled = false; + + spin_lock_irq(&dev->power.lock); + if (dev->power.wakeup) { + msec = ktime_to_ms(dev->power.wakeup->max_time); + enabled = true; + } + spin_unlock_irq(&dev->power.lock); + return enabled ? sprintf(buf, "%lld\n", msec) : sprintf(buf, "\n"); +} + +static DEVICE_ATTR(wakeup_max_time_ms, 0444, wakeup_max_time_show, NULL); + +static ssize_t wakeup_last_time_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + s64 msec = 0; + bool enabled = false; + + spin_lock_irq(&dev->power.lock); + if (dev->power.wakeup) { + msec = ktime_to_ms(dev->power.wakeup->last_time); + enabled = true; + } + spin_unlock_irq(&dev->power.lock); + return enabled ? sprintf(buf, "%lld\n", msec) : sprintf(buf, "\n"); +} + +static DEVICE_ATTR(wakeup_last_time_ms, 0444, wakeup_last_time_show, NULL); +#endif /* CONFIG_PM_SLEEP */ #ifdef CONFIG_PM_ADVANCED_DEBUG #ifdef CONFIG_PM_RUNTIME @@ -288,6 +399,12 @@ static struct attribute * power_attrs[] &dev_attr_wakeup.attr, #ifdef CONFIG_PM_SLEEP &dev_attr_wakeup_count.attr, + &dev_attr_wakeup_active_count.attr, + &dev_attr_wakeup_hit_count.attr, + &dev_attr_wakeup_active.attr, + &dev_attr_wakeup_total_time_ms.attr, + &dev_attr_wakeup_max_time_ms.attr, + &dev_attr_wakeup_last_time_ms.attr, #endif #ifdef CONFIG_PM_ADVANCED_DEBUG &dev_attr_async.attr, ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Wakeup-events implementation 2010-08-17 15:53 ` Alan Stern 2010-08-17 18:37 ` David Brownell 2010-08-17 19:21 ` Rafael J. Wysocki @ 2010-08-18 7:04 ` Florian Mickler 2 siblings, 0 replies; 18+ messages in thread From: Florian Mickler @ 2010-08-18 7:04 UTC (permalink / raw) To: Alan Stern; +Cc: list, Linux-pm On Tue, 17 Aug 2010 11:53:53 -0400 (EDT) Alan Stern <stern@rowland.harvard.edu> wrote: > Also, I frankly have to wonder why you demand such a large amount of > detailed debugging about the kernel's wakeup/wakelock subsystem (as > opposed to userspace wakelocks). Yes, it's clear that if something > goes wrong then you will drain your phone's battery in a few hours > rather than a few days. But what if something goes wrong with the VM > subsystem, or the RF subsystem, or the keypad driver? The device would > crash immediately or at best become almost totally unusable. These are > much worse scenarios than losing some battery life -- have you added > comparable quantities of debugging to those other parts of the kernel? > > In addition, you retain all these debugging facilities in your > production builds, not just in the debug kernels. Doesn't this seem > excessive? > There is a theory about risk analysis and follow-up costs... i don't have anything concrete at hand but the main idea is that if you develop for example some monitoring for a production line of a pharmaceutical product, you have to take into consideration all possible failure-modes, the associated risk (likelihood * impact) and the likelihood of detecting these failures before shipping. A failure mode that is fatal (->high risk), and is not detectable before shipping is bad and a sign that you probably shouldn't produce at all. A failure mode that is fatal (->high risk) but is detectable in all cases before shipping is not such a big problem. With battery powered devices, you have failure-modes (run-time not optimal) that is not easily detectable before shipping. This is the worst kind of failure-mode because you don't know if it will happen or not. The risk of this failure mode (non-optimal runtime) is bad also, because for mobile devices the runtime is critical and the fact that we talk about millions of devices will turn this up with a non-negligible probability. So I totally understand that the android people want all the debugging they can get to make non-optimal runtime as visible as a broken vm or a not functioning keypad. After all, runtime is critical for mobile devices. Cheers, Flo ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2010-09-07 23:51 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <AANLkTimgJUAW9TU37XE4BqYWf3yCWvg-vgk3E0YLqcm9@mail.gmail.com>
2010-08-13 15:59 ` Wakeup-events implementation Alan Stern
2010-08-14 4:59 ` Arve Hjønnevåg
2010-08-16 19:22 ` Alan Stern
2010-08-17 0:30 ` Arve Hjønnevåg
2010-08-17 15:53 ` Alan Stern
2010-08-17 18:37 ` David Brownell
2010-08-17 19:19 ` Alan Stern
2010-08-17 19:21 ` Rafael J. Wysocki
2010-08-18 0:40 ` Arve Hjønnevåg
2010-08-18 10:32 ` Mark Brown
2010-08-18 15:20 ` Alan Stern
2010-08-18 19:17 ` Rafael J. Wysocki
2010-08-19 0:18 ` Arve Hjønnevåg
2010-08-19 21:09 ` Alan Stern
2010-08-19 23:12 ` Arve Hjønnevåg
2010-08-20 15:04 ` Alan Stern
2010-09-07 23:51 ` [RFC][PATCH] PM / Wakeup: Introduce wakeup source objects and event statistics (was: Re: Wakeup-events implementation) Rafael J. Wysocki
2010-08-18 7:04 ` Wakeup-events implementation Florian Mickler
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox