linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: "Arve Hjønnevåg" <arve@android.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>,
	Linux PM list <linux-pm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Magnus Damm <magnus.damm@gmail.com>,
	markgross@thegnar.org, Matthew Garrett <mjg@redhat.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	John Stultz <john.stultz@linaro.org>,
	Brian Swetland <swetland@google.com>,
	Alan Stern <stern@rowland.harvard.edu>,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Subject: Re: [RFC][PATCH 5/8] epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready
Date: Mon, 30 Apr 2012 11:58:19 +1000	[thread overview]
Message-ID: <20120430115819.633c2cb6@notabene.brown> (raw)
In-Reply-To: <CAMP5XgcmsDzB3eY53tF=+k07WNSnG11M90gPTKfSs4-f78CEtQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 9270 bytes --]

On Thu, 26 Apr 2012 20:49:51 -0700 Arve Hjønnevåg <arve@android.com> wrote:

> 2012/4/26 Rafael J. Wysocki <rjw@sisk.pl>:
> > On Thursday, April 26, 2012, NeilBrown wrote:
> >> On Sun, 22 Apr 2012 23:22:43 +0200 "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> >>
> >> > From: Arve Hjønnevåg <arve@android.com>
> >> >
> >> > When an epoll_event, that has the EPOLLWAKEUP flag set, is ready, a
> >> > wakeup_source will be active to prevent suspend. This can be used to
> >> > handle wakeup events from a driver that support poll, e.g. input, if
> >> > that driver wakes up the waitqueue passed to epoll before allowing
> >> > suspend.
> >> >
> >> > The current implementation uses an extra wakeup_source when
> >> > ep_scan_ready_list runs. This can cause problems if a single thread
> >> > is polling on wakeup events and frequent non-wakeup events (events
> >> > usually arrive during thread freezing) using the same epoll file.
> >>
> >> This is quite neat.
> >>
> >> If I understand it correctly, you register file descriptors with epoll_ctl()
> >> on an fd created with epoll_create(), and set the new EPOLLWAKEUP flag.
> >> Then when a regular 'poll' or 'select' on the epoll fd reports that it is
> >> readable you:
> 
> I think it makes more sense to use epoll_wait than mixing this with
> select or poll.
> 
> >>   - get a wakelock
> This may not be needed, since epoll does not reevaluate its state
> until you call into it again (at least using epoll_wait).
> 
> >>   - use epoll_wait to collect the events
> >>   - process the events
> >>   - release your wakelock
> >>   - go back to poll() or select() on the epoll fd.
> >> Correct?  As long as there are ready events with EPOLLWAKEUP set a
> >> wakeup_source is held active and the system won't go to sleep.
> >>
> >> My concern with this is about permissions.  It appears that any process could
> >> wait of some fd (maybe a pipe they created themselves) with EPOLLWAKEUP, and
> >> then simply never epoll_wait() for the event.  Then they would be keeping
> >> the system awake.  I don't think that is acceptable.
> >
> > I wonder what Arve has to say to that, but let me say that on systems without
> > autosleep every process can go into an infinite busy loop which is going to
> > drain battery relatively quickly just as well and I don't see why that's so
> > much different.
> >
> 
> I still think is useful to limit access to this feature. On a phone, a
> process stuck in an infinite loop will increase battery drain, but if
> this process does not have permission to prevent suspend, then this is
> only catastrophic if another process that have that permission is
> preventing suspend. I think we should add a capability for this.
> Assuming you agree, do want me to create a separate patch for that
> adds a capability, or roll it into this one.
> 
> >> So there needs to be some way to limit who can effectively block suspend by
> >> using EPOLLWAKEUP.
> >> (This is one of the reasons I like an all-user-space solution.  Policy issues
> >> like this can easily be decided in user-space but are clumsy to put into the
> >> kernel).
> >>
> >> Also, I'm having trouble understanding the ep->ws wakeup_source.
> >> The epi->ws makes lots of sense and I think I understand it all.
> >> However I don't see why you need a wakeup_source for the 'struct eventpoll'.
> >>
> >> Every time that 'poll' decides to call the ->poll fop for the eventpoll, this
> >> wakeup_source will be activated and deactivated which will abort any current
> >> suspend cycle even if there are no events to report.
> >>
> >> I suspect it can just go away.
> >
> > I'll leave this one entirely to Arve, if you don't mind. :-)
> >
> 
> I keep the wakeup-source active whenever the epitem is on a list
> (ep->rdllist or the local txlist). The temporary txlist is modified
> without holding the lock that protects ep->rdllist. It is easier to
> use a separate wakeup source to prevent suspend while this list is
> manipulated than trying to maintain the wakeup-source state in a
> different way than the existing eventpoll state. I think this only
> causes real problems if the same epoll file is used for frequent
> non-wakeup events (e.g. a gyro) and wakeup events. You should be able
> to work around this by using two epoll files.

Thanks for the explanation.  I can now see more clearly how your patch works.
I can also see why you might need the ep->ws wakeup_source.  However I don't
like it.

If it acted purely as a lock and prevented suspend while it was active then
it would be fine.  However it doesn't.  It also aborts any current suspend
attempt - so it is externally visible.
The way your code it written, *any* call to epoll_wait will abort the current
suspend cycle, even if it is called by a completely non-privileged user.
That may not obviously be harmful, but it makes the precise semantics of the
system call something quite non-obvious, and it is much better to have a very
clean semantic.
As you say, it can probably be worked-around but code is much safer when you
don't need to work-around things.

I see two alternatives:
1/ set the 'wakeup' flag on the whole epoll-fd, not on the individual events
   that it is asked to monitor.  i.e. add a new flag to epoll_create1()
   instead of to epoll_ctl events.
   Then you just need a single wakeup_source for the fd which is active
   whenever any event is ready.

   This interface might be generally nicer, I'm not sure.

2/ Find a way to get rid of ep->ws.
   Thinking about it more, I again think it isn't needed.
   The reason is that suspend is already exclusive with any process running in
   kernel context.
   One of the first things suspend does is to freeze all process and (for
   regular non-kernel-thread processes) this happens by sending a virtual
   signal which is acted up when the process returns from a system call or
   returns from a context switch.  So while any given system call is running
   (e.g. epoll_wait) suspend is blocked.  When epoll_wait sets
   TASK_INTERRUPTIBLE the 'freeze' signal will interrupt it of course, but
   this is the only point where suspend can interfere with epoll_wait, and you
   aren't holding ep->ws then anyway.
   Hopefully Rafael will correct me if I got that outline wrong.  But even if 
   I did, I think we need to get rid of ep->ws.

Also, I think it is important to clearly document how to use this safely.
You suggested that if any EPOLLWAKEUP event is ready, then suspend will
remain disabled not only until the event is handled, but also until the next
call to epoll_wait.  That sounds like very useful semantics, but it isn't at
all explicit in the patch.  I think it should be made very clear in
eventpoll.h how the flag can be used. (and then eventually get this into a
man page of course).

> 
> >> One last item that doesn't really belong here - but it is in context.
> >>
> >> This mechanism is elegant because it provides a single implementation that
> >> provides wakeup_source for almost any sort of device.  I would like to do the
> >> same thing for interrupts.
> >> Most (maybe all) of the wakeup device on my phone have an interrupt where the
> >> body is run in a thread.  When the thread has done it's work the event is
> >> visible to userspace so the EPOLLWAKEUP mechanism is all that is needed to
> >> complete the path to user-space (or for my user-space solution, nothing else
> >> is needed once it is visible to user-space).
> >> So we just need to ensure a clear path from the "top half" interrupt handler
> >> to the threaded handler.
> >> So I imagine attaching a wakeup source to every interrupt for which 'wakeup'
> >> is enabled, activating it when the top-half starts and relaxing it when the
> >> bottom-half completes.  With this in place, almost all drivers would get
> >> wakeup_source handling for free.
> >> Does this seem reasonable to you.
> >
> > Yes, it does.
> >
> 
> How useful is that? Suspend already synchronizes with interrupt
> handlers and will not proceed until they have returned. Are threaded
> interrupts handlers not always run at that stage? For drivers that use
> work-queues instead of a threaded interrupt handler, I think the
> suspend-blocking work-queue patch I wrote a while back is convenient.
> 

Maybe it isn't useful at all - I'm still working this stuff out.

Yes, threaded interrupts are run "straight away", but what exactly does that
mean?  And in particular, is there any interlocking to ensure they run
before suspend gets stop the CPU?  Maybe the scheduling priority of the
different threads is enough to make sure this works, as irq_threads are
SCHED_FIFO and  the suspending thread almost certainly isn't.  But is that
still a guarantee on an SMP machine?  irq_threads aren't freezable so suspend
won't block on them for that reason..

I really just want to be sure that some interlock is in place to ensure that
the threaded interrupt handler runs before suspend absolutely commits to
suspending.  If that is already the case, when what I suggest isn't needed as
you suggest.  Do you know of such an interlock?

Thanks,
NeilBrown



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  parent reply	other threads:[~2012-04-30  1:58 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-07  1:00 [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks" Rafael J. Wysocki
2012-02-07  1:01 ` [PATCH 1/8] PM / Sleep: Initialize wakeup source locks in wakeup_source_add() Rafael J. Wysocki
2012-02-07 22:29   ` John Stultz
2012-02-07 22:41     ` Rafael J. Wysocki
2012-02-07  1:03 ` [PATCH 2/8] PM / Sleep: Do not check wakeup too often in try_to_freeze_tasks() Rafael J. Wysocki
2012-02-07  1:03 ` [PATCH 3/8] PM / Sleep: Look for wakeup events in later stages of device suspend Rafael J. Wysocki
2012-02-07  1:04 ` [PATCH 4/8] PM / Sleep: Use wait queue to signal "no wakeup events in progress" Rafael J. Wysocki
2012-02-08 23:10   ` NeilBrown
2012-02-09  0:05     ` Rafael J. Wysocki
2012-02-12  1:27   ` mark gross
2012-02-07  1:05 ` [RFC][PATCH 5/8] PM / Sleep: Change wakeup statistics Rafael J. Wysocki
2012-02-15  6:15   ` Arve Hjønnevåg
2012-02-15 22:37     ` Rafael J. Wysocki
2012-02-17  2:11       ` Arve Hjønnevåg
2012-02-07  1:06 ` [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep Rafael J. Wysocki
2012-02-07 22:49   ` [Update][RFC][PATCH " Rafael J. Wysocki
2012-02-07  1:06 ` [RFC][PATCH 7/8] PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources Rafael J. Wysocki
2012-02-07  1:07 ` [RFC][PATCH 8/8] PM / Sleep: Add user space interface for manipulating " Rafael J. Wysocki
2012-02-07  1:13 ` [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks" Rafael J. Wysocki
2012-02-08 23:57 ` NeilBrown
2012-02-10  0:44   ` Rafael J. Wysocki
2012-02-12  2:05     ` mark gross
2012-02-12 21:32       ` Rafael J. Wysocki
2012-02-14  0:11         ` Arve Hjønnevåg
2012-02-15 15:28           ` mark gross
2012-02-12  1:54   ` mark gross
2012-02-12  1:19 ` mark gross
2012-02-14  2:07 ` Arve Hjønnevåg
2012-02-14 23:22   ` Rafael J. Wysocki
2012-02-15  5:57     ` Arve Hjønnevåg
2012-02-15 23:07       ` Rafael J. Wysocki
2012-02-16 22:22         ` Rafael J. Wysocki
2012-02-17  3:56           ` Arve Hjønnevåg
2012-02-17 23:02             ` [PATCH] PM / Sleep: Add more wakeup source initialization routines Rafael J. Wysocki
2012-02-18 23:50               ` [Update][PATCH] " Rafael J. Wysocki
2012-02-20 23:04                 ` [Update 2x][PATCH] " Rafael J. Wysocki
2012-02-17  3:55         ` [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks" Arve Hjønnevåg
2012-02-17 20:57           ` Rafael J. Wysocki
2012-02-21 23:31 ` [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take 2 Rafael J. Wysocki
2012-02-21 23:32   ` [RFC][PATCH 1/7] PM / Sleep: Look for wakeup events in later stages of device suspend Rafael J. Wysocki
2012-02-21 23:33   ` [RFC][PATCH 2/7] PM / Sleep: Use wait queue to signal "no wakeup events in progress" Rafael J. Wysocki
2012-02-21 23:34   ` [RFC][PATCH 3/7] PM / Sleep: Change wakeup source statistics to follow Android Rafael J. Wysocki
2012-02-21 23:34   ` [RFC][PATCH 4/7] Input / PM: Add ioctl to block suspend while event queue is not empty Rafael J. Wysocki
2012-02-24  5:16     ` Matt Helsley
2012-02-25  4:25       ` Arve Hjønnevåg
2012-02-25 23:33         ` Rafael J. Wysocki
2012-02-28  0:19         ` Matt Helsley
2012-02-26 20:57       ` Rafael J. Wysocki
2012-02-27 22:18         ` Matt Helsley
2012-02-28  1:17           ` Rafael J. Wysocki
2012-02-28  5:58         ` Arve Hjønnevåg
2012-03-04 22:56           ` Rafael J. Wysocki
2012-03-06  1:04             ` [PATCH 1/2] epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready Arve Hjønnevåg
2012-03-06  1:04               ` [PATCH 2/2] PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints Arve Hjønnevåg
2012-02-21 23:35   ` [RFC][PATCH 5/7] PM / Sleep: Implement opportunistic sleep Rafael J. Wysocki
2012-02-22  8:45     ` Srivatsa S. Bhat
2012-02-22 22:10       ` Rafael J. Wysocki
2012-02-23  5:35         ` Srivatsa S. Bhat
2012-02-21 23:36   ` [RFC][PATCH 6/7] PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources Rafael J. Wysocki
2012-02-21 23:37   ` [RFC][PATCH 7/7] PM / Sleep: Add user space interface for manipulating " Rafael J. Wysocki
2012-02-22  4:49   ` [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take 2 John Stultz
2012-02-22  8:44     ` Srivatsa S. Bhat
2012-02-22 22:10       ` [RFC][PATCH 0/7] PM: Implement autosleep and "wake locks", take2 Rafael J. Wysocki
2012-02-23  6:25         ` Srivatsa S. Bhat
2012-02-23 21:26           ` Rafael J. Wysocki
2012-02-23 21:32             ` Rafael J. Wysocki
2012-02-24  4:44               ` Srivatsa S. Bhat
2012-02-24 23:21                 ` Rafael J. Wysocki
2012-02-25  4:43                   ` Arve Hjønnevåg
2012-02-25 20:43                     ` Rafael J. Wysocki
2012-02-25 19:20                   ` Srivatsa S. Bhat
2012-02-25 21:01                     ` Rafael J. Wysocki
2012-02-28 10:24                       ` Srivatsa S. Bhat
2012-04-22 21:19   ` [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks", take 3 Rafael J. Wysocki
2012-04-22 21:19     ` [PATCH 1/8] PM / Sleep: Look for wakeup events in later stages of device suspend Rafael J. Wysocki
2012-04-22 21:20     ` [PATCH 2/8] PM / Sleep: Use wait queue to signal "no wakeup events in progress" Rafael J. Wysocki
2012-04-23  4:01       ` mark gross
2012-04-22 21:21     ` [PATCH 3/8] PM / Sleep: Change wakeup source statistics to follow Android Rafael J. Wysocki
2012-04-22 21:21     ` [PATCH 4/8] PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints Rafael J. Wysocki
2012-04-22 21:22     ` [RFC][PATCH 5/8] epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready Rafael J. Wysocki
2012-04-26  4:03       ` NeilBrown
2012-04-26 20:40         ` Rafael J. Wysocki
2012-04-27  3:49           ` Arve Hjønnevåg
2012-04-27 21:18             ` Rafael J. Wysocki
2012-04-27 23:26               ` [PATCH] " Arve Hjønnevåg
2012-04-30  1:58             ` NeilBrown [this message]
2012-05-01  0:52               ` [RFC][PATCH 5/8] " Arve Hjønnevåg
2012-05-01  2:18                 ` NeilBrown
2012-05-01  5:33                 ` [PATCH] " Arve Hjønnevåg
2012-05-01  6:28                   ` NeilBrown
2012-05-01 13:51                     ` Rafael J. Wysocki
2012-07-16  6:38                   ` Michael Kerrisk
2012-07-16 11:00                     ` Rafael J. Wysocki
2012-07-16 22:04                       ` Arve Hjønnevåg
2012-07-17  5:14                         ` Michael Kerrisk
2012-07-17 19:22                           ` Rafael J. Wysocki
2012-07-17 19:36                             ` Greg KH
2012-07-17 19:55                               ` Rafael J. Wysocki
2012-07-18  6:41                             ` Michael Kerrisk (man-pages)
2012-04-22 21:23     ` [RFC][PATCH 6/8] PM / Sleep: Implement opportunistic sleep Rafael J. Wysocki
2012-04-26  3:05       ` NeilBrown
2012-04-26 21:52         ` Rafael J. Wysocki
2012-04-27  0:39           ` NeilBrown
2012-04-27 21:22             ` Rafael J. Wysocki
2012-05-03  0:23           ` Arve Hjønnevåg
2012-05-03 13:28             ` Rafael J. Wysocki
2012-05-03 21:27               ` Arve Hjønnevåg
2012-05-03 22:20                 ` Rafael J. Wysocki
2012-05-03 22:16                   ` Arve Hjønnevåg
2012-05-03 22:24                     ` Rafael J. Wysocki
2012-04-22 21:24     ` [RFC][PATCH 7/8] PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources Rafael J. Wysocki
2012-04-22 21:24     ` [RFC][PATCH 8/8] PM / Sleep: Add user space interface for manipulating " Rafael J. Wysocki
2012-04-24  1:35       ` John Stultz
2012-04-24 21:27         ` Rafael J. Wysocki
2012-04-26  6:31           ` NeilBrown
2012-04-26 22:04             ` Rafael J. Wysocki
2012-04-27  0:07               ` NeilBrown
2012-04-27 21:15                 ` Rafael J. Wysocki
2012-04-27  3:57               ` Arve Hjønnevåg
2012-04-27 21:14                 ` Rafael J. Wysocki
2012-04-27 21:17                   ` Arve Hjønnevåg
2012-04-27 21:34                     ` Rafael J. Wysocki
2012-05-03 19:29                       ` [PATCH 0/2]: Kconfig options for wakelocks limit and gc (was: Re: [RFC][PATCH 8/8] PM / Sleep: Add user space ...) Rafael J. Wysocki
2012-05-03 19:30                         ` [PATCH 1/2] PM / Sleep: Make the limit of user space wakeup sources configurable Rafael J. Wysocki
2012-05-03 19:34                         ` [PATCH 2/2] PM / Sleep: User space wakeup sources garbage collector Kconfig option Rafael J. Wysocki
2012-05-03 22:14                         ` [PATCH 0/2]: Kconfig options for wakelocks limit and gc (was: Re: [RFC][PATCH 8/8] PM / Sleep: Add user space ...) Arve Hjønnevåg
2012-05-03 22:20                           ` Rafael J. Wysocki
2012-04-23 16:49     ` [RFC][PATCH 0/8] PM: Implement autosleep and "wake locks", take 3 Greg KH
2012-04-23 19:51       ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120430115819.633c2cb6@notabene.brown \
    --to=neilb@suse.de \
    --cc=arve@android.com \
    --cc=dmitry.torokhov@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=magnus.damm@gmail.com \
    --cc=markgross@thegnar.org \
    --cc=mjg@redhat.com \
    --cc=rjw@sisk.pl \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=stern@rowland.harvard.edu \
    --cc=swetland@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).