* Re: epoll reporting events when it hasn't been asked to
@ 2004-04-01 18:25 Ben Mansell
2004-04-01 19:28 ` Davide Libenzi
0 siblings, 1 reply; 19+ messages in thread
From: Ben Mansell @ 2004-04-01 18:25 UTC (permalink / raw)
To: linux-kernel
> It is a feature. epoll OR user events with POLLHUP|POLLERR so that even if
> the user sets the event mask to zero, it can still know when something
> like those abnormal condition happened. Which problem do you see with this?
What should the application do if it gets events that it didn't ask for?
If you choose to ignore them, the next time epoll_wait() is called it
will return instantly with these same messages, so the app will spin and
eat CPU.
The alternative is to put some kind of sanity-check wrapper around
epoll_wait() calls, and match the output with what the app asked for.
If epoll starts returning messages that it doesn't want, the only
alternative is to get heavy-handed and try to get epoll to shut up with
EPOLL_CTL_DEL on the file descriptor. But this seems like fighting
against the OS.
Perhaps it should only OR the user event with POLLHUP|POLLERR if
POLLIN or POLLOUT is set?
Ben
^ permalink raw reply [flat|nested] 19+ messages in thread

* Re: epoll reporting events when it hasn't been asked to
2004-04-01 18:25 epoll reporting events when it hasn't been asked to Ben Mansell
@ 2004-04-01 19:28 ` Davide Libenzi
2004-04-01 23:29 ` Steven Dake
0 siblings, 1 reply; 19+ messages in thread
From: Davide Libenzi @ 2004-04-01 19:28 UTC (permalink / raw)
To: Ben Mansell; +Cc: Linux Kernel Mailing List

On Thu, 1 Apr 2004, Ben Mansell wrote:

> > It is a feature. epoll OR user events with POLLHUP|POLLERR so that even if
> > the user sets the event mask to zero, it can still know when something
> > like those abnormal condition happened. Which problem do you see with this?
>
> What should the application do if it gets events that it didn't ask for?
> If you choose to ignore them, the next time epoll_wait() is called it
> will return instantly with these same messages, so the app will spin and
> eat CPU.

Shouldn't the application handle those exceptional conditions instead of
ignoring them?

> Perhaps it should only OR the user event with POLLHUP|POLLERR if
> POLLIN or POLLOUT is set?

This can certainly be done, since it's a one-liner fix. I'm not sure if it
is the correct behaviour. Anyone else?

- Davide

^ permalink raw reply [flat|nested] 19+ messages in thread
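The spin Ben describes can be reproduced with a small sketch. It assumes Linux and Python's standard `select` module, which wraps the same epoll system calls; the thread itself contains no code, and the function name `poll_with_empty_mask` is hypothetical. A pipe stands in for a socket because its hangup is easy to trigger.

```python
import os
import select


def poll_with_empty_mask(rounds=3, timeout=0.1):
    """Register a pipe's read end with an empty event mask, hang it up,
    and collect what epoll reports on successive waits."""
    read_fd, write_fd = os.pipe()
    ep = select.epoll()
    try:
        ep.register(read_fd, 0)   # event mask 0: no events requested
        os.close(write_fd)        # last writer gone: hangup on the read end
        write_fd = None
        # Nothing handles the event, so every wait reports it again.
        return read_fd, [ep.poll(timeout) for _ in range(rounds)]
    finally:
        ep.close()
        os.close(read_fd)
        if write_fd is not None:
            os.close(write_fd)


if __name__ == "__main__":
    fd, seen = poll_with_empty_mask()
    print(fd, seen)
```

Each wait returns immediately with the same hangup, which is the busy loop an application gets if it ignores the event and calls epoll_wait() again.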
* Re: epoll reporting events when it hasn't been asked to
2004-04-01 19:28 ` Davide Libenzi
@ 2004-04-01 23:29 ` Steven Dake
2004-04-02 9:04 ` Ben Mansell
0 siblings, 1 reply; 19+ messages in thread
From: Steven Dake @ 2004-04-01 23:29 UTC (permalink / raw)
To: Davide Libenzi; +Cc: Ben Mansell, Linux Kernel Mailing List

On Thu, 2004-04-01 at 12:28, Davide Libenzi wrote:
> On Thu, 1 Apr 2004, Ben Mansell wrote:
>
> > > It is a feature. epoll OR user events with POLLHUP|POLLERR so that even if
> > > the user sets the event mask to zero, it can still know when something
> > > like those abnormal condition happened. Which problem do you see with this?
> >
> > What should the application do if it gets events that it didn't ask for?
> > If you choose to ignore them, the next time epoll_wait() is called it
> > will return instantly with these same messages, so the app will spin and
> > eat CPU.
>
> Shouldn't the application handle those exceptional conditions instead of
> ignoring them?
>
>

If an exception occurs (for example, a socket is disconnected) the socket
should be removed from the fd list. There is really no point in passing
in an excepted fd.

epoll works just like poll and the expected SUS behavior in this regard.

Thanks
-steve

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: epoll reporting events when it hasn't been asked to
2004-04-01 23:29 ` Steven Dake
@ 2004-04-02 9:04 ` Ben Mansell
2004-04-02 15:22 ` Davide Libenzi
0 siblings, 1 reply; 19+ messages in thread
From: Ben Mansell @ 2004-04-02 9:04 UTC (permalink / raw)
To: Steven Dake; +Cc: Davide Libenzi, Linux Kernel Mailing List

On Thu, 1 Apr 2004, Steven Dake wrote:

> On Thu, 2004-04-01 at 12:28, Davide Libenzi wrote:
> > On Thu, 1 Apr 2004, Ben Mansell wrote:
> >
> > > > It is a feature. epoll OR user events with POLLHUP|POLLERR so that even if
> > > > the user sets the event mask to zero, it can still know when something
> > > > like those abnormal condition happened. Which problem do you see with this?
> > >
> > > What should the application do if it gets events that it didn't ask for?
> > > If you choose to ignore them, the next time epoll_wait() is called it
> > > will return instantly with these same messages, so the app will spin and
> > > eat CPU.
> >
> > Shouldn't the application handle those exceptional conditions instead of
> > ignoring them?
>
> If an exception occurs (for example, a socket is disconnected) the socket
> should be removed from the fd list. There is really no point in passing
> in an excepted fd.

Is there any difference, speed-wise, between turning off all events to
listen to with EPOLL_CTL_MOD, and removing the file descriptor with
EPOLL_CTL_DEL? I had vaguely assumed that the former would be faster
(especially if you might later want to resume listening for events),
although that was just a guess.

Ben

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: epoll reporting events when it hasn't been asked to
2004-04-02 9:04 ` Ben Mansell
@ 2004-04-02 15:22 ` Davide Libenzi
2004-04-02 18:40 ` Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to) Jamie Lokier
2004-04-14 17:59 ` epoll reporting events when it hasn't been asked to Dirk Morris
0 siblings, 2 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-04-02 15:22 UTC (permalink / raw)
To: Ben Mansell; +Cc: Steven Dake, Linux Kernel Mailing List

On Fri, 2 Apr 2004, Ben Mansell wrote:

> > If an exception occurs (for example, a socket is disconnected) the socket
> > should be removed from the fd list. There is really no point in passing
> > in an excepted fd.
>
> Is there any difference, speed-wise, between turning off all events to
> listen to with EPOLL_CTL_MOD, and removing the file descriptor with
> EPOLL_CTL_DEL? I had vaguely assumed that the former would be faster
> (especially if you might later want to resume listening for events),
> although that was just a guess.

It is faster. OTOH nothing prevents you from using your current method. You
only have to handle exceptional conditions instead of ignoring them.
Handling by, for example, removing the fd from the epoll set and
unregistering/freeing the associated data structures. IMO we can leave the
current behaviour, but if someone sees huge problems with this, the fix is
a one-liner.

- Davide

^ permalink raw reply [flat|nested] 19+ messages in thread
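The handling Davide suggests might look roughly like the following sketch. It assumes Linux and Python's standard `select` module; `run_once` and the `handlers` mapping are hypothetical names, not part of any epoll API.

```python
import os
import select


def run_once(ep, handlers, timeout=0.1):
    """One pass of a hypothetical event loop.

    handlers maps fd -> callable(fd, events). Fds that report EPOLLHUP or
    EPOLLERR are removed from the epoll set at once; their userspace state
    is dropped lazily, after the whole batch has been processed.
    """
    dead = []
    for fd, events in ep.poll(timeout):
        if events & (select.EPOLLHUP | select.EPOLLERR):
            ep.unregister(fd)      # EPOLL_CTL_DEL: stops the repeated wakeups
            dead.append(fd)
            continue
        handlers[fd](fd, events)
    for fd in dead:                # lazy cleanup at the end of the pass
        handlers.pop(fd, None)
        os.close(fd)
    return dead
```

A real loop would probably drain any readable data before discarding a descriptor that reports EPOLLIN together with EPOLLHUP; the sketch skips that to stay short.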
* Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to)
2004-04-02 15:22 ` Davide Libenzi
@ 2004-04-02 18:40 ` Jamie Lokier
2004-04-03 12:19 ` Is POLLHUP an input-only or bidirectional condition? Richard Kettlewell
2004-04-03 21:44 ` Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to) Davide Libenzi
2004-04-14 17:59 ` epoll reporting events when it hasn't been asked to Dirk Morris
1 sibling, 2 replies; 19+ messages in thread
From: Jamie Lokier @ 2004-04-02 18:40 UTC (permalink / raw)
To: Davide Libenzi; +Cc: Ben Mansell, Steven Dake, Linux Kernel Mailing List

[New thread because I want people who understand POLLHUP to clarify.
The parent thread's question was: why does epoll always report POLLHUP
and POLLERR conditions even when the program didn't ask for those.
The trivial answer is because that's what poll() does.]

Davide Libenzi wrote:
> Handling by, for example, removing the fd from the epoll set and
> unregistering/freeing the associated data structures. IMO we can leave the
> current behaviour, but if someone sees huge problems with this, the fix is
> a one-liner.

None of select, poll or epoll allow a program to ignore POLLERR while
checking POLLIN or POLLOUT. So at least epoll is consistent with the
other two.

It is possible to ignore POLLHUP conditions with select(), but not
poll() or epoll. For sockets at least, POLLHUP should indicate the
socket is fully closed, so that reading and writing will both fail.
Thus it makes sense that POLLHUP is not ignorable, although curiously
select() only treats POLLHUP as an _input_ condition, so it won't wake
something that's waiting only for output readiness. poll() will
always wake even if you're only waiting for POLLOUT.

POLLERR is set by UDP sockets with a pending error condition, and that
will be reported whether you read or write to the socket (except in
some perverse conditions where MSG_MORE has been used - then app state
machines could get confused). So it's appropriate for a POLLIN or
POLLOUT waiter to be woken when there's a POLLERR condition.

Summary: epoll is consistent with poll(). I'm not sure why poll() and
select() treat POLLHUP differently. A poll() for POLLOUT will be
woken by a POLLHUP condition, yet a select() for output will _not_ be
woken by a POLLHUP condition.

Perhaps that indicates some confusion over what POLLHUP is supposed to
mean, and when it should be set by devices and/or sockets: is it for
input hangup conditions that allow further output, or for total hangup
conditions where input and output are both guaranteed to fail?

If it's the latter, as it seems to be for sockets, then the poll() and
epoll behaviour makes sense, but select() doesn't. If it's the
former, then the select() behaviour is the only one that makes sense.

Hence my question: does anyone know for sure which POLLHUP behaviour
is correct and sensible?

-- Jamie

^ permalink raw reply [flat|nested] 19+ messages in thread
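The poll()/select() difference Jamie summarises can be probed with a short sketch, assuming Linux and Python's standard `select` module. The function name is hypothetical, and a hung-up pipe read end stands in for a device or socket, so the outcome reflects Linux pipe semantics rather than every file type.

```python
import os
import select


def hup_as_seen_by_poll_and_select(timeout=0.1):
    """Hang up a pipe's read end, then ask poll() and select() about it."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)                            # hang up the read end
    try:
        poller = select.poll()
        poller.register(read_fd, select.POLLOUT)  # waiting for output only
        poll_result = poller.poll(int(timeout * 1000))  # poll() takes ms

        # select() asked only about writability of the same descriptor.
        _, writable, _ = select.select([], [read_fd], [], timeout)
        # select() asked about readability instead.
        readable, _, _ = select.select([read_fd], [], [], timeout)
        return poll_result, writable, readable
    finally:
        os.close(read_fd)


if __name__ == "__main__":
    print(hup_as_seen_by_poll_and_select())
```

On Linux the poll() waiter is woken with POLLHUP even though it asked only for POLLOUT, while select() reports the hangup through the read set and leaves the write set empty.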
* Re: Is POLLHUP an input-only or bidirectional condition?
2004-04-02 18:40 ` Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to) Jamie Lokier
@ 2004-04-03 12:19 ` Richard Kettlewell
2004-04-03 21:44 ` Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to) Davide Libenzi
1 sibling, 0 replies; 19+ messages in thread
From: Richard Kettlewell @ 2004-04-03 12:19 UTC (permalink / raw)
To: Jamie Lokier
Cc: Ben Mansell, Steven Dake, Linux Kernel Mailing List, Davide Libenzi

Jamie Lokier <jamie@shareable.org> writes:

> Perhaps that indicates some confusion over what POLLHUP is supposed to
> mean, and when it should be set by devices and/or sockets: is it for
> input hangup conditions that allow further output, or for total hangup
> conditions where input and output are both guaranteed to fail?

I spent some time a while ago looking into how various platforms treat
POLLHUP. It's rather random...

http://www.greenend.org.uk/rjk/2001/06/poll.html

--
http://www.greenend.org.uk/rjk/

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to)
2004-04-02 18:40 ` Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to) Jamie Lokier
2004-04-03 12:19 ` Is POLLHUP an input-only or bidirectional condition? Richard Kettlewell
@ 2004-04-03 21:44 ` Davide Libenzi
2004-04-03 22:35 ` Jamie Lokier
1 sibling, 1 reply; 19+ messages in thread
From: Davide Libenzi @ 2004-04-03 21:44 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Ben Mansell, Steven Dake, Linux Kernel Mailing List

On Fri, 2 Apr 2004, Jamie Lokier wrote:

> None of select, poll or epoll allow a program to ignore POLLERR while
> checking POLLIN or POLLOUT. So at least epoll is consistent with the
> other two.
>
> It is possible to ignore POLLHUP conditions with select(), but not
> poll() or epoll. For sockets at least, POLLHUP should indicate the
> socket is fully closed, so that reading and writing will both fail.
> Thus it makes sense that POLLHUP is not ignorable, although curiously
> select() only treats POLLHUP as an _input_ condition, so it won't wake
> something that's waiting only for output readiness. poll() will
> always wake even if you're only waiting for POLLOUT.
>
> POLLERR is set by UDP sockets with a pending error condition, and that
> will be reported whether you read or write to the socket (except in
> some perverse conditions where MSG_MORE has been used - then app state
> machines could get confused). So it's appropriate for a POLLIN or
> POLLOUT waiter to be woken when there's a POLLERR condition.
>
> Summary: epoll is consistent with poll(). I'm not sure why poll() and
> select() treat POLLHUP differently. A poll() for POLLOUT will be
> woken by a POLLHUP condition, yet a select() for output will _not_ be
> woken by a POLLHUP condition.

The issue here was a little bit different. Contrary to poll(2), with epoll
an fd is resident inside the interest set, and epoll allows you to set the
interest event mask to 0. In such a condition, epoll does report POLLHUP
and POLLERR events for the 0-masked fd, and this was Ben's original
argument. Looking at poll(2) though, it seems that it does the same thing
if you set the event mask to 0. So epoll is coherent with poll(2) in this.
I personally believe that an application should handle those exceptional
events in any case, by simply removing the fd from the epoll set (and
lazily freeing the associated userspace data structures).
So, if no big arguments come against this, I'd rather keep
such behaviour. OTOH the patch would be trivial (one or two lines), so
there will be no design problems in doing this.

- Davide

^ permalink raw reply [flat|nested] 19+ messages in thread
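Davide's observation that poll(2) behaves the same way with a zero event mask can be checked side by side. This is a minimal sketch assuming Linux and Python's standard `select` module; `empty_mask_reports` is a hypothetical name.

```python
import os
import select


def empty_mask_reports(timeout_ms=100):
    """Watch a hung-up pipe read end with event mask 0 via poll() and epoll."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)                 # hang up the read end
    poller = select.poll()
    ep = select.epoll()
    try:
        poller.register(read_fd, 0)    # pollfd.events = 0
        ep.register(read_fd, 0)        # epoll event mask = 0
        # poll() takes milliseconds, epoll.poll() takes seconds.
        return poller.poll(timeout_ms), ep.poll(timeout_ms / 1000)
    finally:
        ep.close()
        os.close(read_fd)


if __name__ == "__main__":
    print(empty_mask_reports())
```

Both interfaces report the hangup although neither was asked for any event.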
* Re: Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to)
2004-04-03 21:44 ` Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to) Davide Libenzi
@ 2004-04-03 22:35 ` Jamie Lokier
2004-04-04 1:28 ` Davide Libenzi
2004-04-04 18:51 ` Ben Mansell
0 siblings, 2 replies; 19+ messages in thread
From: Jamie Lokier @ 2004-04-03 22:35 UTC (permalink / raw)
To: Davide Libenzi; +Cc: Ben Mansell, Steven Dake, Linux Kernel Mailing List

Davide Libenzi wrote:
> Looking at poll(2) though, it seems that it does the same thing if
> you set the event mask to 0. So epoll is coherent with poll(2) in this.

Yes. SUSv3 says POLLHUP, POLLERR and POLLNVAL are always reported
even if not requested.

> I personally believe that an application should handle those
> exceptional events in any case, by simply removing the fd from the
> epoll set (and lazily freeing the associated userspace data structures).

Take a look at the new subject line :)

Linux select() treats it as an input-only condition, implying that
there might be useful things you can do with output to a file
descriptor that's reporting POLLHUP, including waiting for output.

However, SUSv3 says "This event [POLLHUP] and POLLOUT are mutually
exclusive; a stream can never be writable if a hangup has occurred",
implying that Linux select() is the oddity.

> So, if no big arguments come against this, I'd rather keep
> such behaviour. OTOH the patch would be trivial (one or two lines), so
> there will be no design problems in doing this.

I agree, in fact I'd argue specifically against changing it.

Programmers familiar with poll() know that you don't have to set
POLLHUP in the input mask -- because SUSv3 says so ("This flag
[POLLHUP] is only valid in the revents bitmask; it is ignored in the
events member"). They'd not be likely to notice a difference that
subtle for epoll, when they convert application code, so it's good
that there isn't a difference.

Btw, I notice epoll never reports POLLNVAL. Is that correct?

-- Jamie

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to)
2004-04-03 22:35 ` Jamie Lokier
@ 2004-04-04 1:28 ` Davide Libenzi
2004-04-04 2:08 ` Jamie Lokier
2004-04-04 18:51 ` Ben Mansell
1 sibling, 1 reply; 19+ messages in thread
From: Davide Libenzi @ 2004-04-04 1:28 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Ben Mansell, Steven Dake, Linux Kernel Mailing List

On Sat, 3 Apr 2004, Jamie Lokier wrote:

> Btw, I notice epoll never reports POLLNVAL. Is that correct?

Yep, epoll does not allow you to push an invalid/unopen file descriptor
inside the set. So you get an EBADF from epoll_ctl().

- Davide

^ permalink raw reply [flat|nested] 19+ messages in thread
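The contrast between POLLNVAL from poll() and EBADF from epoll_ctl() can be sketched as follows, assuming Linux and Python's standard `select` module, where a failed epoll_ctl() surfaces as an `OSError`. The function name `register_closed_fd` is hypothetical.

```python
import errno
import os
import select


def register_closed_fd():
    """Hand an already closed descriptor number to epoll and to poll()."""
    ep = select.epoll()
    poller = select.poll()
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    os.close(write_fd)                 # read_fd is now an invalid number
    try:
        try:
            ep.register(read_fd, select.EPOLLIN)   # epoll_ctl(EPOLL_CTL_ADD)
            epoll_errno = None
        except OSError as exc:
            epoll_errno = exc.errno
        # poll() accepts the stale number and reports it at wait time.
        poller.register(read_fd, select.POLLIN)
        return epoll_errno, poller.poll(0)
    finally:
        ep.close()


if __name__ == "__main__":
    err, polled = register_closed_fd()
    print(errno.errorcode.get(err), polled)
```

epoll rejects the descriptor up front, so there is never an entry in the set that could later report POLLNVAL.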
* Re: Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to)
2004-04-04 1:28 ` Davide Libenzi
@ 2004-04-04 2:08 ` Jamie Lokier
2004-04-04 2:49 ` Davide Libenzi
0 siblings, 1 reply; 19+ messages in thread
From: Jamie Lokier @ 2004-04-04 2:08 UTC (permalink / raw)
To: Davide Libenzi; +Cc: Ben Mansell, Steven Dake, Linux Kernel Mailing List

Davide Libenzi wrote:
> > Btw, I notice epoll never reports POLLNVAL. Is that correct?
>
> Yep, epoll does not allow you to push an invalid/unopen file descriptor
> inside the set. So you get an EBADF from epoll_ctl().

A comment in eventpoll.c says:

 * This semaphore is acquired by ep_free() during the epoll file
 * cleanup path and it is also acquired by eventpoll_release()
 * if a file has been pushed inside an epoll set and it is then
 * close()d without a previous call to epoll_ctl(EPOLL_CTL_DEL).

I.e. implying that the final close() is possible while it's registered.
(Btw, a function called eventpoll_release() doesn't exist).

What happens when a file descriptor is closed while it is inside the set?

I guess it's simply dropped from the set, is that right?

-- Jamie

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to)
2004-04-04 2:08 ` Jamie Lokier
@ 2004-04-04 2:49 ` Davide Libenzi
0 siblings, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-04-04 2:49 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Ben Mansell, Steven Dake, Linux Kernel Mailing List

On Sun, 4 Apr 2004, Jamie Lokier wrote:

> A comment in eventpoll.c says:
>
>  * This semaphore is acquired by ep_free() during the epoll file
>  * cleanup path and it is also acquired by eventpoll_release()
>  * if a file has been pushed inside an epoll set and it is then
>  * close()d without a previous call to epoll_ctl(EPOLL_CTL_DEL).
>
> I.e. implying that the final close() is possible while it's registered.
> (Btw, a function called eventpoll_release() doesn't exist).

Woops, you're right. The function is inside include/linux/eventpoll.h
because it has been split into an inline to handle the fast path, plus the
slow path eventpoll_release_file(). I'll send a patch to Andrew to fix
comments.

> What happens when a file descriptor is closed while it is inside the set?
>
> I guess it's simply dropped from the set, is that right?

Yes, it is automatically removed from the epoll set, iff the underlying
file* count goes to zero.

- Davide

^ permalink raw reply [flat|nested] 19+ messages in thread
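The "iff the underlying file* count goes to zero" condition can be observed from userspace with a duplicated descriptor. This is a minimal sketch assuming Linux and Python's standard `select` module; `close_while_registered` is a hypothetical name.

```python
import os
import select


def close_while_registered(timeout=0.1):
    """Close a registered fd while a dup() keeps the open file alive."""
    read_fd, write_fd = os.pipe()
    ep = select.epoll()
    try:
        ep.register(read_fd, select.EPOLLIN)
        dup_fd = os.dup(read_fd)   # second descriptor for the same open file
        os.close(read_fd)          # file count is still non-zero
        os.write(write_fd, b"x")
        while_dup_open = ep.poll(timeout)    # entry is still in the set
        os.close(dup_fd)           # last reference gone: entry is dropped
        after_last_close = ep.poll(timeout)
        return read_fd, while_dup_open, after_last_close
    finally:
        ep.close()
        os.close(write_fd)


if __name__ == "__main__":
    print(close_while_registered())
```

While the duplicate is open, epoll keeps reporting events under the original descriptor number even though that number has been closed; only the final close removes the entry.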
* Re: Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to)
2004-04-03 22:35 ` Jamie Lokier
2004-04-04 1:28 ` Davide Libenzi
@ 2004-04-04 18:51 ` Ben Mansell
2004-04-04 19:41 ` Davide Libenzi
2004-04-04 20:24 ` Jamie Lokier
1 sibling, 2 replies; 19+ messages in thread
From: Ben Mansell @ 2004-04-04 18:51 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Davide Libenzi, Steven Dake, Linux Kernel Mailing List

On Sat, 3 Apr 2004, Jamie Lokier wrote:

> Davide Libenzi wrote:
> > Looking at poll(2) though, it seems that it does the same thing if
> > you set the event mask to 0. So epoll is coherent with poll(2) in this.
>
> Yes. SUSv3 says POLLHUP, POLLERR and POLLNVAL are always reported
> even if not requested.

Fair enough. However, if you're writing poll()-based code that doesn't
want to listen to any events on a fd, you just wouldn't add it into the
pollfd array in the first place. Since you have to generate the pollfd
array for each time you call poll(), there is no real extra cost in
taking a fd out temporarily, and putting it back in later when we care
about what is going on with it.

With epoll, adding a fd into the epoll set is a separate operation from
the epoll_wait(), so if you really don't want to listen for any events
on one FD, you'll have to do an EPOLL_CTL_DEL, and then later on do an
EPOLL_CTL_ADD again if you want to bring it back in. Which is a bit nasty
and inefficient.

It would be nice if there was some shortcut to doing this. I'd still
favour epoll only reporting POLLHUP|POLLERR events if the fd was also
registered for POLLIN|POLLOUT. It may not be consistent with SUSv3's
definition of poll(), but does it matter? epoll is not poll.

As Richard Kettlewell's excellent poll test shows, relying on anything
but the basics of poll() is impossible if you are trying to write code
for several different OSs (or just different versions of the same OS!)
Whatever poll() returns, all you can do is force a read() or a write()
to try and find out what events really happened. This is not something
you'd want to do if the application, by unsetting POLLIN & POLLOUT, has
shown that it doesn't want to read() or write().

Ben

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to)
2004-04-04 18:51 ` Ben Mansell
@ 2004-04-04 19:41 ` Davide Libenzi
2004-04-04 20:24 ` Jamie Lokier
1 sibling, 0 replies; 19+ messages in thread
From: Davide Libenzi @ 2004-04-04 19:41 UTC (permalink / raw)
To: Ben Mansell; +Cc: Jamie Lokier, Steven Dake, Linux Kernel Mailing List

On Sun, 4 Apr 2004, Ben Mansell wrote:

> With epoll, adding a fd into the epoll set is a separate operation from
> the epoll_wait(), so if you really don't want to listen for any events
> on one FD, you'll have to do an EPOLL_CTL_DEL, and then later on do an
> EPOLL_CTL_ADD again if you want to bring it back in. Which is a bit nasty
> and inefficient.

I really fail to see how handling POLLHUP and POLLERR would be a problem,
even for fds where you specified a 0 event mask. If you receive them, you
remove the fd from the set, and you flag the associated data structure for
a lazy removal at the end of the current event loop. Where is the problem
here?

- Davide

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to)
2004-04-04 18:51 ` Ben Mansell
2004-04-04 19:41 ` Davide Libenzi
@ 2004-04-04 20:24 ` Jamie Lokier
1 sibling, 0 replies; 19+ messages in thread
From: Jamie Lokier @ 2004-04-04 20:24 UTC (permalink / raw)
To: Ben Mansell; +Cc: Davide Libenzi, Steven Dake, Linux Kernel Mailing List

Ben Mansell wrote:
> Since you have to generate the pollfd array for each time you call
> poll(), there is no real extra cost in taking a fd out temporarily

Wrong. You don't have to generate the pollfd array each time. That's
why there are separate events and revents fields. Quite often no
changes are required between each call to poll(), and only small
changes the rest of the time. (Of course there is no escaping the
O(n) overhead that the _kernel_ has when it scans the array, but it's
avoidable in userspace).

> With epoll, adding a fd into the epoll set is a separate operation from
> the epoll_wait(), so if you really don't want to listen for any events
> on one FD, you'll have to do an EPOLL_CTL_DEL, and then later on do an
> EPOLL_CTL_ADD again if you want to bring it back in. Which is a bit nasty
> and inefficient.

No. If you don't want to listen for any events, and you predict those
events haven't occurred already (POLLHUP, POLLERR, usually POLLIN),
don't do any epoll_ctl() operations at all. Just call epoll_wait().

When you receive an event that you didn't want to listen for, set the
corresponding flag in your userspace structure, and call EPOLL_CTL_MOD
or EPOLL_CTL_DEL, depending on whether there are any other events you
still want to listen for.

See, your proposed method is slower than mine. I avoid *all*
epoll_ctl() calls in the common path. Only in the uncommon path might
I process an unwanted POLLHUP or POLLERR event. In those cases either I
may as well close the fd now (POLLHUP, after a read to determine whether
it's EOF or an error), or the cost of the EPOLL_CTL_DEL needed to ignore
that fd for a while (POLLERR) is negligible, because that's a rare event.

> As Richard Kettlewell's excellent poll test shows, relying on anything
> but the basics of poll() is impossible if you are trying to write code
> for several different OSs (or just different versions of the same OS!)
> Whatever poll() returns, all you can do is force a read() or a write()
> to try and find out what events really happened.

Indeed. Sometimes I wonder why there is anything other than POLLIN
and POLLOUT, given that the only reasonable response to the other
flags is to call read() to find out what happened.

(Then again, maybe read() isn't enough to get error conditions (as
flagged by POLLERR) on some broken OSs, and only MSG_ERRQUEUE will
report them? I don't know).

> This is not something you'd want to do if the application, by
> unsetting POLLIN & POLLOUT, has shown that it doesn't want to read()
> or write().

Indeed. That's why if you do receive POLLHUP or POLLERR and you're
not interested in handling them right now, then _after_ receiving the
events call EPOLL_CTL_DEL, not before. That lazy method usually
avoids the system call.

-- Jamie

^ permalink raw reply [flat|nested] 19+ messages in thread
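Jamie's lazy method can be sketched as a small wrapper that records interest in userspace and only issues an epoll_ctl() call once an unwanted event actually arrives. This assumes Linux and Python's standard `select` module; the class `LazyInterest`, its fields, and the `ctl_calls` counter are hypothetical and exist only to make the saved system calls visible.

```python
import os
import select

HUP_OR_ERR = select.EPOLLHUP | select.EPOLLERR


class LazyInterest:
    """Hypothetical wrapper: losing interest in an fd costs no syscall."""

    def __init__(self):
        self.ep = select.epoll()
        self.wanted = {}     # fd -> mask the application cares about now
        self.kernel = {}     # fd -> mask currently registered with epoll
        self.pending = {}    # fd -> events seen while unwanted
        self.ctl_calls = 0   # number of epoll_ctl() calls issued

    def add(self, fd, mask):
        self.ep.register(fd, mask)                 # EPOLL_CTL_ADD
        self.ctl_calls += 1
        self.wanted[fd] = self.kernel[fd] = mask

    def set_interest(self, fd, mask):
        self.wanted[fd] = mask
        if fd not in self.kernel:                  # dropped after HUP/ERR
            self.add(fd, mask)
        elif mask & ~self.kernel[fd]:              # new bits: tell the kernel
            self.ep.modify(fd, mask)               # EPOLL_CTL_MOD
            self.ctl_calls += 1
            self.kernel[fd] = mask
        # Dropping bits is free here; wait() pays only if they ever fire.

    def wait(self, timeout):
        ready = []
        for fd, events in self.ep.poll(timeout):
            interest = self.wanted[fd]
            if interest:
                interest |= HUP_OR_ERR             # active fds hear HUP/ERR
            deliver = events & interest
            unwanted = events & ~interest
            if deliver:
                ready.append((fd, deliver))
            if unwanted:
                # Remember the event, then pay for one epoll_ctl() call.
                self.pending[fd] = self.pending.get(fd, 0) | unwanted
                self.ctl_calls += 1
                if unwanted & HUP_OR_ERR:
                    self.ep.unregister(fd)         # EPOLL_CTL_DEL
                    del self.kernel[fd]
                else:
                    self.ep.modify(fd, self.wanted[fd])   # EPOLL_CTL_MOD
                    self.kernel[fd] = self.wanted[fd]
        return ready
```

In the common case, where nothing happens on the descriptor while the application is not interested, no epoll_ctl() call is made at all; the deferred events stay in `pending` until the application chooses to look at them.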
* Re: epoll reporting events when it hasn't been asked to
2004-04-02 15:22 ` Davide Libenzi
2004-04-02 18:40 ` Is POLLHUP an input-only or bidirectional condition? (was: epoll reporting events when it hasn't been asked to) Jamie Lokier
@ 2004-04-14 17:59 ` Dirk Morris
2004-04-14 19:39 ` Jamie Lokier
1 sibling, 1 reply; 19+ messages in thread
From: Dirk Morris @ 2004-04-14 17:59 UTC (permalink / raw)
To: Davide Libenzi; +Cc: Ben Mansell, Steven Dake, Linux Kernel Mailing List

Davide Libenzi wrote:

>On Fri, 2 Apr 2004, Ben Mansell wrote:
>
>>>If an exception occurs (for example, a socket is disconnected) the socket
>>>should be removed from the fd list. There is really no point in passing
>>>in an excepted fd.
>>>
>>Is there any difference, speed-wise, between turning off all events to
>>listen to with EPOLL_CTL_MOD, and removing the file descriptor with
>>EPOLL_CTL_DEL? I had vaguely assumed that the former would be faster
>>(especially if you might later want to resume listening for events),
>>although that was just a guess.
>>

I'd like to weigh in on this issue as I'm having the same issue as Ben.
My application doesn't consider these to be exceptional events, but
normal expected events, and thus I need them to be handled like normal
events. (I can explain more off list if you'd like)

So I just want to ignore all events for some time and then deal with any
HUPs or ERRs at the appropriate time. When I used poll(), I always
accomplished this by leaving this fd out of the poll fd set. This wasn't
a huge hit because I basically had to rebuild the poll fd set at every
iteration anyway as it changes rapidly.

Now I'm switching to epoll, and the great thing about the epoll
interface is I don't have to rebuild the entire fd set at every
iteration. Like Ben, I'd prefer to be able to disable ALL events on a
file descriptor for some time, instead of removing it entirely.

Since with poll I had to rebuild the set anyway, this 'disable' feature
wasn't really useful, but would be a nice-to-have for epoll. :))

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: epoll reporting events when it hasn't been asked to
2004-04-14 17:59 ` epoll reporting events when it hasn't been asked to Dirk Morris
@ 2004-04-14 19:39 ` Jamie Lokier
2004-04-14 20:21 ` Dirk Morris
0 siblings, 1 reply; 19+ messages in thread
From: Jamie Lokier @ 2004-04-14 19:39 UTC (permalink / raw)
To: Dirk Morris
Cc: Davide Libenzi, Ben Mansell, Steven Dake, Linux Kernel Mailing List

Dirk Morris wrote:
> I need them to be handled like normal events. (I can explain more off
> list if you'd like)

Did you read my explanation of how to do this using the present epoll
behaviour using _fewer_ syscalls than you are asking for?

-- Jamie

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: epoll reporting events when it hasn't been asked to
2004-04-14 19:39 ` Jamie Lokier
@ 2004-04-14 20:21 ` Dirk Morris
2004-04-14 21:48 ` Jamie Lokier
0 siblings, 1 reply; 19+ messages in thread
From: Dirk Morris @ 2004-04-14 20:21 UTC (permalink / raw)
To: Jamie Lokier
Cc: Davide Libenzi, Ben Mansell, Steven Dake, Linux Kernel Mailing List

Jamie Lokier wrote:

>Dirk Morris wrote:
>
>>I need them to be handled like normal events. (I can explain more off
>>list if you'd like)
>>
>
>Did you read my explanation of how to do this using the present epoll
>behaviour using _fewer_ syscalls than you are asking for?
>

Ah yes, I just went back and read it.

From what I understand you're proposing to remove the fd from the set
lazily instead of immediately.
Which will save system calls in the cases where the HUP/ERR condition
does not occur during the 'disabled' time.

In my case, which you may choose to disregard, this condition is not
irregular or in any way a special case.
So the revision you have proposed is just an optimization.
You could even use this same optimization with the disable feature
(disable it lazily) and get even better performance with the same number
of syscalls you proposed.

I see no downside, except that it no longer conforms to the semantics of
poll and select.
Whether or not it's worth it to deviate from this behavior over such a
detail, I don't know. :)

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: epoll reporting events when it hasn't been asked to
2004-04-14 20:21 ` Dirk Morris
@ 2004-04-14 21:48 ` Jamie Lokier
0 siblings, 0 replies; 19+ messages in thread
From: Jamie Lokier @ 2004-04-14 21:48 UTC (permalink / raw)
To: Dirk Morris
Cc: Davide Libenzi, Ben Mansell, Steven Dake, Linux Kernel Mailing List

Dirk Morris wrote:
> From what I understand you're proposing to remove the fd from the set
> lazily instead of immediately.
> Which will save system calls in the cases where the HUP/ERR condition
> does not occur during the 'disabled' time.
>
> In my case, which you may choose to disregard, this condition is not
> irregular or in any way a special case.
> So the revision you have proposed is just an optimization.
> You could even use this same optimization with the disable feature
> (disable it lazily) and get even better performance with the same number
> of syscalls you proposed.

I don't think you would get any better performance even when HUP/ERR
conditions are commonplace.

A HUP condition means you cannot read & write from the fd any more, so
even though you may defer handling it in userspace, there's nothing to
be gained from disabling all epoll events after you receive the HUP:
instead of lazy disabling, you might as well delete the fd from epoll
as soon as you receive the HUP even though you don't want to handle it
yet. (Btw, the comment about HUP in net/ipv4/tcp.c:tcp_poll() is
illuminating).

ERRs can occur many times while a socket is open so the algorithmic
efficiency is worth considering.

An ERR condition, at least on a socket, forces you to examine the
error before you can perform a further read or write. That's because
read and write operations will both check for pending error, so when
you know there's an error condition, you know that the next read or
write call is really a "tell me the error" call.

So, assuming you apply the lazy strategy, after you receive an ERR, in
principle you could decide that you want to do nothing with it until
the next IN or OUT which represents non-error data readiness, and then
you will examine the error code (by doing a read or write call) and
then read or write actual data. Then indeed being able to ignore just
ERR and still listen for IN and/or OUT would make a difference.

But you might as well just read the error condition using
MSG_ERRQUEUE, and spend your one system call that way instead - you
can still defer the processing of the error code.

There is a situation where that is algorithmically not as good as
being able to ignore just ERR:

The example is when you are receiving a malicious flood of ICMP
packets which cause lots of error conditions on a UDP socket (or other
similar things), and you want to ignore all of those while efficiently
handling a lower rate of non-error data transfer.

That's a very unusual situation and it doesn't occur except under
attack circumstances with UDP (because real error ICMPs are a response
to something you transmitted yourself).

Other socket types or devices might give ERR a different meaning which
causes them to be common relative to read and write readiness. If so,
they probably shouldn't.

> I see no downside, except that it no longer conforms to the semantics of
> poll and select.
> Whether or not it's worth it to deviate from this behavior over such a
> detail, I don't know. :)

I see two downsides. They're not performance downsides, just practical:

1. Currently, you can implement epoll in terms of poll(), if you
   have an epoll-based program and want to create epoll emulation
   functions for running on an old kernel. If epoll were extended
   to permit ignoring HUP/ERR, that would no longer be possible.

2. Perhaps most programs will use a flexible library like libevent
   or something made for the program. It's possible that library
   will offer an API which sends the POLLIN/OUT/ERR/HUP bits to the
   application, and lets the application programmer interpret those
   bits in whatever way is appropriate.

   If it becomes easy to ignore HUP when the library works with
   epoll, applications may accidentally end up depending on that,
   and will unexpectedly fail when they are run one day on an older
   system, or even another OS where the same library works by
   calling poll() or select().

Summary: epoll is fine the way it is. Imho.

-- Jamie

^ permalink raw reply [flat|nested] 19+ messages in thread
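Jamie's first downside, that epoll can currently be emulated on top of poll(), can be illustrated with a tiny shim. This is a sketch assuming Linux and Python's standard `select` module; `PollBackedEpoll` is a hypothetical class covering only a level-triggered subset of the epoll object's methods, and it relies on the POLL* and EPOLL* bits for IN, OUT, ERR and HUP having the same values on Linux.

```python
import os
import select


class PollBackedEpoll:
    """Hypothetical level-triggered subset of the epoll API built on poll()."""

    def __init__(self):
        self._poller = select.poll()

    def register(self, fd, eventmask):
        # Assumes POLLIN/POLLOUT share values with EPOLLIN/EPOLLOUT (Linux).
        self._poller.register(fd, eventmask)

    def modify(self, fd, eventmask):
        self._poller.modify(fd, eventmask)

    def unregister(self, fd):
        self._poller.unregister(fd)

    def poll(self, timeout=None):
        # epoll.poll() takes seconds, poll.poll() takes milliseconds.
        ms = None if timeout is None else int(timeout * 1000)
        return self._poller.poll(ms)
```

Because poll() always reports POLLHUP and POLLERR, the shim matches epoll's current behaviour for free; if epoll allowed those events to be masked, the shim could not follow, since poll() offers no way to suppress them.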