From: Jason Baron <jbaron-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org>
To: "Michael Kerrisk (man-pages)"
<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
viro-rfM+Q5joDG/XmaaqVzeoHQ@public.gmane.org,
normalperson-rMlxZR9MS24@public.gmane.org, m@silodev.com,
corbet-T1hC0tSOHrs@public.gmane.org,
luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org,
torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
hagen-GvnIQ6b/HdU@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] epoll: add exclusive wakeups flag
Date: Mon, 14 Mar 2016 15:32:19 -0400
Message-ID: <56E711C3.8020008@akamai.com>
In-Reply-To: <56E6F941.9040307-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
> [Restoring CC, which I see I accidentally dropped, one iteration back.]
>
> Hi Jason,
>
> Thanks for the review. I've tweaked one piece to respond to your
> feedback. But I also have another new question below.
>
> On 03/15/2016 03:55 AM, Jason Baron wrote:
>> On 03/11/2016 06:25 PM, Michael Kerrisk (man-pages) wrote:
>>> On 03/11/2016 09:51 PM, Jason Baron wrote:
>>>> On 03/11/2016 03:30 PM, Michael Kerrisk (man-pages) wrote:
>
> [...]
>
>> Hi Michael,
>>
>> Looks good. One comment below.
>>
>> Thanks,
>>
>>> EPOLLEXCLUSIVE (since Linux 4.5)
>>> Sets an exclusive wakeup mode for the epoll file
>>> descriptor that is being attached to the target file
>>> descriptor, fd. When a wakeup event occurs and multiple
>>> epoll file descriptors are attached to the same target
>>> file using EPOLLEXCLUSIVE, one or more of the epoll file
>>> descriptors will receive an event with epoll_wait(2).
>>> The default in this scenario (when EPOLLEXCLUSIVE is not
>>> set) is for all epoll file descriptors to receive an
>>> event. EPOLLEXCLUSIVE is thus useful for avoiding thun‐
>>> dering herd problems in certain scenarios.
>>>
>>> If the same file descriptor is in multiple epoll
>>> instances, some with the EPOLLEXCLUSIVE flag, and others
>>> without, then events will be provided to all epoll
>>> instances that did not specify EPOLLEXCLUSIVE, and at
>>> least one of the epoll instances that did specify
>>> EPOLLEXCLUSIVE.
>>>
>>> The following values may be specified in conjunction
>>> with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>>> EPOLLET. EPOLLHUP and EPOLLERR can also be specified,
>>> but are ignored (as usual). Attempts to specify other
>>
>> I'm not sure 'ignored' is the right wording here. 'EPOLLHUP' and
>> 'EPOLLERR' are always included in the set of events when something is
>> added as EPOLLEXCLUSIVE. This is consistent with the non-EPOLLEXCLUSIVE
>> add case.
>
> Yes.
>
>> So 'EPOLLHUP' and 'EPOLLERR' may be specified but will be
>> included in the set of events on an add, whether they are specified or not.
>
> Yes. I understand your discomfort with the word "ignored", but the
> problem was that, because it made special mention of EPOLLHUP and EPOLLERR,
> your proposed text made it sound as though EPOLLEXCLUSIVE somehow was
> special with respect to these two flags. I wanted to clarify that it is not.
> How about this:
>
> The following values may be specified in conjunction
> with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
> EPOLLET. EPOLLHUP and EPOLLERR can also be specified,
> but this is not required: as usual, these events are
> always reported if they occur, regardless of whether
> they are specified in events.
> ?
Yes, nothing special here with respect to EPOLLHUP and EPOLLERR. So this
looks fine to me.
>
>>> values in events yield an error. EPOLLEXCLUSIVE may be
>>> used only in an EPOLL_CTL_ADD operation; attempts to
>>> employ it with EPOLL_CTL_MOD yield an error. If
>>> EPOLLEXCLUSIVE has been set using epoll_ctl(2), then a
>>> subsequent EPOLL_CTL_MOD on the same epfd, fd pair yields an
>>> error. An epoll_ctl(2) that specifies EPOLLEXCLUSIVE in
>>> events and specifies the target file descriptor fd as an
>>> epoll instance will likewise fail. The error in all of
>>> these cases is EINVAL.
>>>
>>> ERRORS
>>> EINVAL An invalid event type was specified along with EPOLLEX‐
>>> CLUSIVE in events.
>>>
>>> EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.
>>>
>>> EINVAL op was EPOLL_CTL_MOD and the EPOLLEXCLUSIVE flag has
>>> previously been applied to this epfd, fd pair.
>>>
>>> EINVAL EPOLLEXCLUSIVE was specified in event and fd refers
>>> to an epoll instance.
>
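The four EINVAL cases listed above can be exercised with a small test. This is only a minimal sketch, assuming Linux 4.5 or later; the helper names expect_einval() and count_einval_cases() are made up for illustration, EPOLLONESHOT is used as one example of a disallowed event type, and a pipe stands in for the target file.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Older libc headers may lack the flag; this value matches the kernel UAPI. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* Hypothetical helper: returns 1 if epoll_ctl() fails with EINVAL, else 0. */
static int expect_einval(int epfd, int op, int fd, unsigned int events)
{
    struct epoll_event ev = { .events = events };

    return epoll_ctl(epfd, op, fd, &ev) == -1 && errno == EINVAL;
}

/*
 * Hypothetical driver: returns how many of the four documented cases
 * yield EINVAL, or -1 if the setup itself fails.
 */
int count_einval_cases(void)
{
    int pfd[2];
    int n = 0;
    struct epoll_event ev = { .events = EPOLLIN };

    if (pipe(pfd) == -1)
        return -1;

    int excl = epoll_create1(0);   /* will hold the exclusive registration */
    int plain = epoll_create1(0);  /* will hold a non-exclusive registration */
    int inner = epoll_create1(0);  /* an epoll instance used as a target fd */
    if (excl == -1 || plain == -1 || inner == -1)
        return -1;

    /* Case 1: a disallowed event type specified along with EPOLLEXCLUSIVE. */
    n += expect_einval(excl, EPOLL_CTL_ADD, pfd[0],
                       EPOLLIN | EPOLLONESHOT | EPOLLEXCLUSIVE);

    /* Case 2: EPOLL_CTL_MOD whose events include EPOLLEXCLUSIVE. */
    if (epoll_ctl(plain, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        return -1;
    n += expect_einval(plain, EPOLL_CTL_MOD, pfd[0],
                       EPOLLIN | EPOLLEXCLUSIVE);

    /* Case 3: EPOLL_CTL_MOD on a pair that was added with EPOLLEXCLUSIVE. */
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    if (epoll_ctl(excl, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        return -1;
    n += expect_einval(excl, EPOLL_CTL_MOD, pfd[0], EPOLLIN);

    /* Case 4: EPOLLEXCLUSIVE where the target fd is itself an epoll instance. */
    n += expect_einval(excl, EPOLL_CTL_ADD, inner,
                       EPOLLIN | EPOLLEXCLUSIVE);

    close(inner);
    close(plain);
    close(excl);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

If the kernel behaves as the draft text describes, count_einval_cases() should report all four cases.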
> Returning to the second sentence in this description:
>
> When a wakeup event occurs and multiple epoll file descrip‐
> tors are attached to the same target file using EPOLLEXCLU‐
> SIVE, one or more of the epoll file descriptors will
> receive an event with epoll_wait(2).
>
> There is a point that is unclear to me: what does "target file" refer to?
> Is it an open file description (aka open file table entry) or an inode?
> I suspect the former, but it was not clear in your original text.
>
So from epoll's perspective, the wakeups are associated with a 'wait
queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
file->poll()) results in adding to the same 'wait queue' then we will
get 'exclusive' wakeup behavior.

So in general, I think the answer here is that it's associated with the
inode (I couldn't say with 100% certainty without really looking at all
file->poll() implementations). Certainly, with the 'FIFO' example below,
the two scenarios will have the same behavior with respect to
EPOLLEXCLUSIVE.

Also, the 'non-exclusive' mode would be subject to the same question of
which wait queue the epfd is associated with...

Thanks,

-Jason
> To make this point even clearer, here are two scenarios I'm thinking of.
> In each case, we're talking of monitoring the read end of a FIFO.
>
> ===
>
> Scenario 1:
>
> We have three processes each of which
> 1. Creates an epoll instance
> 2. Opens the read end of the FIFO
> 3. Adds the read end of the FIFO to the epoll instance, specifying
> EPOLLEXCLUSIVE
>
> When input becomes available on the FIFO, how many processes
> get a wakeup?
>
> ===
>
> Scenario 2:
>
> A parent process opens the read end of a FIFO and then calls
> fork() three times to create three children. Each child then:
>
> 1. Creates an epoll instance
> 2. Adds the read end of the FIFO to the epoll instance, specifying
> EPOLLEXCLUSIVE
>
> When input becomes available on the FIFO, how many processes
> get a wakeup?
>
> ===
>
> Cheers,
>
> Michael
>
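The second scenario above can be tried out with a short program. This is a rough sketch rather than a definitive test: an anonymous pipe stands in for the FIFO (an assumption made for brevity), the function name count_exclusive_wakeups() and the one-second timeout are invented for illustration, and Linux 4.5 or later is assumed. The parent creates the pipe and forks three children; each child creates its own epoll instance, adds the inherited read end with EPOLLEXCLUSIVE, and waits. The parent writes one byte only after all children have registered, then counts how many children saw an event.

```c
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Older libc headers may lack the flag; this value matches the kernel UAPI. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

#define NCHILD 3

/*
 * Hypothetical driver: returns the number of children whose epoll_wait()
 * reported an event, or -1 on failure. On a fork() failure it returns
 * early without reaping children already started (kept short on purpose).
 */
int count_exclusive_wakeups(void)
{
    int data[2], ready[2];
    pid_t pids[NCHILD];
    int woken = 0, ok = 1;

    if (pipe(data) == -1 || pipe(ready) == -1)
        return -1;

    for (int i = 0; i < NCHILD; i++) {
        pids[i] = fork();
        if (pids[i] == -1)
            return -1;
        if (pids[i] == 0) {
            /* Child: own epoll instance, inherited read end, exclusive add. */
            struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
            struct epoll_event out;
            int epfd = epoll_create1(0);

            if (epfd == -1 ||
                epoll_ctl(epfd, EPOLL_CTL_ADD, data[0], &ev) == -1)
                _exit(2);
            /* Tell the parent that registration is complete. */
            if (write(ready[1], "r", 1) != 1)
                _exit(2);
            /* Exit status 1 means an event arrived, 0 means timeout. */
            _exit(epoll_wait(epfd, &out, 1, 1000) == 1 ? 1 : 0);
        }
    }

    /* Parent: drop its write end so read() sees EOF if the children die. */
    close(ready[1]);
    for (int i = 0; i < NCHILD; i++) {
        char c;
        if (read(ready[0], &c, 1) != 1)
            ok = 0;
    }

    /* All children are registered; make input available once. */
    if (ok && write(data[1], "x", 1) != 1)
        ok = 0;

    for (int i = 0; i < NCHILD; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == -1 ||
            !WIFEXITED(status) || WEXITSTATUS(status) > 1)
            ok = 0;
        else
            woken += WEXITSTATUS(status);
    }

    close(ready[0]);
    close(data[0]);
    close(data[1]);
    return ok ? woken : -1;
}
```

Per the draft text, the count should be at least one and may be more than one; the exact number is not guaranteed and can vary from run to run. Dropping EPOLLEXCLUSIVE from the child's events mask gives the default behavior for comparison.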