* Re: epoll design problems with common fork/exec patterns
From: Michael Kerrisk @ 2008-02-26 15:13 UTC
To: Davide Libenzi
Cc: David Schwartz, dada1-fPLkHRcR87vqlBn2x/YWAg,
Chris "ク" Heath, Linux-Kernel@Vger. Kernel. Org,
linux-man-u79uwXL29TY76Z2rM5mHXA
Davide Libenzi wrote:
> On Sun, 28 Oct 2007, David Schwartz wrote:
>
>> Eric Dumazet wrote:
>>
>>> Events are not necessarily reported "by descriptors". epoll uses an opaque
>>> field provided by the user.
>>>
>>> It's up to the user to properly choose a tag that will make sense if the user
>>> app is playing dup()/close() games, for example.
>> Great. So the only issue then is that the documentation is confusing. It
>> frequently uses the term "fd" where it means file. For example, it says:
>>
>> Q1 What happens if you add the same fd to an epoll_set twice?
>>
>> A1 You will probably get EEXIST. However, it is possible that two
>> threads may add the same fd twice. This is a harmless condition.
>>
>> This gives no reason to think there's anything wrong with adding the same
>> file twice so long as you do so through different descriptors. (One can
>> imagine an application that does this to segregate read and write operations
>> to avoid a race where the descriptor is closed from under a writer due to
>> handling a fatal read error.) Obviously, that won't work.
>
> I agree, that is confusing. However, you can safely add two different file
> descriptors pointing to the same file*, with different event masks, and
> that will work as expected.
So can I summarize what I understand:
a) Adding the same file descriptor twice to an epoll set will cause an
error (EEXIST).
b) In a separate message to linux-man, Chris Heath says that two threads
*can't* add the same fd twice to an epoll set, despite what the existing
man page text says. I haven't tested that, but it sounds to me as though
it is likely to be true. Can you comment please Davide?
c) It is possible to add duplicated file descriptors referring to the same
underlying open file description ("file *"). As you note, this can be a
useful filtering technique, if the two file descriptors specify different
masks.
Assuming that is all correct, for man-pages-2.79, I've reworked the text
for Q1/A1 as follows:
Q1 What happens if you add the same file descriptor
to an epoll set twice?
A1 You will probably get EEXIST. However, it is possible to add a
duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD, fork(2)) descriptor to
the same epoll set. This can be a useful technique for filtering
events, if the duplicate file descriptors are registered with
different events masks.
Seem okay Davide?
Cheers,
Michael
PS I've trimmed the part of this thread about Q6/A6, since I dealt with
that in another thread ("epoll and shared fd's").
--
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
Want to report a man-pages bug? Look here:
http://www.kernel.org/doc/man-pages/reporting_bugs.html
* Re: epoll design problems with common fork/exec patterns
From: Davide Libenzi @ 2008-02-26 18:51 UTC
To: Michael Kerrisk
Cc: David Schwartz, dada1-fPLkHRcR87vqlBn2x/YWAg,
Chris "ク" Heath, Linux-Kernel@Vger. Kernel. Org,
linux-man-u79uwXL29TY76Z2rM5mHXA
On Tue, 26 Feb 2008, Michael Kerrisk wrote:
> Davide Libenzi wrote:
> > On Sun, 28 Oct 2007, David Schwartz wrote:
> >
> >> Eric Dumazet wrote:
> >>
> >>> Events are not necessarily reported "by descriptors". epoll uses an opaque
> >>> field provided by the user.
> >>>
> >>> It's up to the user to properly choose a tag that will make sense if the user
> >>> app is playing dup()/close() games, for example.
> >> Great. So the only issue then is that the documentation is confusing. It
> >> frequently uses the term "fd" where it means file. For example, it says:
> >>
> >> Q1 What happens if you add the same fd to an epoll_set twice?
> >>
> >> A1 You will probably get EEXIST. However, it is possible that two
> >> threads may add the same fd twice. This is a harmless condition.
> >>
> >> This gives no reason to think there's anything wrong with adding the same
> >> file twice so long as you do so through different descriptors. (One can
> >> imagine an application that does this to segregate read and write operations
> >> to avoid a race where the descriptor is closed from under a writer due to
> >> handling a fatal read error.) Obviously, that won't work.
> >
> > I agree, that is confusing. However, you can safely add two different file
> > descriptors pointing to the same file*, with different event masks, and
> > that will work as expected.
>
> So can I summarize what I understand:
>
> a) Adding the same file descriptor twice to an epoll set will cause an
> error (EEXIST).
Yes.
> b) In a separate message to linux-man, Chris Heath says that two threads
> *can't* add the same fd twice to an epoll set, despite what the existing
> man page text says. I haven't tested that, but it sounds to me as though
> it is likely to be true. Can you comment please Davide?
Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
the key.
> c) It is possible to add duplicated file descriptors referring to the same
> underlying open file description ("file *"). As you note, this can be a
> useful filtering technique, if the two file descriptors specify different
> masks.
>
> Assuming that is all correct, for man-pages-2.79, I've reworked the text
> for Q1/A1 as follows:
>
> Q1 What happens if you add the same file descriptor
> to an epoll set twice?
>
> A1 You will probably get EEXIST. However, it is possible to add a
> duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD, fork(2)) descriptor to
> the same epoll set. This can be a useful technique for filtering
> events, if the duplicate file descriptors are registered with
> different events masks.
>
> Seem okay Davide?
Looks sane to me.
- Davide
* Re: epoll design problems with common fork/exec patterns
From: Chris "ク" Heath @ 2008-02-27 1:30 UTC
To: Davide Libenzi
Cc: Michael Kerrisk, David Schwartz, dada1-fPLkHRcR87vqlBn2x/YWAg,
Linux-Kernel@Vger. Kernel. Org, linux-man-u79uwXL29TY76Z2rM5mHXA
On Tue, 2008-02-26 at 10:51 -0800, Davide Libenzi wrote:
> On Tue, 26 Feb 2008, Michael Kerrisk wrote:
>
> > Davide Libenzi wrote:
> > > On Sun, 28 Oct 2007, David Schwartz wrote:
> > >
> > >> Eric Dumazet wrote:
> > >>
> > >>> Events are not necessarily reported "by descriptors". epoll uses an opaque
> > >>> field provided by the user.
> > >>>
> > >>> It's up to the user to properly choose a tag that will make sense if the user
> > >>> app is playing dup()/close() games, for example.
> > >> Great. So the only issue then is that the documentation is confusing. It
> > >> frequently uses the term "fd" where it means file. For example, it says:
> > >>
> > >> Q1 What happens if you add the same fd to an epoll_set twice?
> > >>
> > >> A1 You will probably get EEXIST. However, it is possible that two
> > >> threads may add the same fd twice. This is a harmless condition.
> > >>
> > >> This gives no reason to think there's anything wrong with adding the same
> > >> file twice so long as you do so through different descriptors. (One can
> > >> imagine an application that does this to segregate read and write operations
> > >> to avoid a race where the descriptor is closed from under a writer due to
> > >> handling a fatal read error.) Obviously, that won't work.
> > >
> > > I agree, that is confusing. However, you can safely add two different file
> > > descriptors pointing to the same file*, with different event masks, and
> > > that will work as expected.
> >
> > So can I summarize what I understand:
> >
> > a) Adding the same file descriptor twice to an epoll set will cause an
> > error (EEXIST).
>
> Yes.
>
>
>
> > b) In a separate message to linux-man, Chris Heath says that two threads
> > *can't* add the same fd twice to an epoll set, despite what the existing
> > man page text says. I haven't tested that, but it sounds to me as though
> > it is likely to be true. Can you comment please Davide?
>
> Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
> the key.
To clarify, the key appears to be file* plus the user-space integer that
represents the fd.
> > c) It is possible to add duplicated file descriptors referring to the same
> > underlying open file description ("file *"). As you note, this can be a
> > useful filtering technique, if the two file descriptors specify different
> > masks.
> >
> > Assuming that is all correct, for man-pages-2.79, I've reworked the text
> > for Q1/A1 as follows:
> >
> > Q1 What happens if you add the same file descriptor
> > to an epoll set twice?
> >
> > A1 You will probably get EEXIST. However, it is possible to add a
> > duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD, fork(2)) descriptor to
> > the same epoll set. This can be a useful technique for filtering
> > events, if the duplicate file descriptors are registered with
> > different events masks.
> >
> > Seem okay Davide?
>
> Looks sane to me.
I think fork(2) should not be in the above list. fork(2) duplicates the
kernel's fd, but the user-space integer that represents the fd remains
the same, so you will get EEXIST if you try to add the fd that was
duplicated by fork.
Chris
* Re: epoll design problems with common fork/exec patterns
From: Davide Libenzi @ 2008-02-27 19:35 UTC
To: Chris "ク" Heath
Cc: Michael Kerrisk, David Schwartz, dada1-fPLkHRcR87vqlBn2x/YWAg,
Linux-Kernel@Vger. Kernel. Org, linux-man-u79uwXL29TY76Z2rM5mHXA
On Tue, 26 Feb 2008, Chris "ク" Heath wrote:
> On Tue, 2008-02-26 at 10:51 -0800, Davide Libenzi wrote:
> >
> > Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
> > the key.
>
> To clarify, the key appears to be file* plus the user-space integer that
> represents the fd.
Yes, that's what I said.
> > > c) It is possible to add duplicated file descriptors referring to the same
> > > underlying open file description ("file *"). As you note, this can be a
> > > useful filtering technique, if the two file descriptors specify different
> > > masks.
> > >
> > > Assuming that is all correct, for man-pages-2.79, I've reworked the text
> > > for Q1/A1 as follows:
> > >
> > > Q1 What happens if you add the same file descriptor
> > > to an epoll set twice?
> > >
> > > A1 You will probably get EEXIST. However, it is possible to add a
> > > duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD, fork(2)) descriptor to
> > > the same epoll set. This can be a useful technique for filtering
> > > events, if the duplicate file descriptors are registered with
> > > different events masks.
> > >
> > > Seem okay Davide?
> >
> > Looks sane to me.
>
> I think fork(2) should not be in the above list. fork(2) duplicates the
> kernel's fd, but the user-space integer that represents the fd remains
> the same, so you will get EEXIST if you try to add the fd that was
> duplicated by fork.
Good catch, fork(2) should not be there.
- Davide
* Re: epoll design problems with common fork/exec patterns
From: Michael Kerrisk @ 2008-02-28 13:12 UTC
To: Davide Libenzi
Cc: Chris "ク" Heath, David Schwartz,
dada1-fPLkHRcR87vqlBn2x/YWAg, Linux-Kernel@Vger. Kernel. Org,
linux-man-u79uwXL29TY76Z2rM5mHXA
On Wed, Feb 27, 2008 at 8:35 PM, Davide Libenzi <davidel-AhlLAIvw+VFwn5V/fVEdqA@public.gmane.org> wrote:
> On Tue, 26 Feb 2008, Chris "ク" Heath wrote:
>
> > On Tue, 2008-02-26 at 10:51 -0800, Davide Libenzi wrote:
> > >
>
> > > Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
> > > the key.
> >
> > To clarify, the key appears to be file* plus the user-space integer that
> > represents the fd.
>
> Yes, that's what I said.
>
> > > > c) It is possible to add duplicated file descriptors referring to the same
> > > > underlying open file description ("file *"). As you note, this can be a
> > > > useful filtering technique, if the two file descriptors specify different
> > > > masks.
> > > >
> > > > Assuming that is all correct, for man-pages-2.79, I've reworked the text
> > > > for Q1/A1 as follows:
> > > >
> > > > Q1 What happens if you add the same file descriptor
> > > > to an epoll set twice?
> > > >
> > > > A1 You will probably get EEXIST. However, it is possible to add a
> > > > duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD, fork(2)) descriptor to
> > > > the same epoll set. This can be a useful technique for filtering
> > > > events, if the duplicate file descriptors are registered with
> > > > different events masks.
> > > >
> > > > Seem okay Davide?
> > >
> > > Looks sane to me.
> >
> > I think fork(2) should not be in the above list. fork(2) duplicates the
> > kernel's fd, but the user-space integer that represents the fd remains
> > the same, so you will get EEXIST if you try to add the fd that was
> > duplicated by fork.
>
> Good catch, fork(2) should not be there.
Okay -- removed.
But it is an ugly inconsistency. On the one hand, a child process
cannot add the duplicate file descriptor to the epoll set. (In every
other case that I can think of, descriptors duplicated by fork have
similar semantics to descriptors duplicated by dup() and friends.) On
the other hand, the very fact that the child has a duplicate of the
descriptor means that even if the parent closes its descriptor, then
epoll_wait() in the parent will continue to receive notifications for
that descriptor because of the duplicated descriptor in the child.
The choice of [file *, fd] as the key for epoll sets really does seem
unfortunate. Keying on [pid, fd] would have given saner semantics, it
seems to me. Obviously it can't be changed now though.
Cheers,
Michael
* Re: epoll design problems with common fork/exec patterns
From: Michael Kerrisk @ 2008-02-28 13:23 UTC
To: Davide Libenzi
Cc: Chris "ク" Heath, David Schwartz,
dada1-fPLkHRcR87vqlBn2x/YWAg, Linux-Kernel@Vger. Kernel. Org,
linux-man-u79uwXL29TY76Z2rM5mHXA
On Thu, Feb 28, 2008 at 2:12 PM, Michael Kerrisk
<mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
>
> On Wed, Feb 27, 2008 at 8:35 PM, Davide Libenzi <davidel@xmailserver.org> wrote:
> > On Tue, 26 Feb 2008, Chris "ク" Heath wrote:
> >
> > > On Tue, 2008-02-26 at 10:51 -0800, Davide Libenzi wrote:
> > > >
> >
> > > > Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
> > > > the key.
> > >
> > > To clarify, the key appears to be file* plus the user-space integer that
> > > represents the fd.
> >
> > Yes, that's what I said.
> >
> > > > > c) It is possible to add duplicated file descriptors referring to the same
> > > > > underlying open file description ("file *"). As you note, this can be a
> > > > > useful filtering technique, if the two file descriptors specify different
> > > > > masks.
> > > > >
> > > > > Assuming that is all correct, for man-pages-2.79, I've reworked the text
> > > > > for Q1/A1 as follows:
> > > > >
> > > > > Q1 What happens if you add the same file descriptor
> > > > > to an epoll set twice?
> > > > >
> > > > > A1 You will probably get EEXIST. However, it is possible to add a
> > > > > duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD, fork(2)) descriptor to
> > > > > the same epoll set. This can be a useful technique for filtering
> > > > > events, if the duplicate file descriptors are registered with
> > > > > different events masks.
> > > > >
> > > > > Seem okay Davide?
> > > >
> > > > Looks sane to me.
> > >
> > > I think fork(2) should not be in the above list. fork(2) duplicates the
> > > kernel's fd, but the user-space integer that represents the fd remains
> > > the same, so you will get EEXIST if you try to add the fd that was
> > > duplicated by fork.
> >
> > Good catch, fork(2) should not be there.
>
> Okay -- removed.
>
> But it is an ugly inconsistency. On the one hand, a child process
> cannot add the duplicate file descriptor to the epoll set. (In every
> other case that I can think of, descriptors duplicated by fork have
> similar semantics to descriptors duplicated by dup() and friends.) On
> the other hand, the very fact that the child has a duplicate of the
> descriptor means that even if the parent closes its descriptor, then
> epoll_wait() in the parent will continue to receive notifications for
> that descriptor because of the duplicated descriptor in the child.
>
> The choice of [file *, fd] as the key for epoll sets really does seem
> unfortunate. Keying on [pid, fd] would have given saner semantics, it
> seems to me. Obviously it can't be changed now though.
Davide,
with the earlier discussion in this thread in mind, I added a Q0/A0 to
epoll.7, just to make the point about keys clear:
Q0 What is the key used to distinguish the file descriptors
in an epoll set?
A0 The key is the combination of the file descriptor
number and the open file description (also known as
"open file handle", the kernel's internal representation
of an open file).
Does that seem okay?
Cheers,
Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: epoll design problems with common fork/exec patterns
From: Davide Libenzi @ 2008-02-28 19:23 UTC
To: Michael Kerrisk
Cc: Chris "ク" Heath, David Schwartz,
dada1-fPLkHRcR87vqlBn2x/YWAg, Linux-Kernel@Vger. Kernel. Org,
linux-man-u79uwXL29TY76Z2rM5mHXA
On Thu, 28 Feb 2008, Michael Kerrisk wrote:
> But it is an ugly inconsistency. On the one hand, a child process
> cannot add the duplicate file descriptor to the epoll set. (In every
> other case that I can think of, descriptors duplicated by fork have
> similar semantics to descriptors duplicated by dup() and friends.) On
> the other hand, the very fact that the child has a duplicate of the
> descriptor means that even if the parent closes its descriptor, then
> epoll_wait() in the parent will continue to receive notifications for
> that descriptor because of the duplicated descriptor in the child.
Have you ever tried to think what it means for different *processes*
sharing a single epoll fd and doing epoll_wait() over it?
The most common case is a single event fetch thread plus dispatch. Going to
epoll_wait() over a single epoll fd from many *threads* is very much
possible, but requires care (news at 11, system software development
requires care too).
Sharing a single epoll fd (by means of any process sharing it doing
add/wait) from different *processes* makes almost no sense at all.
"a child process cannot add the duplicate file descriptor to the epoll
set" ... how do you expect the parent (that doesn't even have the new fd
mapped) to react to such events?
If the next question is "But then why did we make the epoll fd inheritable?",
the answer is, because it makes sense in many cases for a parent to hand
over an fd set to a child.
> The choice of [file *, fd] as the key for epoll sets really does seem
> unfortunate. Keying on [pid, fd] would have given saner semantics, it
> seems to me. Obviously it can't be changed now though.
I think we already went over this, and I think I clearly explained to you
the reasons for not hooking into sys_close.
- Davide
* Re: epoll design problems with common fork/exec patterns
From: Davide Libenzi @ 2008-02-28 19:34 UTC
To: Michael Kerrisk
Cc: Chris "ク" Heath, David Schwartz,
dada1-fPLkHRcR87vqlBn2x/YWAg, Linux-Kernel@Vger. Kernel. Org,
linux-man-u79uwXL29TY76Z2rM5mHXA
On Thu, 28 Feb 2008, Michael Kerrisk wrote:
> Davide,
>
> with the earlier discussion in this thread in mind, I added a Q0/A0 to
> epoll.7, just to make the point about keys clear:
>
>
> Q0 What is the key used to distinguish the file descriptors
> in an epoll set?
>
> A0 The key is the combination of the file descriptor
> number and the open file description (also known as
> "open file handle", the kernel's internal representation
> of an open file).
>
> Does that seem okay?
Looks fine to me! We need to better clarify Q6 WRT fork().
- Davide
* Re: epoll design problems with common fork/exec patterns
From: Michael Kerrisk @ 2008-02-29 15:46 UTC
To: Davide Libenzi
Cc: Chris "ク" Heath, David Schwartz,
dada1-fPLkHRcR87vqlBn2x/YWAg, Linux-Kernel@Vger. Kernel. Org,
linux-man-u79uwXL29TY76Z2rM5mHXA
Hi Davide,
On Thu, Feb 28, 2008 at 8:23 PM, Davide Libenzi <davidel-AhlLAIvw+VEjIGhXcJzhZg@public.gmane.org> wrote:
> On Thu, 28 Feb 2008, Michael Kerrisk wrote:
>
> > But it is an ugly inconsistency. On the one hand, a child process
> > cannot add the duplicate file descriptor to the epoll set. (In every
> > other case that I can think of, descriptors duplicated by fork have
> > similar semantics to descriptors duplicated by dup() and friends.) On
> > the other hand, the very fact that the child has a duplicate of the
> > descriptor means that even if the parent closes its descriptor, then
> > epoll_wait() in the parent will continue to receive notifications for
> > that descriptor because of the duplicated descriptor in the child.
>
> Have you ever tried to think what it means for different *processes*
> sharing a single epoll fd and doing epoll_wait() over it?
As I think is clear, I've only given it very limited thought ;-).
The point is that the existing implementation actually supports
"different *processes* sharing a single epoll fd and doing
epoll_wait() over it", but the semantics are unintuitive. It may be
that the existing implementation was the best way of doing things.
But when I see the strange corner cases in the semantics, I can't help
but wonder (way too late), whether there might have been some other
way of implementing things that led to more intuitive semantics.
> The most common case is a single event fetch thread plus dispatch. Going to
> epoll_wait() over a single epoll fd from many *threads* is very much
> possible, but requires care (news at 11, system software development
> requires care too).
> Sharing a single epoll fd (by means of any process sharing it doing
> add/wait) from different *processes* makes almost no sense at all.
>
> "a child process cannot add the duplicate file descriptor to the epoll
> set" ... how do you expect the parent (that doesn't even have the new fd
> mapped) to react to such events?
(Not sure if you missed my meaning here. Of course the parent already
has the fd mapped; it's the fd that the child inherited. Anyway, my
real point was that while descriptors duplicated by fork() are
normally semantically similar to other duplicated descriptors, when it
comes to epoll they are not -- and that has the potential to surprise
users.)
> If the next question is "But then why did we make the epoll fd inheritable?",
> the answer is, because it makes sense in many cases for a parent to hand
> over an fd set to a child.
Fair enough.
So here's an idea about how things might alternatively have been done:
a) The key for epoll entries could have been [file *, fd, PID]
b) epoll_wait() only returns events for fds whose PID matches that
of the caller.
c) a close of a file descriptor removes the corresponding [file *,
fd, PID] from the epoll set.
d) when a fork() is done, then the epoll set has a new set of keys
added. These are duplicates of the [file *, fd, PID] entries for the
parent, but with the PID of the child substituted into the new keys.
Say the parent had PID 1000, and the child has PID 2000. If the epoll
set initially contained:
[X, 3, 1000]
[Y, 4, 1000]
then after fork() we'd have:
[X, 3, 1000]
[Y, 4, 1000]
[X, 3, 2000]
[Y, 4, 2000]
There is of course room for debate about the efficiency of this
approach, I suppose. But it seems to me (and perhaps I've missed a
number of things) that that could have given sane semantics with
respect to fork(), duplicated descriptors, and close(). Furthermore,
it would have allowed us to sanely support "different *processes*
sharing a single epoll fd and doing epoll_wait() over it".
Of course, this is all academic now: we can't change the ABI.
> > The choice of [file *, fd] as the key for epoll sets really does seem
> > unfortunate. Keying on [pid, fd] would have given saner semantics, it
> > seems to me. Obviously it can't be changed now though.
>
> I think we already went over this, and I think I clearly explained to you
> the reasons for not hooking into sys_close.
You said elsewhere:
[[
That'd mean placing an eventpoll custom hook into sys_close(). Looks very
bad to me, and probably will look even worse to other kernel folks.
Is not much a performance issue (a check to see if a file* is an eventpoll
file is as easy as comparing the f_op pointer), but a design/style issue.
]]
But that wasn't very clear to me actually. I note that filp_close()
already has special case handling for dnotify (R.I.P.) and fcntl()
(aka POSIX) file locks, so there was already precedent for a custom
hook, AFAICS, and epoll is at least as worthy of special treatment as
either of those cases.
Cheers,
Michael
* Re: epoll design problems with common fork/exec patterns
From: Davide Libenzi @ 2008-02-29 19:19 UTC
To: Michael Kerrisk
Cc: Chris "ク" Heath, David Schwartz,
dada1-fPLkHRcR87vqlBn2x/YWAg, Linux-Kernel@Vger. Kernel. Org,
linux-man-u79uwXL29TY76Z2rM5mHXA
On Fri, 29 Feb 2008, Michael Kerrisk wrote:
> As I think is clear, I've only given it very limited thought ;-).
>
> The point is that the existing implementation actually supports
> "different *processes* sharing a single epoll fd and doing
> epoll_wait() over it", but the semantics are unintuitive. It may be
> that the existing implementation was the best way of doing things.
> But when I see the strange corner cases in the semantics, I can't help
> but wonder (way too late), whether there might have been some other
> way of implementing things that led to more intuitive semantics.
Oh boy. The fact that you can have an epoll fd cross the fork boundary
does not mean that any indiscriminate use of it leads to sane results:
efd = epoll_create();
fork();
pipe(fds);
epoll_ctl(efd, ADD, fds[0]);
epoll_wait(); ????
...
pipe(fds);
epoll_ctl(efd, ADD, fds[0]);
epoll_wait(); ????
It is *NOT* a matter of semantics.
> > If the next question is "But then why did we make the epoll fd inheritable?",
> > the answer is, because it makes sense in many cases for a parent to hand
> > over an fd set to a child.
>
> Fair enough.
>
> So here's an idea about how things might alternatively have been done:
>
> a) The key for epoll entries could have been [file *, fd, PID]
>
> b) epoll_wait() only returns events for fds whose PID matches that
> of the caller.
>
> c) a close of a file descriptor removes the corresponding [file *,
> fd, PID] from the epoll set.
>
> d) when a fork() is done, then the epoll set has a new set of keys
> added. These are duplicates of the [file *, fd, PID] entries for the
> parent, but with the PID of the child substituted into the new keys.
> Say the parent had PID 1000, and the child has PID 2000. If the epoll
> set initially contained:
>
> [X, 3, 1000]
> [Y, 4, 1000]
>
> then after fork() we'd have:
>
> [X, 3, 1000]
> [Y, 4, 1000]
> [X, 3, 2000]
> [Y, 4, 2000]
>
> There is of course room for debate about the efficiency of this
> approach, I suppose.
There sure is :)
> You said elsewhere:
>
> [[
> That'd mean placing an eventpoll custom hook into sys_close(). Looks very
> bad to me, and probably will look even worse to other kernel folks.
> It is not much of a performance issue (a check to see if a file* is an eventpoll
> file is as easy as comparing the f_op pointer), but a design/style issue.
> ]]
>
> But that wasn't very clear to me actually. I note that filp_close()
> already has special case handling for dnotify (R.I.P.) and fcntl()
> (aka POSIX) file locks, so there was already precedent for a custom
> hook, AFAICS, and epoll is at least as worthy of special treatment as
> either of those cases.
I guess that over the time, Al became software WRT junk going there :)
- Davide
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: epoll design problems with common fork/exec patterns
2008-02-29 19:19 ` Davide Libenzi
@ 2008-02-29 19:54 ` Michael Kerrisk
2008-03-02 15:11 ` Sam Varshavchik
0 siblings, 1 reply; 13+ messages in thread
From: Michael Kerrisk @ 2008-02-29 19:54 UTC (permalink / raw)
To: Davide Libenzi
Cc: Chris "ク" Heath, David Schwartz, dada1,
Linux-Kernel@Vger.Kernel.Org, linux-man
On Fri, Feb 29, 2008 at 8:19 PM, Davide Libenzi <davidel@xmailserver.org> wrote:
> On Fri, 29 Feb 2008, Michael Kerrisk wrote:
>
> > As I think is clear, I've only given it very limited thought ;-).
> >
> > The point is that the existing implementation actually supports
> > "different *processes* sharing a single epoll fd and doing
> > epoll_wait() over it", but the semantics are unintuitive. It may be
> > that the existing implementation was the best way of doing things.
> > But when I see the strange corner cases in the semantics, I can't help
> > but wonder (way too late), whether there might have been some other
> > way of implementing things that led to more intuitive semantics.
>
> Oh boy. The fact that you can have an epoll fd cross the fork boundary,
> does not mean that any indiscriminate use of it leads to sane results:
I didn't mean that it did. Certainly in the current implementation
there will be insane situations ;-).
> efd = epoll_create();
> fork();
> pipe(fds);
> epoll_ctl(efd, ADD, fds[0]);
> epoll_wait(); ????
> ...
> pipe(fds);
> epoll_ctl(efd, ADD, fds[0]);
> epoll_wait(); ????
>
>
> It is *NOT* a matter of semantics.
Of course -- but I don't think I suggested that I disagreed on this.
> > > If the next question is "But then why did we make the epoll fd inheritable?",
> > > the answer is, because it makes sense in many cases for a parent to hand
> > > over an fd set to a child.
> >
> > Fair enough.
> >
> > So here's an idea about how things might alternatively have been done:
> >
> > a) The key for epoll entries could have been [file *, fd, PID]
> >
> > b) an epoll_wait() only returns events for fds where the PID matches that
> > of the caller.
> >
> > c) a close of a file descriptor removes the corresponding [file *,
> > fd, PID] from the epoll set.
> >
> > d) when a fork() is done, then the epoll set has a new set of keys
> > added. These are duplicates of the [file *, fd, PID] entries for the
> > parent, but with the PID of the child substituted into the new keys.
> > Say the parent had PID 1000, and the child has PID 2000. If the epoll
> > set initially contained:
> >
> > [X, 3, 1000]
> > [Y, 4, 1000]
> >
> > then after fork() we'd have:
> >
> > [X, 3, 1000]
> > [Y, 4, 1000]
> > [X, 3, 2000]
> > [Y, 4, 2000]
> >
> > There is of course room for debate about the efficiency of this
> > approach, I suppose.
>
> There sure is :)
Okay -- but I suspect it could have been made fairly efficient.
> > You said elsewhere:
> >
> > [[
> > That'd mean placing an eventpoll custom hook into sys_close(). Looks very
> > bad to me, and probably will look even worse to other kernel folks.
> > It is not much of a performance issue (a check to see if a file* is an eventpoll
> > file is as easy as comparing the f_op pointer), but a design/style issue.
> > ]]
> >
> > But that wasn't very clear to me actually. I note that filp_close()
> > already has special case handling for dnotify (R.I.P.) and fcntl()
> > (aka POSIX) file locks, so there was already precedent for a custom
> > hook, AFAICS, and epoll is at least as worthy of special treatment as
> > either of those cases.
>
> I guess that over the time, Al became software WRT junk going there :)
Sorry -- I don't understand that last sentence?
* Re: epoll design problems with common fork/exec patterns
2008-02-29 19:54 ` Michael Kerrisk
@ 2008-03-02 15:11 ` Sam Varshavchik
[not found] ` <cone.1204470717.369031.30230.500-lO+bjgoT4TKm14v+eVDVcBDJ/jce7dRH@public.gmane.org>
0 siblings, 1 reply; 13+ messages in thread
From: Sam Varshavchik @ 2008-03-02 15:11 UTC (permalink / raw)
To: linux-man-u79uwXL29TY76Z2rM5mHXA
Cc: Michael Kerrisk, Davide Libenzi, Chris ク Heath,
David Schwartz, dada1-fPLkHRcR87vqlBn2x/YWAg,
Linux-Kernel@Vger.Kernel.Org
Hijacking this epoll thread, the following related question occurs to me:
#Q8
# Does an operation on a file descriptor affect the already collected but
#not yet reported events?
#
#A8
# You can do two operations on an existing file descriptor. Remove would
#be meaningless for this case. Modify will re-read available I/O.
Why is EPOLL_CTL_DEL considered meaningless? A process is wrapping up its
business and is preparing to remove the fd from the epoll set, and then
close the file descriptor itself. In the meantime, the fd became readable,
and a POLLIN event gets collected. So, what happens to the collected event,
when the EPOLL_CTL_DEL operation is made?
* Re: epoll design problems with common fork/exec patterns
[not found] ` <cone.1204470717.369031.30230.500-lO+bjgoT4TKm14v+eVDVcBDJ/jce7dRH@public.gmane.org>
@ 2008-03-02 21:44 ` Davide Libenzi
0 siblings, 0 replies; 13+ messages in thread
From: Davide Libenzi @ 2008-03-02 21:44 UTC (permalink / raw)
To: Sam Varshavchik
Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk,
Chris "ク" Heath, David Schwartz,
dada1-fPLkHRcR87vqlBn2x/YWAg, Linux-Kernel@Vger.Kernel.Org
On Sun, 2 Mar 2008, Sam Varshavchik wrote:
> Hijacking this epoll thread, the following related question occurs to me:
>
> #Q8
> # Does an operation on a file descriptor affect the already collected but
> #not yet reported events?
> #
> #A8
> # You can do two operations on an existing file descriptor. Remove would
> #be meaningless for this case. Modify will re-read available I/O.
>
> Why is EPOLL_CTL_DEL considered meaningless? A process is wrapping up its
> business and is preparing to remove the fd from the epoll set, and then close
> the file descriptor itself. In the meantime, the fd became readable, and a
> POLLIN event gets collected. So, what happens to the collected event, when the
> EPOLL_CTL_DEL operation is made?
Any epoll_wait() done after the POLLIN and before the EPOLL_CTL_DEL will
report it. After the EPOLL_CTL_DEL, of course, no events will be reported.
- Davide
end of thread, other threads:[~2008-03-02 21:44 UTC | newest]
Thread overview: 13+ messages
[not found] <MDEHLPKNGKAHNMBLJOLKAEHCHNAC.davids@webmaster.com>
[not found] ` <Pine.LNX.4.64.0710291147180.3387@alien.or.mcafeemobile.com>
[not found] ` <Pine.LNX.4.64.0710291147180.3387-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
2008-02-26 15:13 ` epoll design problems with common fork/exec patterns Michael Kerrisk
[not found] ` <47C42CA7.4030607-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2008-02-26 18:51 ` Davide Libenzi
[not found] ` <Pine.LNX.4.64.0802261049040.27243-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
2008-02-27 1:30 ` Chris "ク" Heath
[not found] ` <1204075804.5238.7.camel-DBi1IKlRe8YXiSwHZUBl+UgmxNRb6L7S@public.gmane.org>
2008-02-27 19:35 ` Davide Libenzi
[not found] ` <Pine.LNX.4.64.0802271131180.377-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
2008-02-28 13:12 ` Michael Kerrisk
[not found] ` <cfd18e0f0802280512q43a457d0sc9a8dc83c51e8e1c-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-02-28 13:23 ` Michael Kerrisk
[not found] ` <cfd18e0f0802280523p1bdacfc5s274387f8280238c8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-02-28 19:34 ` Davide Libenzi
2008-02-28 19:23 ` Davide Libenzi
[not found] ` <Pine.LNX.4.64.0802281106050.7660-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
2008-02-29 15:46 ` Michael Kerrisk
[not found] ` <cfd18e0f0802290746p3cb7efc9j72394cd77ff37829-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-02-29 19:19 ` Davide Libenzi
2008-02-29 19:54 ` Michael Kerrisk
2008-03-02 15:11 ` Sam Varshavchik
[not found] ` <cone.1204470717.369031.30230.500-lO+bjgoT4TKm14v+eVDVcBDJ/jce7dRH@public.gmane.org>
2008-03-02 21:44 ` Davide Libenzi