* select implementation not POSIX compliant?
@ 2004-08-11 17:00 Nick Palmer
2004-08-11 19:40 ` Alex Riesen
0 siblings, 1 reply; 10+ messages in thread
From: Nick Palmer @ 2004-08-11 17:00 UTC (permalink / raw)
To: linux-kernel
Hey all,
I am working on porting some software from Solaris to Linux 2.6.7. I have
run into a problem with the interaction of select and/or recvmsg and close
in our multi-threaded application. The application expects that a close
call on a socket that another thread is blocking in select and/or recvmsg
on will cause select and/or recvmsg to return with an error. Linux does
not seem to do this. (I also verified that the same issue exists in Linux
2.4.25, just to be sure it wasn't introduced in 2.6 in case you were
wondering.)
I found this thread:
http://www.ussg.iu.edu/hypermail/linux/kernel/0006.3/0414.html
which indicates that we must call shutdown first in order to get the
desired behavior, which works as described. However this doesn't seem to
be a POSIX compliant implementation by my read of the POSIX specification
for select. (The specification for recvmsg doesn't specifically talk about
this condition, so I can accept that the implementation on Linux is
compliant for recvmsg, even if the behavior is a bit surprising to me, but
not for select.)
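A minimal sketch of the shutdown-before-close workaround described above. It assumes a local stream socket pair rather than the inet sockets of the real application, and the names blocked_reader and shutdown_then_close are illustrative only. One thread blocks in recv(); the other calls shutdown() to wake it and closes the descriptor only after the reader has returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Reader thread: blocks in recv() until data arrives or the socket is shut down. */
static void *blocked_reader(void *arg)
{
    int fd = *(int *)arg;
    char buf[16];
    ssize_t n = recv(fd, buf, sizeof buf, 0);

    return (void *)(intptr_t)n;     /* hand recv()'s return value to the joiner */
}

/* Returns recv()'s result as seen by the reader thread, or -2 on setup failure. */
long shutdown_then_close(void)
{
    int fds[2];
    pthread_t tid;
    void *res;
    struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -2;
    if (pthread_create(&tid, NULL, blocked_reader, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }
    nanosleep(&delay, NULL);        /* best effort: let the reader block first */
    shutdown(fds[0], SHUT_RDWR);    /* wake the reader before closing */
    pthread_join(tid, &res);
    close(fds[0]);                  /* no thread is using the descriptor any more */
    close(fds[1]);
    return (long)(intptr_t)res;
}
```

On Linux the reader is expected to see recv() return 0 here, the same return value reported later in this thread for the workaround.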
From the POSIX specification for select from
http://www.unix.org/single_unix_specification/:
A descriptor shall be considered ready for reading when a call to an input
function with O_NONBLOCK clear would not block, whether or not the
function would transfer data successfully.
<snip>
If a descriptor refers to a socket, the implied input function is the
recvmsg() function with parameters requesting normal and ancillary data,
such that the presence of either type shall cause the socket to be marked
as readable.
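The quoted rule can be illustrated with a small single-threaded sketch. It assumes a local stream socket pair, and the function name readable_without_data is made up for the example. Once the peer is closed, recv() can no longer block, so select() should mark the descriptor readable even though no data will be transferred.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if select() marks the socket readable although recv() then
 * transfers no data (end-of-file), 0 if not, -1 on setup failure. */
int readable_without_data(void)
{
    int fds[2];
    fd_set rset;
    struct timeval tv = { 0, 0 };   /* check once, do not block */
    char buf[16];
    int ready, ok;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    close(fds[1]);                  /* peer goes away: recv() on fds[0] cannot block */

    FD_ZERO(&rset);
    FD_SET(fds[0], &rset);
    ready = select(fds[0] + 1, &rset, NULL, NULL, &tv);
    n = recv(fds[0], buf, sizeof buf, 0);   /* end-of-file: returns 0 */

    ok = (ready == 1 && FD_ISSET(fds[0], &rset) && n == 0);
    close(fds[0]);
    return ok;
}
```

This only covers the uncontroversial case where the peer closes; it does not exercise a close of the same descriptor from another thread.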
I have a test case (email me off list if you want a copy) that shows that
a call to the input function does not block, but instead returns an error
after a close call, yet the select called before the close continues to
block. Furthermore, a call to close and then select in the same thread
blocks while the other thread is still in select, which has a very large
surprise factor, since the code would work were it not for the other
select.
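The test case itself is not included in this message. A rough sketch of the scenario, under the assumption of a local datagram socket pair and with illustrative names (select_waiter, close_during_select), might look like the following. One thread waits in select() with a 500 ms timeout while another closes the descriptor; the function simply reports what select() returned, since the outcome may differ between systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Waits in select() on one descriptor for up to 500 ms. */
static void *select_waiter(void *arg)
{
    int fd = *(int *)arg;
    fd_set rset;
    struct timeval tv = { 0, 500 * 1000 };

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    return (void *)(intptr_t)select(fd + 1, &rset, NULL, NULL, &tv);
}

/* Returns what select() reported in the waiting thread:
 * 1 = woken as readable, 0 = timed out, -1 = error; -2 = setup failure. */
int close_during_select(void)
{
    int fds[2];
    pthread_t tid;
    void *res;
    struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0)
        return -2;
    if (pthread_create(&tid, NULL, select_waiter, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }
    nanosleep(&delay, NULL);    /* best effort: let the waiter enter select() */
    close(fds[0]);              /* close the descriptor the other thread waits on */
    pthread_join(tid, &res);
    close(fds[1]);
    return (int)(intptr_t)res;
}
```

A return value of 0 means the close did not wake the waiter and select() ran into its timeout, which is the behavior described above for Linux.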
This is certainly a large enough difference from Solaris to cause our
POSIX application not to work on Linux, and I imagine I'm not the only one
that has experienced problems with this implementation. Is there a good
argument for why it has been implemented this way? It is certainly less
than intuitive.
Thanks for addressing this issue,
-Nick
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: select implementation not POSIX compliant?
2004-08-11 17:00 select implementation not POSIX compliant? Nick Palmer
@ 2004-08-11 19:40 ` Alex Riesen
2004-08-11 20:33 ` khandelw
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Alex Riesen @ 2004-08-11 19:40 UTC (permalink / raw)
To: linux-kernel; +Cc: Nick Palmer, netdev
On linux-kernel, Nick Palmer wrote:
> I am working on porting some software from Solaris to Linux 2.6.7. I
> have run into a problem with the interaction of select and/or
> recvmsg and close in our multi-threaded application. The application
> expects that a close call on a socket that another thread is
> blocking in select and/or recvmsg on will cause select and/or
> recvmsg to return with an error. Linux does not seem to do this. (I
> also verified that the same issue exists in Linux 2.4.25, just to be
> sure it wasn't introduced in 2.6 in case you were wondering.)
It always works for stream sockets and does not work at all (even with
shutdown, even using poll(2) or read(2) instead of select) for dgram
sockets.
What domain (inet, local) are your sockets in?
What type (stream, dgram)?
There will probably be a problem with changing the behaviour anyway:
there is surely a lot of code that would start complaining about select
and poll finishing "unexpectedly".
I used this to check:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <fcntl.h>

int main(int argc, char* argv[])
{
    int status;
    int fds[2];
    fd_set set;

    /* Change "#if 0" to "#if 1" to test a stream pair instead of dgram. */
#if 0
    puts("stream");
    if ( socketpair(PF_LOCAL, SOCK_STREAM, 0, fds) < 0 )
#else
    puts("dgram");
    if ( socketpair(PF_LOCAL, SOCK_DGRAM, 0, fds) < 0 )
#endif
    {
        perror("socketpair");
        exit(1);
    }
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);
    fcntl(fds[1], F_SETFL, fcntl(fds[1], F_GETFL) | O_NONBLOCK);

    switch ( fork() )
    {
    case 0:
        /* Child: after one second, drop its copies of both ends. */
        sleep(1);
        close(fds[0]);
        shutdown(fds[1], SHUT_RD);
        close(fds[1]);
        exit(0);
        break;
    case -1:
        perror("fork");
        exit(1);
    }

    /* Parent: keep only fds[0] and wait, without timeout, for it to
       become readable. */
    close(fds[1]);
    FD_ZERO(&set);
    FD_SET(fds[0], &set);
    select(fds[0] + 1, &set, NULL, NULL, NULL);
    wait(&status);
    return 0;
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: select implementation not POSIX compliant?
2004-08-11 19:40 ` Alex Riesen
@ 2004-08-11 20:33 ` khandelw
2004-08-11 21:23 ` Alex Riesen
2004-08-13 20:13 ` Nick Palmer
2004-08-11 21:57 ` Steven Dake
2004-08-13 20:12 ` Nick Palmer
2 siblings, 2 replies; 10+ messages in thread
From: khandelw @ 2004-08-11 20:33 UTC (permalink / raw)
To: Alex Riesen; +Cc: linux-kernel, Nick Palmer, netdev
select should work for any type of socket. It's based on the type of file
descriptor, not on whether it is stream/dgram.
man recvmsg -
recvmsg() may be used to receive data on a socket whether it
is in a connected state or not. s is a socket created with
socket(3SOCKET).
so why should recvmsg return error???? upon closing the socket in other thread?
wouldn't the socket linger around for some time...
If no messages are available at the socket, the receive call
waits for a message to arrive, unless the socket is non-
blocking (see fcntl(2)) in which case -1 is returned with
the external variable errno set to EWOULDBLOCK.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: select implementation not POSIX compliant?
2004-08-11 20:33 ` khandelw
@ 2004-08-11 21:23 ` Alex Riesen
2004-08-13 20:13 ` Nick Palmer
1 sibling, 0 replies; 10+ messages in thread
From: Alex Riesen @ 2004-08-11 21:23 UTC (permalink / raw)
To: khandelw; +Cc: linux-kernel, Nick Palmer, netdev
I missed the point: threads! _Not_ duplicated handles.
Ignore me.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: select implementation not POSIX compliant?
2004-08-11 20:33 ` khandelw
2004-08-11 21:23 ` Alex Riesen
@ 2004-08-13 20:13 ` Nick Palmer
1 sibling, 0 replies; 10+ messages in thread
From: Nick Palmer @ 2004-08-13 20:13 UTC (permalink / raw)
To: khandelw; +Cc: Alex Riesen, linux-kernel, netdev
khandelw@cs.fsu.edu wrote:
> select should work for any type of socket. It's based on the type of file
> descriptor, not on whether it is stream/dgram.
Agreed, but as Alex Riesen has shown with his test case, the behavior
differs based on the type of socket. This doesn't seem quite right, but
was not my original point.
> so why should recvmsg return error???? upon closing the socket in
> other thread?
> wouldn't the socket linger around for some time...
Only if SO_LINGER is on, and then only for the linger time. I would
expect recvmsg to set errno to EINTR or EINVAL indicating that the recv
message was interrupted or is no longer valid since the socket has
closed. This is not the case. Instead it returns 0, and doesn't set errno.
-Nick
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: select implementation not POSIX compliant?
2004-08-11 19:40 ` Alex Riesen
2004-08-11 20:33 ` khandelw
@ 2004-08-11 21:57 ` Steven Dake
2004-08-13 20:12 ` Nick Palmer
2 siblings, 0 replies; 10+ messages in thread
From: Steven Dake @ 2004-08-11 21:57 UTC (permalink / raw)
To: Alex Riesen; +Cc: linux-kernel, Nick Palmer, netdev
You will find poll works as you desire but select does not. I recommend
porting to poll anyway; select sucks bad. You might even try out epoll
in 2.6.
Thanks
Good luck
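For reference, a minimal sketch of what the readiness check looks like when ported from select to poll. It assumes a local stream socket pair whose peer has been closed, and poll_after_peer_close is a hypothetical helper name. It does not reproduce the cross-thread close case discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the revents bits poll() reports for a socket whose peer has
 * been closed, or 0 on setup failure or if nothing was reported. */
int poll_after_peer_close(void)
{
    int fds[2];
    struct pollfd pfd;
    int ready;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return 0;
    close(fds[1]);                  /* peer goes away */

    pfd.fd = fds[0];
    pfd.events = POLLIN;            /* same interest as a select() read set */
    pfd.revents = 0;
    ready = poll(&pfd, 1, 0);       /* timeout 0: check once, do not block */

    close(fds[0]);
    return ready == 1 ? pfd.revents : 0;
}
```

Unlike select, poll takes an array of descriptors rather than a bitmap bounded by FD_SETSIZE, and it reports hang-up and error conditions as separate revents bits (POLLHUP, POLLERR).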
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: select implementation not POSIX compliant?
2004-08-11 19:40 ` Alex Riesen
2004-08-11 20:33 ` khandelw
2004-08-11 21:57 ` Steven Dake
@ 2004-08-13 20:12 ` Nick Palmer
2 siblings, 0 replies; 10+ messages in thread
From: Nick Palmer @ 2004-08-13 20:12 UTC (permalink / raw)
To: Alex Riesen; +Cc: linux-kernel, netdev
Alex Riesen wrote:
> On linux-kernel, Nick Palmer wrote:
>>The application
>>expects that a close call on a socket that another thread is
>>blocking in select and/or recvmsg on will cause select and/or
>>recvmsg to return with an error. Linux does not seem to do this.
>
>
> It always works for stream sockets and does not work at all (even with
> shutdown, even using poll(2) or read(2) instead of select) for dgram
> sockets.
>
> What domain (inet, local) are your sockets in?
inet.
> What type (stream, dgram)?
We use both, though the breakage I was trying to fix was with a dgram
socket.
You are correct that it does not work for dgram sockets at all! I had
not noticed the difference between the two in the test case I wrote,
since I hadn't tested streams. Thanks for pointing that out. Note that
shutdown will cause a dgram socket to return from a recv* call though;
this is the workaround I am using right now. On Solaris close will do
the job. However, when the recvfrom call ends it returns 0 but does not
set errno, which suggests that there may be more data that can be
retrieved with another call to recv. On Solaris both shutdown and close
cause errno to be set.
There is no way then to cause a select on a dgram socket to break out at
all short of kludging some dgram packet transmission to cause it to happen.
Yech!
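One commonly used alternative to sending a dummy datagram is to add a wake-up pipe to the select set. This technique is not proposed anywhere in this thread; the sketch below only illustrates it, assuming a local datagram socket and hypothetical names (struct waiter, wait_for_event, wake_select_with_pipe).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

struct waiter { int sock; int wake_rd; };

/* Blocks in select() on the datagram socket and the wake-up pipe.
 * Yields 1 if the pipe woke it, 0 if the socket did, -1 on error. */
static void *wait_for_event(void *arg)
{
    struct waiter *w = arg;
    fd_set rset;
    int maxfd = w->sock > w->wake_rd ? w->sock : w->wake_rd;

    FD_ZERO(&rset);
    FD_SET(w->sock, &rset);
    FD_SET(w->wake_rd, &rset);
    if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
        return (void *)(intptr_t)-1;
    return (void *)(intptr_t)(FD_ISSET(w->wake_rd, &rset) ? 1 : 0);
}

/* Returns the waiter's result, or -2 on setup failure. */
int wake_select_with_pipe(void)
{
    int pipefd[2];
    struct waiter w;
    pthread_t tid;
    void *res;
    int ok;

    w.sock = socket(AF_UNIX, SOCK_DGRAM, 0);    /* idle socket, never readable here */
    if (w.sock < 0)
        return -2;
    if (pipe(pipefd) < 0) {
        close(w.sock);
        return -2;
    }
    w.wake_rd = pipefd[0];
    ok = pthread_create(&tid, NULL, wait_for_event, &w) == 0;
    if (ok) {
        ok = write(pipefd[1], "x", 1) == 1;     /* one byte wakes the waiter */
        if (!ok)
            close(pipefd[0]);                   /* makes select() fail so join returns */
        pthread_join(tid, &res);
    }
    if (ok)
        close(pipefd[0]);
    close(pipefd[1]);
    close(w.sock);
    return ok ? (int)(intptr_t)res : -2;
}
```

The waiter can then treat a readable pipe as a request to stop using the socket, after which the other thread can close it safely.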
Thanks for looking into the issue more,
-Nick
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: select implementation not POSIX compliant?
@ 2004-08-11 20:49 Manfred Spraul
2004-08-13 20:18 ` Nick Palmer
0 siblings, 1 reply; 10+ messages in thread
From: Manfred Spraul @ 2004-08-11 20:49 UTC (permalink / raw)
To: Nick Palmer; +Cc: linux-kernel
Nick wrote:
>Furthermore, a call to close and then select in the same thread
>blocks while the other thread is still in select, which has a very large
>surprise factor, since the code would work were it not for the other
>select.
>
>
>
Could you post the test case for this behavior: I assume your test app
is buggy: a select call that is executed after close returned must
return EBADF, everything else would be a bug.
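The single-threaded case is easy to check. A minimal sketch, with select_on_closed_fd as an illustrative name, that closes a descriptor and then passes it to select():

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno that select() sets for an already closed descriptor,
 * 0 if select() unexpectedly succeeds, or -1 on setup failure. */
int select_on_closed_fd(void)
{
    int fds[2];
    fd_set rset;
    struct timeval tv = { 0, 0 };   /* do not block */
    int rc, err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    close(fds[0]);                  /* fds[0] is now an invalid descriptor */

    FD_ZERO(&rset);
    FD_SET(fds[0], &rset);
    rc = select(fds[0] + 1, &rset, NULL, NULL, &tv);
    err = (rc < 0) ? errno : 0;

    close(fds[1]);
    return err;
}
```

Here select() fails with -1 and sets errno to EBADF, because no other thread holds the descriptor in a pending call.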
Regarding your main point: The return result from select/poll is
undefined in Linux if you close a descriptor while another thread polls
or selects it.
This is consistent with the behavior of other Unices - for example HP UX
kills the process if you replace a descriptor that is being polled with
dup2.
--
Manfred
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: select implementation not POSIX compliant?
2004-08-11 20:49 Manfred Spraul
@ 2004-08-13 20:18 ` Nick Palmer
2004-08-13 22:05 ` Alan Cox
0 siblings, 1 reply; 10+ messages in thread
From: Nick Palmer @ 2004-08-13 20:18 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel
Manfred Spraul wrote:
> Could you post the test case for this behavior: I assume your test app
> is buggy: a select call that is executed after close returned must
> return EBADF, everything else would be a bug.
Actually Solaris and Linux are consistent in terms of the behavior of
select in this respect. I suspect that the first select is blocking the
socket from being used at all, so the second select can't tell that it
is closed.
> Regarding your main point: The return result from select/poll is
> undefined in Linux if you close a descriptor while another thread polls
> or selects it.
> This is consistent with the behavior of other Unices - for example HP UX
> kills the process if you replace a descriptor that is being polled with
> dup2.
Right, hence my feeling that select is overall fairly broken. The big
difference between Solaris and Linux though is that close will cause
recv* calls to return on Solaris, and close doesn't do that on Linux.
The workaround is to use shutdown on Linux before calling close. This
also works on Solaris, though it makes the recv set errno differently.
Thanks for looking at this issue,
-Nick
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: select implementation not POSIX compliant?
2004-08-13 20:18 ` Nick Palmer
@ 2004-08-13 22:05 ` Alan Cox
0 siblings, 0 replies; 10+ messages in thread
From: Alan Cox @ 2004-08-13 22:05 UTC (permalink / raw)
To: Nick Palmer; +Cc: Manfred Spraul, Linux Kernel Mailing List
On Gwe, 2004-08-13 at 21:18, Nick Palmer wrote:
> Actually Solaris and Linux are consistent in terms of the behavior of
> select in this respect. I suspect that the first select is blocking the
> socket from being used at all, so the second select can't tell that it
> is closed.
The objects are refcounted so the socket hasn't gone away until the
point the select returns.
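The reference held by select lives inside the kernel and cannot be observed directly. A user-visible analogue, offered only as an illustration and with made-up names (readable_now, dup_keeps_socket_open), is a duplicated descriptor: as long as any reference to the socket remains, closing one descriptor does not make the socket go away, so the peer sees no end-of-file.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error. */
static int readable_now(int fd)
{
    fd_set rset;
    struct timeval tv = { 0, 0 };   /* do not block */

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    return select(fd + 1, &rset, NULL, NULL, &tv);
}

/* Returns 1 if the peer only sees end-of-file after the last reference
 * to the other end is dropped, 0 otherwise, -1 on setup failure. */
int dup_keeps_socket_open(void)
{
    int fds[2], extra, before, after;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    extra = dup(fds[1]);            /* second reference to the same socket */
    if (extra < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                  /* one reference gone, socket still open */
    before = readable_now(fds[0]);  /* 0: no end-of-file yet */
    close(extra);                   /* last reference gone, socket released */
    after = readable_now(fds[0]);   /* 1: end-of-file is now visible */

    close(fds[0]);
    return before == 0 && after == 1;
}
```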
Alan
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2004-08-13 23:07 UTC | newest]
Thread overview: 10+ messages
2004-08-11 17:00 select implementation not POSIX compliant? Nick Palmer
2004-08-11 19:40 ` Alex Riesen
2004-08-11 20:33 ` khandelw
2004-08-11 21:23 ` Alex Riesen
2004-08-13 20:13 ` Nick Palmer
2004-08-11 21:57 ` Steven Dake
2004-08-13 20:12 ` Nick Palmer
-- strict thread matches above, loose matches on Subject: below --
2004-08-11 20:49 Manfred Spraul
2004-08-13 20:18 ` Nick Palmer
2004-08-13 22:05 ` Alan Cox