* Multithread select() bug
@ 2004-05-10 21:26 Andre Ben Hamou
2004-05-10 21:47 ` Davide Libenzi
2004-05-10 21:57 ` Eric Dumazet
0 siblings, 2 replies; 10+ messages in thread
From: Andre Ben Hamou @ 2004-05-10 21:26 UTC (permalink / raw)
To: linux-kernel
Here's the scenario...
- parent thread P creates a connected socket pair S[0, 1]
- P spawns a child thread C and passes it S
- C selects on S[0]
- P closes S[0]
As I understand the semantics of the select call, C should now return
immediately in response to the closure (and it does on Mac OS X).
However, the following test code behaves otherwise for the two test
cases I've tried (2.4.21 and 2.6.5). Compilation command used: 'gcc
foobar.c -lpthread'.
Cheers,
Andre Ben Hamou
Imperial College London
--- BEGIN TEST CODE (foobar.c)---
#include <assert.h> // assert
#include <pthread.h> // pthread_create
#include <sys/select.h> // select
#include <sys/types.h> // socketpair
#include <sys/socket.h> // socketpair
#include <unistd.h> // sleep
#include <stdio.h> // printf
void *threadFuntion (void *sockets) {
int socket = ((int *)sockets)[0];
struct timeval timeout = {tv_sec: 5, tv_usec: 0};
// Allocate a file descriptor set with the passed socket
fd_set fds;
FD_ZERO (&fds);
FD_SET (socket, &fds);
// Select to read / register exceptions on the FD set
select (socket + 1, &fds, NULL, &fds, &timeout);
return NULL;
}
int main (void) {
int sockets[2];
pthread_t thread;
// Create a connected pair of sockets
assert (socketpair (PF_UNIX, SOCK_STREAM, 0, sockets) != -1);
printf ("sockets: {%i, %i}\n", sockets[0], sockets[1]);
// Create a POSIX thread
// - use the default configuration
// - invoke 'threadFunction' as the root function of the thread
// - pass the socket array to 'threadFunction'
assert (pthread_create (&thread,
NULL,
threadFuntion,
sockets) == 0);
// Wait for a second and then close the socket being selected on
sleep (1);
assert (close (sockets[0]) == 0);
printf ("Socket closed\n");
// Wait for the thread to exit - SHOULD BE ~ INSTANTANEOUS
assert (pthread_join (thread, NULL) == 0);
printf ("Thread joined\n");
assert (close (sockets[1]) == 0);
return 0;
}
--- END TEST CODE ---
--
...and, on the seventh day, God switched off his Mac.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multithread select() bug
2004-05-10 21:26 Multithread select() bug Andre Ben Hamou
@ 2004-05-10 21:47 ` Davide Libenzi
2004-05-10 21:56 ` Andre Ben Hamou
2004-05-10 21:57 ` Eric Dumazet
1 sibling, 1 reply; 10+ messages in thread
From: Davide Libenzi @ 2004-05-10 21:47 UTC (permalink / raw)
To: Andre Ben Hamou; +Cc: linux-kernel
On Mon, 10 May 2004, Andre Ben Hamou wrote:
> void *threadFuntion (void *sockets) {
> int socket = ((int *)sockets)[0];
> struct timeval timeout = {tv_sec: 5, tv_usec: 0};
>
> // Allocate a file descriptor set with the passed socket
> fd_set fds;
> FD_ZERO (&fds);
> FD_SET (socket, &fds);
>
> // Select to read / register exceptions on the FD set
> select (socket + 1, &fds, NULL, &fds, &timeout);
Try:
select (socket + 1, &fds, &fds, &fds, &timeout);
^^^^^
- Davide
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multithread select() bug
2004-05-10 21:47 ` Davide Libenzi
@ 2004-05-10 21:56 ` Andre Ben Hamou
2004-05-10 22:09 ` Davide Libenzi
0 siblings, 1 reply; 10+ messages in thread
From: Andre Ben Hamou @ 2004-05-10 21:56 UTC (permalink / raw)
To: Davide Libenzi; +Cc: linux-kernel
Davide Libenzi wrote:
> Try:
>
> select (socket + 1, &fds, &fds, &fds, &timeout);
> ^^^^^
That does work, but only as a workaround (and not a universally
applicable one). Select *should* return upon the closure of either end
of a socket connection that is in the read-FD-set only (unless I've
completely misunderstood the various references).
Cheers,
Andre Ben Hamou
Imperial College London
--
...and, on the seventh day, God switched off his Mac.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multithread select() bug
2004-05-10 21:26 Multithread select() bug Andre Ben Hamou
2004-05-10 21:47 ` Davide Libenzi
@ 2004-05-10 21:57 ` Eric Dumazet
2004-05-10 22:11 ` Andre Ben Hamou
1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2004-05-10 21:57 UTC (permalink / raw)
To: Andre Ben Hamou; +Cc: linux-kernel
Andre Ben Hamou wrote:
> Here's the scenario...
>
> - parent thread P creates a connected socket pair S[0, 1]
> - P spawns a child thread C and passes it S
> - C selects on S[0]
> - P closes S[0]
>
Your program is racy and have undefined behavior.
A thread should not close a handle 'used by another thread blocked in a
sytemcall'
The race is : if a thread does a close(fd), then the fd value may be
reused by another thread during an open()/socket()/dup()... syscall, and
the first thread could issue the select() syscall (or
read()/write()/...) on the bad file.
Some Unixes defines different semantics (Solaris comes to mind), but
these semantics are not part of standards.
Eric
> As I understand the semantics of the select call, C should now return
> immediately in response to the closure (and it does on Mac OS X).
> However, the following test code behaves otherwise for the two test
> cases I've tried (2.4.21 and 2.6.5). Compilation command used: 'gcc
> foobar.c -lpthread'.
>
> Cheers,
>
> Andre Ben Hamou
> Imperial College London
>
>
> --- BEGIN TEST CODE (foobar.c)---
>
> #include <assert.h> // assert
> #include <pthread.h> // pthread_create
> #include <sys/select.h> // select
> #include <sys/types.h> // socketpair
> #include <sys/socket.h> // socketpair
> #include <unistd.h> // sleep
> #include <stdio.h> // printf
>
> void *threadFuntion (void *sockets) {
> int socket = ((int *)sockets)[0];
> struct timeval timeout = {tv_sec: 5, tv_usec: 0};
>
> // Allocate a file descriptor set with the passed socket
> fd_set fds;
> FD_ZERO (&fds);
> FD_SET (socket, &fds);
>
> // Select to read / register exceptions on the FD set
> select (socket + 1, &fds, NULL, &fds, &timeout);
>
> return NULL;
> }
>
> int main (void) {
> int sockets[2];
> pthread_t thread;
>
> // Create a connected pair of sockets
> assert (socketpair (PF_UNIX, SOCK_STREAM, 0, sockets) != -1);
> printf ("sockets: {%i, %i}\n", sockets[0], sockets[1]);
>
> // Create a POSIX thread
> // - use the default configuration
> // - invoke 'threadFunction' as the root function of the thread
> // - pass the socket array to 'threadFunction'
> assert (pthread_create (&thread,
> NULL,
> threadFuntion,
> sockets) == 0);
>
> // Wait for a second and then close the socket being selected on
> sleep (1);
> assert (close (sockets[0]) == 0);
> printf ("Socket closed\n");
>
> // Wait for the thread to exit - SHOULD BE ~ INSTANTANEOUS
> assert (pthread_join (thread, NULL) == 0);
> printf ("Thread joined\n");
>
> assert (close (sockets[1]) == 0);
> return 0;
> }
>
> --- END TEST CODE ---
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multithread select() bug
2004-05-10 21:56 ` Andre Ben Hamou
@ 2004-05-10 22:09 ` Davide Libenzi
0 siblings, 0 replies; 10+ messages in thread
From: Davide Libenzi @ 2004-05-10 22:09 UTC (permalink / raw)
To: Andre Ben Hamou; +Cc: Linux Kernel Mailing List
On Mon, 10 May 2004, Andre Ben Hamou wrote:
> Davide Libenzi wrote:
> > Try:
> >
> > select (socket + 1, &fds, &fds, &fds, &timeout);
> > ^^^^^
>
> That does work, but only as a workaround (and not a universally
> applicable one). Select *should* return upon the closure of either end
> of a socket connection that is in the read-FD-set only (unless I've
> completely misunderstood the various references).
The standard says that select should return when the operation against the
fd would not block *or* would return an error different from
would-have-blocked. The following operation on a closed fd would be an
EBADF, that fits the standard description. OTOH, MT is messy. Consider a
thread that closes the fd and immediately after open another one having
the same fd#. Bottom line is, don't do it.
- Davide
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multithread select() bug
2004-05-10 21:57 ` Eric Dumazet
@ 2004-05-10 22:11 ` Andre Ben Hamou
2004-05-10 22:31 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Andre Ben Hamou @ 2004-05-10 22:11 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-kernel
Eric Dumazet wrote:
> Your program is racy and have undefined behavior.
>
> A thread should not close a handle 'used by another thread blocked in a
> sytemcall'
>
> The race is : if a thread does a close(fd), then the fd value may be
> reused by another thread during an open()/socket()/dup()... syscall, and
> the first thread could issue the select() syscall (or
> read()/write()/...) on the bad file.
Apologies, but I don't follow this.
It was my understanding that the (potentially) many threads of a single
process all share a canonical file descriptor table. Hence as long as
the various calls you mention are issued in a guaranteed order,
maintaining state as you go (which is what the 1 second sleep in the
test code was a very quick and dirty way to almost do), I don't see how
a race condition arises.
If I were to replace the sleep (1) with, say a global semaphore or
something similar, would your explanation still hold?
Cheers,
Andre Ben Hamou
Imperial College London
--
...and, on the seventh day, God switched off his Mac.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multithread select() bug
2004-05-10 22:11 ` Andre Ben Hamou
@ 2004-05-10 22:31 ` Eric Dumazet
2004-05-10 23:01 ` Andre Ben Hamou
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2004-05-10 22:31 UTC (permalink / raw)
To: Andre Ben Hamou; +Cc: linux-kernel
Andre Ben Hamou wrote:
> Eric Dumazet wrote:
>
>> Your program is racy and have undefined behavior.
>>
>> A thread should not close a handle 'used by another thread blocked in
>> a sytemcall'
>>
>> The race is : if a thread does a close(fd), then the fd value may be
>> reused by another thread during an open()/socket()/dup()... syscall,
>> and the first thread could issue the select() syscall (or
>> read()/write()/...) on the bad file.
>
>
> Apologies, but I don't follow this.
>
> It was my understanding that the (potentially) many threads of a
> single process all share a canonical file descriptor table. Hence as
> long as the various calls you mention are issued in a guaranteed
> order, maintaining state as you go (which is what the 1 second sleep
> in the test code was a very quick and dirty way to almost do), I don't
> see how a race condition arises.
>
> If I were to replace the sleep (1) with, say a global semaphore or
> something similar, would your explanation still hold?
>
So please how do you guarantee that thread 1 runs *before* thread 2)
Thread 1)
select( fd)
Thread 2)
close(fd)
Thats not possible.
Only pthread synchronization are mutexes (or rwlocks, or semaphore).
And you cannot release mutex/rwlock/semaphore *after* entering Thread1)
select()
So yes, there is a race condition.
In you example, it can happens that thread 1 must sleep 10 seconds
before calling select(), because of system scheduling. Your sleep(1)
cannot garant that the close() is done after the select() call blocked
into kernel. You can try whatever semaphore you want, you wont be able
to have a 100% reliable program (even on Solaris)
Eric
> Cheers,
>
> Andre Ben Hamou
> Imperial College London
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multithread select() bug
2004-05-10 22:31 ` Eric Dumazet
@ 2004-05-10 23:01 ` Andre Ben Hamou
2004-05-11 6:07 ` Armin Schindler
2004-05-11 6:24 ` Eric Dumazet
0 siblings, 2 replies; 10+ messages in thread
From: Andre Ben Hamou @ 2004-05-10 23:01 UTC (permalink / raw)
To: Eric Dumazet; +Cc: linux-kernel
Eric Dumazet wrote:
> So please how do you guarantee that thread 1 runs *before* thread 2)
>
> Thread 1)
> select( fd)
>
> Thread 2)
> close(fd)
>
> Thats not possible.
>
I see where you're coming from, in that there is a potential race
condition as to the socket being connected as I reach the select call.
This is an important concern but it is, I think, orthogonal to the
original problem as there are two possible socket states at the point at
which select gets called (as far as I can see)...
1. The socket is in its connected state
2. The socket has already been closed by the parent thread
As I understand it, if 1 is true (which corresponds to my original
post), then select should return the moment the socket gets closed and,
if 2 is true (which I believe corresponds to your concern), then select
should return immediately anyway as the socket would not block if read from.
Sorry to be a pest, but I'm trying to get this clear in my head. Is it
possible I've over-estimated the thread-safety of the select and close
calls?
Cheers,
Andre Ben Hamou
Imperial College London
--
...and, on the seventh day, God switched off his Mac.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multithread select() bug
2004-05-10 23:01 ` Andre Ben Hamou
@ 2004-05-11 6:07 ` Armin Schindler
2004-05-11 6:24 ` Eric Dumazet
1 sibling, 0 replies; 10+ messages in thread
From: Armin Schindler @ 2004-05-11 6:07 UTC (permalink / raw)
To: Andre Ben Hamou; +Cc: Eric Dumazet, Linux Kernel Mailinglist
On Tue, 11 May 2004, Andre Ben Hamou wrote:
> Eric Dumazet wrote:
> > So please how do you guarantee that thread 1 runs *before* thread 2)
> >
> > Thread 1)
> > select( fd)
> >
> > Thread 2)
> > close(fd)
> >
> > Thats not possible.
> >
Some time ago I sent a patch for the select() systemcall to return
if the last fd was closed. If select() specifies more than one fd, then
staying in select is valid, but if there is no more fd left to wait on,
select() should return I think. I don't know what standards say here
or other Unix do in that case, so I did't have a good reason for the patch
and it was ignored.
The patch was for 2.4, but 2.6 is the same and poll() as well.
Armin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Multithread select() bug
2004-05-10 23:01 ` Andre Ben Hamou
2004-05-11 6:07 ` Armin Schindler
@ 2004-05-11 6:24 ` Eric Dumazet
1 sibling, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2004-05-11 6:24 UTC (permalink / raw)
To: Andre Ben Hamou; +Cc: linux-kernel
Andre Ben Hamou wrote:
>
> I see where you're coming from, in that there is a potential race
> condition as to the socket being connected as I reach the select call.
>
> This is an important concern but it is, I think, orthogonal to the
> original problem as there are two possible socket states at the point
> at which select gets called (as far as I can see)...
>
> 1. The socket is in its connected state
> 2. The socket has already been closed by the parent thread
>
> As I understand it, if 1 is true (which corresponds to my original
> post), then select should return the moment the socket gets closed
> and, if 2 is true (which I believe corresponds to your concern), then
> select should return immediately anyway as the socket would not block
> if read from.
>
> Sorry to be a pest, but I'm trying to get this clear in my head. Is it
> possible I've over-estimated the thread-safety of the select and close
> calls?
>
As the safety of what you are trying to do cannot be guaranted, you
should not even try to do that.
Then, apart from this safety showstopper, you should consider that what
you assumed is simply wrong :
- Some Operating Systems do something special to signal blocked threads
(blocked on a read()/write()/select()/poll()) if the underlying file
handle is closed.
- linux does NOT. And this is done for several reasons. These reasons
are known and were discussed on this list at several occasions.
So the rule is : close() syscall should be serialized, according to a
synchronization of your choice, so that no other thread could be using
the file descriptor at the same time.
If you want to 'send some signal to the blocked thread' in a portable
way, you could setup a pipe to be able to send a special message on it.
The 'worker thread' could add the read side of the pipe (pipefd[0]) to
its select fd set, and will be notified of the Thread 2) willing to
dismantle your socket, using a standard write(pipefd[1]). This will be
portable : When the 'worker thread' returns from select() call, it can
then close the socket fd himself.
By the way, prefer poll() syscall to select() one. Poll is much more
scalable/efficient if you have only one or two fd to monitor.
See you
Eric
> Cheers,
>
> Andre Ben Hamou
> Imperial College London
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2004-05-11 6:33 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-05-10 21:26 Multithread select() bug Andre Ben Hamou
2004-05-10 21:47 ` Davide Libenzi
2004-05-10 21:56 ` Andre Ben Hamou
2004-05-10 22:09 ` Davide Libenzi
2004-05-10 21:57 ` Eric Dumazet
2004-05-10 22:11 ` Andre Ben Hamou
2004-05-10 22:31 ` Eric Dumazet
2004-05-10 23:01 ` Andre Ben Hamou
2004-05-11 6:07 ` Armin Schindler
2004-05-11 6:24 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox