public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Multithread select() bug
@ 2004-05-10 21:26 Andre Ben Hamou
  2004-05-10 21:47 ` Davide Libenzi
  2004-05-10 21:57 ` Eric Dumazet
  0 siblings, 2 replies; 10+ messages in thread
From: Andre Ben Hamou @ 2004-05-10 21:26 UTC (permalink / raw)
  To: linux-kernel

Here's the scenario...

- parent thread P creates a connected socket pair S[0, 1]
- P spawns a child thread C and passes it S
- C selects on S[0]
- P closes S[0]

As I understand the semantics of the select call, C should now return 
immediately in response to the closure (and it does on Mac OS X). 
However, the following test code behaves otherwise for the two test 
cases I've tried (2.4.21 and 2.6.5). Compilation command used: 'gcc 
foobar.c -lpthread'.

Cheers,

Andre Ben Hamou
Imperial College London


--- BEGIN TEST CODE (foobar.c)---

#include <assert.h>         // assert
#include <pthread.h>        // pthread_create
#include <sys/select.h>     // select
#include <sys/types.h>      // socketpair
#include <sys/socket.h>     // socketpair
#include <unistd.h>         // sleep
#include <stdio.h>          // printf

void *threadFuntion (void *sockets) {
     int socket = ((int *)sockets)[0];
     struct timeval timeout = {tv_sec: 5, tv_usec: 0};

     // Allocate a file descriptor set with the passed socket
     fd_set fds;
     FD_ZERO (&fds);
     FD_SET (socket, &fds);

     // Select to read / register exceptions on the FD set
     select (socket + 1, &fds, NULL, &fds, &timeout);

     return NULL;
}

int main (void) {
     int sockets[2];
     pthread_t thread;

     // Create a connected pair of sockets
     assert (socketpair (PF_UNIX, SOCK_STREAM, 0, sockets) != -1);
     printf ("sockets: {%i, %i}\n", sockets[0], sockets[1]);

     // Create a POSIX thread
     // - use the default configuration
     // - invoke 'threadFunction' as the root function of the thread
     // - pass the socket array to 'threadFunction'
     assert (pthread_create (&thread,
			    NULL,
                             threadFuntion,
                             sockets) == 0);

     // Wait for a second and then close the socket being selected on
     sleep (1);
     assert (close (sockets[0]) == 0);
     printf ("Socket closed\n");

     // Wait for the thread to exit - SHOULD BE ~ INSTANTANEOUS
     assert (pthread_join (thread, NULL) == 0);
     printf ("Thread joined\n");

     assert (close (sockets[1]) == 0);
     return 0;
}

--- END TEST CODE ---

-- 

...and, on the seventh day, God switched off his Mac.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Multithread select() bug
  2004-05-10 21:26 Multithread select() bug Andre Ben Hamou
@ 2004-05-10 21:47 ` Davide Libenzi
  2004-05-10 21:56   ` Andre Ben Hamou
  2004-05-10 21:57 ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: Davide Libenzi @ 2004-05-10 21:47 UTC (permalink / raw)
  To: Andre Ben Hamou; +Cc: linux-kernel

On Mon, 10 May 2004, Andre Ben Hamou wrote:

> void *threadFuntion (void *sockets) {
>      int socket = ((int *)sockets)[0];
>      struct timeval timeout = {tv_sec: 5, tv_usec: 0};
> 
>      // Allocate a file descriptor set with the passed socket
>      fd_set fds;
>      FD_ZERO (&fds);
>      FD_SET (socket, &fds);
> 
>      // Select to read / register exceptions on the FD set
>      select (socket + 1, &fds, NULL, &fds, &timeout);

Try:

	select (socket + 1, &fds, &fds, &fds, &timeout);
                                 ^^^^^


- Davide


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Multithread select() bug
  2004-05-10 21:47 ` Davide Libenzi
@ 2004-05-10 21:56   ` Andre Ben Hamou
  2004-05-10 22:09     ` Davide Libenzi
  0 siblings, 1 reply; 10+ messages in thread
From: Andre Ben Hamou @ 2004-05-10 21:56 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: linux-kernel

Davide Libenzi wrote:
> Try:
> 
> 	select (socket + 1, &fds, &fds, &fds, &timeout);
>                                  ^^^^^

That does work, but only as a workaround (and not a universally 
applicable one). Select *should* return upon the closure of either end 
of a socket connection that is in the read-FD-set only (unless I've 
completely misunderstood the various references).

Cheers,

Andre Ben Hamou
Imperial College London

-- 

...and, on the seventh day, God switched off his Mac.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Multithread select() bug
  2004-05-10 21:26 Multithread select() bug Andre Ben Hamou
  2004-05-10 21:47 ` Davide Libenzi
@ 2004-05-10 21:57 ` Eric Dumazet
  2004-05-10 22:11   ` Andre Ben Hamou
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2004-05-10 21:57 UTC (permalink / raw)
  To: Andre Ben Hamou; +Cc: linux-kernel

Andre Ben Hamou wrote:

> Here's the scenario...
>
> - parent thread P creates a connected socket pair S[0, 1]
> - P spawns a child thread C and passes it S
> - C selects on S[0]
> - P closes S[0]
>
Your program is racy and have undefined behavior.

A thread should not close a handle 'used by another thread blocked in a 
sytemcall'

The race is : if a thread does a close(fd), then the fd value may be 
reused by another thread during an open()/socket()/dup()... syscall, and 
the first thread could issue the select() syscall (or 
read()/write()/...) on the bad file.

Some Unixes defines different semantics (Solaris comes to mind), but 
these semantics are not part of  standards.

Eric

> As I understand the semantics of the select call, C should now return 
> immediately in response to the closure (and it does on Mac OS X). 
> However, the following test code behaves otherwise for the two test 
> cases I've tried (2.4.21 and 2.6.5). Compilation command used: 'gcc 
> foobar.c -lpthread'.
>
> Cheers,
>
> Andre Ben Hamou
> Imperial College London
>
>
> --- BEGIN TEST CODE (foobar.c)---
>
> #include <assert.h>         // assert
> #include <pthread.h>        // pthread_create
> #include <sys/select.h>     // select
> #include <sys/types.h>      // socketpair
> #include <sys/socket.h>     // socketpair
> #include <unistd.h>         // sleep
> #include <stdio.h>          // printf
>
> void *threadFuntion (void *sockets) {
>     int socket = ((int *)sockets)[0];
>     struct timeval timeout = {tv_sec: 5, tv_usec: 0};
>
>     // Allocate a file descriptor set with the passed socket
>     fd_set fds;
>     FD_ZERO (&fds);
>     FD_SET (socket, &fds);
>
>     // Select to read / register exceptions on the FD set
>     select (socket + 1, &fds, NULL, &fds, &timeout);
>
>     return NULL;
> }
>
> int main (void) {
>     int sockets[2];
>     pthread_t thread;
>
>     // Create a connected pair of sockets
>     assert (socketpair (PF_UNIX, SOCK_STREAM, 0, sockets) != -1);
>     printf ("sockets: {%i, %i}\n", sockets[0], sockets[1]);
>
>     // Create a POSIX thread
>     // - use the default configuration
>     // - invoke 'threadFunction' as the root function of the thread
>     // - pass the socket array to 'threadFunction'
>     assert (pthread_create (&thread,
>                 NULL,
>                             threadFuntion,
>                             sockets) == 0);
>
>     // Wait for a second and then close the socket being selected on
>     sleep (1);
>     assert (close (sockets[0]) == 0);
>     printf ("Socket closed\n");
>
>     // Wait for the thread to exit - SHOULD BE ~ INSTANTANEOUS
>     assert (pthread_join (thread, NULL) == 0);
>     printf ("Thread joined\n");
>
>     assert (close (sockets[1]) == 0);
>     return 0;
> }
>
> --- END TEST CODE ---
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Multithread select() bug
  2004-05-10 21:56   ` Andre Ben Hamou
@ 2004-05-10 22:09     ` Davide Libenzi
  0 siblings, 0 replies; 10+ messages in thread
From: Davide Libenzi @ 2004-05-10 22:09 UTC (permalink / raw)
  To: Andre Ben Hamou; +Cc: Linux Kernel Mailing List

On Mon, 10 May 2004, Andre Ben Hamou wrote:

> Davide Libenzi wrote:
> > Try:
> > 
> > 	select (socket + 1, &fds, &fds, &fds, &timeout);
> >                                  ^^^^^
> 
> That does work, but only as a workaround (and not a universally 
> applicable one). Select *should* return upon the closure of either end 
> of a socket connection that is in the read-FD-set only (unless I've 
> completely misunderstood the various references).

The standard says that select should return when the operation against the 
fd would not block *or* would return an error different from 
would-have-blocked. The following operation on a closed fd would be an 
EBADF, that fits the standard description. OTOH, MT is messy. Consider a 
thread that closes the fd and immediately after open another one having 
the same fd#. Bottom line is, don't do it.



- Davide


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Multithread select() bug
  2004-05-10 21:57 ` Eric Dumazet
@ 2004-05-10 22:11   ` Andre Ben Hamou
  2004-05-10 22:31     ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Andre Ben Hamou @ 2004-05-10 22:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel

Eric Dumazet wrote:
> Your program is racy and have undefined behavior.
> 
> A thread should not close a handle 'used by another thread blocked in a 
> sytemcall'
> 
> The race is : if a thread does a close(fd), then the fd value may be 
> reused by another thread during an open()/socket()/dup()... syscall, and 
> the first thread could issue the select() syscall (or 
> read()/write()/...) on the bad file.

Apologies, but I don't follow this.

It was my understanding that the (potentially) many threads of a single 
process all share a canonical file descriptor table. Hence as long as 
the various calls you mention are issued in a guaranteed order, 
maintaining state as you go (which is what the 1 second sleep in the 
test code was a very quick and dirty way to almost do), I don't see how 
a race condition arises.

If I were to replace the sleep (1) with, say a global semaphore or 
something similar, would your explanation still hold?

Cheers,

Andre Ben Hamou
Imperial College London

-- 

...and, on the seventh day, God switched off his Mac.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Multithread select() bug
  2004-05-10 22:11   ` Andre Ben Hamou
@ 2004-05-10 22:31     ` Eric Dumazet
  2004-05-10 23:01       ` Andre Ben Hamou
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2004-05-10 22:31 UTC (permalink / raw)
  To: Andre Ben Hamou; +Cc: linux-kernel

Andre Ben Hamou wrote:

> Eric Dumazet wrote:
>
>> Your program is racy and have undefined behavior.
>>
>> A thread should not close a handle 'used by another thread blocked in 
>> a sytemcall'
>>
>> The race is : if a thread does a close(fd), then the fd value may be 
>> reused by another thread during an open()/socket()/dup()... syscall, 
>> and the first thread could issue the select() syscall (or 
>> read()/write()/...) on the bad file.
>
>
> Apologies, but I don't follow this.
>
> It was my understanding that the (potentially) many threads of a 
> single process all share a canonical file descriptor table. Hence as 
> long as the various calls you mention are issued in a guaranteed 
> order, maintaining state as you go (which is what the 1 second sleep 
> in the test code was a very quick and dirty way to almost do), I don't 
> see how a race condition arises.
>
> If I were to replace the sleep (1) with, say a global semaphore or 
> something similar, would your explanation still hold?
>
So please how do you guarantee that thread 1 runs *before* thread 2)

Thread 1)
        select( fd)

Thread 2)
        close(fd)

Thats not possible.

Only pthread synchronization  are mutexes (or rwlocks, or semaphore). 
And you cannot release mutex/rwlock/semaphore *after* entering Thread1) 
select()

So yes, there is a race condition.

In you example, it can happens that thread 1 must sleep 10 seconds 
before calling select(), because of system scheduling. Your sleep(1) 
cannot garant that the close() is done after the select() call blocked 
into kernel. You can try whatever semaphore you want, you wont be able 
to have a 100% reliable program (even on Solaris)

Eric

> Cheers,
>
> Andre Ben Hamou
> Imperial College London
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Multithread select() bug
  2004-05-10 22:31     ` Eric Dumazet
@ 2004-05-10 23:01       ` Andre Ben Hamou
  2004-05-11  6:07         ` Armin Schindler
  2004-05-11  6:24         ` Eric Dumazet
  0 siblings, 2 replies; 10+ messages in thread
From: Andre Ben Hamou @ 2004-05-10 23:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel

Eric Dumazet wrote:
> So please how do you guarantee that thread 1 runs *before* thread 2)
> 
> Thread 1)
>        select( fd)
> 
> Thread 2)
>        close(fd)
> 
> Thats not possible.
> 

I see where you're coming from, in that there is a potential race 
condition as to the socket being connected as I reach the select call.

This is an important concern but it is, I think, orthogonal to the 
original problem as there are two possible socket states at the point at 
which select gets called (as far as I can see)...

1. The socket is in its connected state
2. The socket has already been closed by the parent thread

As I understand it, if 1 is true (which corresponds to my original 
post), then select should return the moment the socket gets closed and, 
if 2 is true (which I believe corresponds to your concern), then select 
should return immediately anyway as the socket would not block if read from.

Sorry to be a pest, but I'm trying to get this clear in my head. Is it 
possible I've over-estimated the thread-safety of the select and close 
calls?

Cheers,

Andre Ben Hamou
Imperial College London

-- 

...and, on the seventh day, God switched off his Mac.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Multithread select() bug
  2004-05-10 23:01       ` Andre Ben Hamou
@ 2004-05-11  6:07         ` Armin Schindler
  2004-05-11  6:24         ` Eric Dumazet
  1 sibling, 0 replies; 10+ messages in thread
From: Armin Schindler @ 2004-05-11  6:07 UTC (permalink / raw)
  To: Andre Ben Hamou; +Cc: Eric Dumazet, Linux Kernel Mailinglist

On Tue, 11 May 2004, Andre Ben Hamou wrote:
> Eric Dumazet wrote:
> > So please how do you guarantee that thread 1 runs *before* thread 2)
> >
> > Thread 1)
> >        select( fd)
> >
> > Thread 2)
> >        close(fd)
> >
> > Thats not possible.
> >

Some time ago I sent a patch for the select() systemcall to return
if the last fd was closed. If select() specifies more than one fd, then
staying in select is valid, but if there is no more fd left to wait on,
select() should return I think. I don't know what standards say here
or other Unix do in that case, so I did't have a good reason for the patch
and it was ignored.
The patch was for 2.4, but 2.6 is the same and poll() as well.

Armin


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Multithread select() bug
  2004-05-10 23:01       ` Andre Ben Hamou
  2004-05-11  6:07         ` Armin Schindler
@ 2004-05-11  6:24         ` Eric Dumazet
  1 sibling, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2004-05-11  6:24 UTC (permalink / raw)
  To: Andre Ben Hamou; +Cc: linux-kernel

Andre Ben Hamou wrote:

>
> I see where you're coming from, in that there is a potential race 
> condition as to the socket being connected as I reach the select call.
>
> This is an important concern but it is, I think, orthogonal to the 
> original problem as there are two possible socket states at the point 
> at which select gets called (as far as I can see)...
>
> 1. The socket is in its connected state
> 2. The socket has already been closed by the parent thread
>
> As I understand it, if 1 is true (which corresponds to my original 
> post), then select should return the moment the socket gets closed 
> and, if 2 is true (which I believe corresponds to your concern), then 
> select should return immediately anyway as the socket would not block 
> if read from.
>
> Sorry to be a pest, but I'm trying to get this clear in my head. Is it 
> possible I've over-estimated the thread-safety of the select and close 
> calls?
>
As the safety of what you are trying to do cannot be guaranted, you 
should not even try to do that.
Then, apart from this safety showstopper, you should consider that what 
you assumed is simply wrong :
- Some Operating Systems do something special to signal blocked threads 
(blocked on a read()/write()/select()/poll()) if the underlying file 
handle is closed.
- linux does NOT. And this is done for several reasons. These reasons 
are known and were discussed on this list at several occasions.

So the rule is : close() syscall should be serialized, according to a 
synchronization of your choice, so that no other thread could be using 
the file descriptor at the same time.
If you want to 'send some signal to the blocked thread' in a portable 
way, you could setup a pipe to be able to send a special message on it. 
The 'worker thread' could add the read side of the pipe (pipefd[0]) to 
its select fd set, and will be notified of the Thread 2) willing to 
dismantle your socket, using a standard write(pipefd[1]). This will be 
portable : When the 'worker thread' returns from select() call, it can 
then close the socket fd himself.

By the way, prefer poll() syscall to select() one. Poll is much more 
scalable/efficient if you have only one or two fd to monitor.

See you
Eric

> Cheers,
>
> Andre Ben Hamou
> Imperial College London
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2004-05-11  6:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-05-10 21:26 Multithread select() bug Andre Ben Hamou
2004-05-10 21:47 ` Davide Libenzi
2004-05-10 21:56   ` Andre Ben Hamou
2004-05-10 22:09     ` Davide Libenzi
2004-05-10 21:57 ` Eric Dumazet
2004-05-10 22:11   ` Andre Ben Hamou
2004-05-10 22:31     ` Eric Dumazet
2004-05-10 23:01       ` Andre Ben Hamou
2004-05-11  6:07         ` Armin Schindler
2004-05-11  6:24         ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox