public inbox for linux-kernel@vger.kernel.org
* recv() hangs until SIGCHLD ?
@ 2008-10-10 13:30 Nicolas Cannasse
  2008-10-10 19:17 ` Stephen Hemminger
  0 siblings, 1 reply; 10+ messages in thread
From: Nicolas Cannasse @ 2008-10-10 13:30 UTC (permalink / raw)
  To: linux-kernel

Hi,

We've been tracking a bug in our server application for some time now,
and now that we have isolated it we're stuck without a meaningful
explanation. Hopefully someone will be able to give us some answers.

We run a multithreaded application which uses pthreads and sockets. One
thread calls accept(), then dispatches the socket to one of the worker
threads, which processes it. A socket is therefore never used by
several threads simultaneously.

In some rare cases, one (or several) threads hang in recv(). Both
lsof and ls /proc/<pid>/fd show that the socket is in the ESTABLISHED
state, but when checking on the host it is connected to (a mysql DB)
we can't find the corresponding client socket (as it has already been
closed on the other side).

We are using the Boehm GC, which uses the signals SIGXCPU and SIGPWR to
pause+restart the threads when running a GC cycle. We are correctly
handling EINTR in send() and recv() by restarting the call if it is
interrupted this way.
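
For reference, a minimal sketch of the kind of EINTR restart loop
described above. The helper names recv_retry() and send_retry() are
hypothetical and used only for illustration; this is not the
application's actual code.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: call recv() again whenever a signal handler
 * interrupts it and it fails with EINTR. */
static ssize_t recv_retry(int fd, void *buf, size_t len, int flags)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, flags);
    } while (n == -1 && errno == EINTR);

    /* > 0: bytes received, 0: peer closed (stream socket), -1: error */
    return n;
}

/* Hypothetical helper: same restart logic for send(). It may still
 * return a short count, which the caller has to handle. */
static ssize_t send_retry(int fd, const void *buf, size_t len, int flags)
{
    ssize_t n;

    do {
        n = send(fd, buf, len, flags);
    } while (n == -1 && errno == EINTR);

    return n;
}
```

Note that a loop like this only comes into play if recv() actually
returns to user space with EINTR; a breakpoint placed right after the
call is one way to check whether that happens.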

However, when attaching GDB to our hung thread, it seems that even when
the GC runs, recv() does not return (the breakpoint after it is not
reached). If we send SIGCHLD to the hanging thread with GDB, recv() does
return and the thread resumes correctly. If we don't, it will hang
forever.

Additional details: recv() is using MSG_NOSIGNAL and we have enabled
TCP_NODELAY on the socket using setsockopt(). Some other
non-multithreaded apps use the same databases and this behavior
does not occur for them.
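
A minimal sketch of how these options are typically set, assuming a
plain TCP socket on Linux. The helper names enable_nodelay() and
nodelay_enabled() are hypothetical; reading the option back with
getsockopt() is just one way to confirm it is in effect on a given
descriptor.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP_NODELAY for a TCP socket.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int enable_nodelay(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Hypothetical helper: read the option back.
 * Returns 1 if enabled, 0 if disabled, -1 on error. */
static int nodelay_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
        return -1;
    return val != 0;
}
```

MSG_NOSIGNAL, by contrast, is not a socket option: it is passed per
call in the flags argument, e.g. recv(fd, buf, len, MSG_NOSIGNAL). It
is documented in send(2), where it suppresses SIGPIPE.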

Any idea how we can stop this from happening, or what else we can
check to get more information on what's occurring?

Thanks a lot,
Nicolas



end of thread, other threads:[~2008-10-13 15:03 UTC | newest]

Thread overview: 10+ messages
2008-10-10 13:30 recv() hangs until SIGCHLD ? Nicolas Cannasse
2008-10-10 19:17 ` Stephen Hemminger
2008-10-11  8:28   ` Nicolas Cannasse
2008-10-11 12:20     ` David Schwartz
2008-10-12 15:47       ` Stephen Hemminger
2008-10-13  8:31     ` Nicolas Cannasse
2008-10-13 15:02       ` Nicolas Cannasse
  -- strict thread matches above, loose matches on Subject: below --
2008-10-10 16:43 Nicolas Cannasse
2008-10-11  4:48 ` David Schwartz
2008-10-11  9:30   ` Samuel Thibault
