From: Nicolas Cannasse <ncannasse@motion-twin.com>
To: Stephen Hemminger <shemminger@vyatta.com>
Cc: linux-net@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: recv() hangs until SIGCHLD ?
Date: Sat, 11 Oct 2008 10:28:53 +0200
Message-ID: <48F063C5.3000707@motion-twin.com>
In-Reply-To: <20081010211700.58e953a2@speedy>

>> We run a multithreaded application which uses pthreads and sockets. One
>> thread calls accept(), then dispatches the socket to one of the worker
>> threads, which processes it. Sockets are therefore never used
>> simultaneously by several threads.
>>
>> In some rare cases, one (or several) threads are hanging in recv(). Both 
>> lsof and ls /proc/<pid>/fd show that the socket used is in ESTABLISHED 
>> mode but when checking on the host on which it's connected (a mysql DB) 
>> we can't find the corresponding client socket (as it's been closed 
>> already on the other side).
>>
>> We are using the Boehm GC which uses the signals SIGXCPU and SIGPWR to 
>> pause+restart the threads when running a GC cycle. We are correctly 
>> handling EINTR in send() and recv() by restarting the call in case they 
>> get interrupted this way.
>>
>> However, when attaching GDB to our locked thread it seems that even when 
>> the GC runs, recv() does not exit (the breakpoint after it is not 
>> reached). If we send SIGCHLD to the hanging thread with GDB, recv() does 
>> exit and the thread is correctly unlocked. If we don't, it will hang 
>> forever.
>>
>> Additional details: recv() is called with MSG_NOSIGNAL and we have enabled
>> TCP_NODELAY on the socket using setsockopt(). Some other
>> non-multithreaded apps use the same databases and this behavior
>> does not occur for them.
>>
>> Any idea how we can stop this from happening, or what else
>> we can check to get more information on what's occurring?
>>
>> Thanks a lot,
>> Nicolas
> 
> Look at Receive queue length with ss or netstat for the hung thread. It will
> show if there is anything that thread could read.
> 
> If there is data and the thread didn't wake up then that is a libc or kernel problem;
> but if there is no data, then look for cases where earlier interrupted io actually
> consumed the data already or blame the sending process not the receiver.
> Also are the sockets blocking or non-blocking?

The sockets are non-blocking.

Checking with netstat and ss I can confirm that both Send and Recv 
queues are empty, which makes the recv() behavior consistent.

However, since this problem does not occur without threads, we can be
sure that the blame is still on the receiver.

In one concrete case, a thread has been blocked in recv() for more than 12
hours, which is far beyond the sender's connection timeout. The
socket has already been closed by the sender, so recv() should at least
be notified of that and return 0.

Is it safe to assume that when either send() or recv() is interrupted
by a signal and returns EINTR, no data has actually been sent or
consumed? And if not, is there any other way around this?
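
For reference, the EINTR handling in question can be sketched as below. This
is an illustration rather than our actual code, and the helper names
recv_retry and send_all are hypothetical. It relies on the POSIX wording for
these calls: recv() fails with EINTR when interrupted "before any data was
available", send() when interrupted "before any data was transmitted", and a
call interrupted after a partial transfer is expected to return the byte
count instead of -1. Under that reading, simply calling again on EINTR
should neither lose nor duplicate data.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Call recv() again whenever it fails with EINTR; pass other results up. */
static ssize_t recv_retry(int fd, void *buf, size_t len, int flags)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, flags);
    } while (n < 0 && errno == EINTR);

    return n; /* > 0: bytes read, 0: peer closed, -1: real error */
}

/*
 * Send the whole buffer. A short count (for example after a signal that
 * arrived mid-transfer) advances the pointer; EINTR just retries.
 * Returns len on success, -1 on error (errno set).
 */
static ssize_t send_all(int fd, const void *buf, size_t len, int flags)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = send(fd, p, left, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* nothing was transmitted by this call */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

A caller would use recv_retry(fd, buf, sizeof buf, MSG_NOSIGNAL) wherever it
previously called recv() directly, and treat a return value of 0 as the peer
having closed the connection.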

Best,
Nicolas

