From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59597)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZMbie-0001qL-G7
	for qemu-devel@nongnu.org; Tue, 04 Aug 2015 08:53:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1ZMbia-0006jE-7I
	for qemu-devel@nongnu.org; Tue, 04 Aug 2015 08:53:24 -0400
Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:57945 helo=mx01.kamp.de)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZMbiZ-0006iZ-T2
	for qemu-devel@nongnu.org; Tue, 04 Aug 2015 08:53:20 -0400
Message-ID: <55C0B5B9.4040001@kamp.de>
Date: Tue, 04 Aug 2015 14:53:13 +0200
From: Peter Lieven
MIME-Version: 1.0
References: <55BB2DF7.8010808@kamp.de> <55BB302D.50108@redhat.com>
	<55BB335A.1010009@kamp.de> <55BB3FE7.3000106@redhat.com>
	<55C08461.1040308@kamp.de> <55C0A7AA.70609@redhat.com>
	<55C0A88D.1010800@kamp.de> <55C0AB81.8020404@redhat.com>
	<55C0B03D.8000109@kamp.de>
In-Reply-To: <55C0B03D.8000109@kamp.de>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [Qemu-stable] Recent patches for 2.4
To: Paolo Bonzini, Stefan Hajnoczi
Cc: "qemu-devel@nongnu.org", ronnie sahlberg, qemu-stable@nongnu.org

On 04.08.2015 at 14:29, Peter Lieven wrote:
> On 04.08.2015 at 14:09, Paolo Bonzini wrote:
>>
>> On 04/08/2015 13:57, Peter Lieven wrote:
>>> Okay, what I found out is that in aio_poll I get revents = POLLIN for
>>> the nfs file descriptor. But there is no data available on the socket.
>> Does read return 0 or EAGAIN?
>>
>> If it returns EAGAIN, the bug is in the QEMU main loop or the kernel.
>> It should never happen that poll returns POLLIN and read returns EAGAIN.
>>
>> If it returns 0, it means the other side called shutdown(fd, SHUT_WR).
>> Then I think the bug is in the libnfs driver or more likely libnfs. You
>> should stop polling the POLLIN event after read has returned 0 once.
>
> You might be right. Ronnie originally used the FIONREAD ioctl before every
> read and considered the socket disconnected if the number of available
> bytes it returned was 0.
> I found that I get available bytes == 0 from that ioctl even if the socket
> was not closed.
> This seems to be some kind of bug in Linux, or at least that is what I
> thought.
>
> See BUGS in the select(2) manpage:
>
>   Under Linux, select() may report a socket file descriptor as "ready for
>   reading", while nevertheless a subsequent read blocks. This could for
>   example happen when data has arrived but upon examination has wrong
>   checksum and is discarded. There may be other circumstances in which a
>   file descriptor is spuriously reported as ready. Thus it may be safer to
>   use O_NONBLOCK on sockets that should not block.
>
> I will debug further, but it seems that I receive a POLLIN even if there
> is no data available. I see 0 bytes from the recv call inside libnfs and
> continue without a deadlock, at least so far.
>
> Would it be a good idea to count the number of 0-byte results from recv
> and react after receiving 0 bytes a number of consecutive times?
>
> And then: stop polling POLLIN or reconnect?

Okay, got it. Ronnie was using FIONREAD without checking for EAGAIN or EINTR.

I will send a patch for libnfs to reconnect if count == 0.

Libiscsi is not affected; it already reconnects if count is 0.

Peter
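
To make the distinction from this thread concrete, here is a minimal sketch in C of how the result of a non-blocking recv() after POLLIN could be classified. This is not the libnfs or libiscsi code and not the patch mentioned above. The names rx_status, classify_recv and try_recv are hypothetical, and the sketch assumes a stream socket on Linux, where MSG_DONTWAIT is available.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical result codes for illustration; not part of libnfs or QEMU. */
enum rx_status {
    RX_DATA,        /* count > 0: bytes were read */
    RX_RETRY,       /* EAGAIN/EWOULDBLOCK/EINTR: nothing to read now, poll again */
    RX_PEER_CLOSED, /* count == 0: peer shut down its side, reconnect */
    RX_ERROR        /* any other errno: real failure */
};

/* Classify the return value of recv() together with the errno it left. */
static enum rx_status classify_recv(ssize_t count, int err)
{
    if (count > 0)
        return RX_DATA;
    if (count == 0)
        return RX_PEER_CLOSED; /* assumes a non-zero buffer length was requested */
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return RX_RETRY;
    return RX_ERROR;
}

/* Attempt one non-blocking read and report what happened. */
static enum rx_status try_recv(int fd, void *buf, size_t len, ssize_t *out)
{
    assert(fd >= 0);
    ssize_t count = recv(fd, buf, len, MSG_DONTWAIT);
    int err = errno; /* save errno right away, before any other call */
    *out = count;
    return classify_recv(count, err);
}
```

The point of the sketch is that "no data right now" and "peer closed" are separate cases: only a return value of 0 from recv() is treated as a disconnect, while EAGAIN and EINTR simply send the caller back to polling.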