From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52176)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XElTl-0005uq-9U for qemu-devel@nongnu.org;
    Tue, 05 Aug 2014 16:37:13 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1XElTd-00073T-0p for qemu-devel@nongnu.org;
    Tue, 05 Aug 2014 16:37:05 -0400
Received: from mail1.windriver.com ([147.11.146.13]:35035)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XElTc-000731-Nq for qemu-devel@nongnu.org;
    Tue, 05 Aug 2014 16:36:56 -0400
Received: from ALA-HCA.corp.ad.wrs.com (ala-hca.corp.ad.wrs.com [147.11.189.40])
    by mail1.windriver.com (8.14.9/8.14.5) with ESMTP id s75KapwD026711
    (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=FAIL) for ;
    Tue, 5 Aug 2014 13:36:52 -0700 (PDT)
Message-ID: <53E14062.1070700@windriver.com>
Date: Tue, 5 Aug 2014 14:36:50 -0600
From: Chris Friesen
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] [bug?] getting EAGAIN on connect() to virtio-serial unix socket on host
To: qemu-devel@nongnu.org

Hi,

I'm running qemu 1.4.2 (soon planning on moving to 1.7).

I'm running two instances of qemu with a virtio-serial channel each, exposed on the host via unix stream sockets. I've got an app that tries to connect() to both of them in turn. The connect() to the first socket fails with EAGAIN, the second one succeeds, and all subsequent retries on the first fail.

Here's an strace of the sequence:

    socket(PF_FILE, SOCK_STREAM, 0) = 6
    fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    connect(6, {sa_family=AF_FILE, sun_path="/var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock"}, 61) = -1 EAGAIN (Resource temporarily unavailable)
    clock_gettime(CLOCK_MONOTONIC, {158877, 262941763}) = 0
    socket(PF_FILE, SOCK_STREAM, 0) = 7
    fcntl(7, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    connect(7, {sa_family=AF_FILE, sun_path="/var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock"}, 61) = 0
    getdents(5, /* 0 entries */, 32768) = 0
    close(5) = 0
    clock_gettime(CLOCK_MONOTONIC, {158877, 265359109}) = 0
    poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN}], 3, 997) = 0 (Timeout)
    clock_gettime(CLOCK_MONOTONIC, {158878, 265914614}) = 0
    connect(6, {sa_family=AF_FILE, sun_path="/var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock"}, 61) = -1 EAGAIN (Resource temporarily unavailable)

With the app not running, netstat seems to show that something is trying to connect to the socket in question:

    root@compute-0:~# netstat -ap unix |grep messaging
    unix  2  [ ACC ]  STREAM  LISTENING   1109818  17379/qemu-system-x  /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
    unix  2  [ ACC ]  STREAM  LISTENING   1110051  17425/qemu-system-x  /var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock
    unix  2  [ ]      STREAM  CONNECTING  0        -                    /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
    unix  2  [ ]      STREAM  CONNECTING  0        -                    /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
    unix  2  [ ]      STREAM  CONNECTED   1109848  17379/qemu-system-x  /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock

Here's /proc/net/unix for completeness:

    root@compute-0:~/host-guest-comm# grep -a messaging /proc/net/unix
    ffff880045c35540: 00000002 00000000 00010000 0001 01 1109818 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
    ffff8800576b8a80: 00000002 00000000 00010000 0001 01 1110051 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock
    ffff880045e2f040: 00000002 00000000 00000000 0001 02 0 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
    ffff88004bc5ea80: 00000002 00000000 00000000 0001 02 0 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
    ffff880045e2f540: 00000002 00000000 00000000 0001 03 1109848 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock

The crazy thing is that I can't figure out what could be causing the CONNECTED/CONNECTING sockets. There are no background processes of the connecting app running, no zombie processes, no forked children, etc.

Just to make things more interesting, I successfully ran this application several times (connecting to both sockets) before this behaviour started happening. I was running it under strace and just killed it with Ctrl-C.

I contacted the Linux kernel netdev list and they suggested it might be due to the listen() backlog of 1, combined with qemu somehow missing a connection attempt on the socket and thus never calling accept().

Anyone got any ideas?

Please CC me since I'm not subscribed to the list.

Thanks,
Chris
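
The backlog theory from netdev can be checked in isolation. Below is a minimal sketch, not qemu's code: a listener on a throwaway unix stream socket calls listen() with a backlog of 1 and never calls accept(), while non-blocking clients connect until one fails. The function name `probe_backlog`, the socket path `demo.sock`, and the attempt cap are all hypothetical. On Linux the failing connect() is expected to return EAGAIN once the pending queue is full, and a retry is expected to succeed after a single accept() drains one entry; other platforms may report a different errno.

```python
import errno
import os
import socket
import tempfile


def probe_backlog(backlog=1, max_attempts=16):
    """Connect repeatedly to a listener that never calls accept().

    Returns (ok, err, retry_ok):
      ok       - number of non-blocking connect() calls that succeeded
      err      - errno of the first failed connect(), or None if none failed
      retry_ok - whether a connect() succeeded after one accept(), or None
                 if no connect() failed in the first place
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")  # hypothetical path, not qemu's
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    open_socks = []
    try:
        listener.bind(path)
        listener.listen(backlog)

        ok = 0
        err = None
        for _ in range(max_attempts):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            open_socks.append(client)
            client.setblocking(False)  # equivalent to setting O_NONBLOCK
            try:
                client.connect(path)
            except OSError as exc:
                err = exc.errno  # EAGAIN is the case seen in the strace
                break
            ok += 1

        retry_ok = None
        if err is not None:
            # Drain one pending connection, then try again.
            conn, _ = listener.accept()
            open_socks.append(conn)
            retry = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            open_socks.append(retry)
            retry.setblocking(False)
            try:
                retry.connect(path)
                retry_ok = True
            except OSError:
                retry_ok = False
        return ok, err, retry_ok
    finally:
        for sock in open_socks:
            sock.close()
        listener.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    ok, err, retry_ok = probe_backlog(backlog=1)
    name = errno.errorcode.get(err, "none") if err is not None else "none"
    print("successful connects:", ok, "first error:", name, "retry ok:", retry_ok)
```

If the sketch behaves the same way on the affected host, that would be consistent with the two inode-0 CONNECTING entries being un-accepted connections that fill qemu's listen queue. It says nothing about what created those connections or why accept() was never called.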