From: Michael Roth
Date: Tue, 8 Jul 2014 12:16:40 -0500
Message-Id: <1404839947-1086-10-git-send-email-mdroth@linux.vnet.ibm.com>
In-Reply-To: <1404839947-1086-1-git-send-email-mdroth@linux.vnet.ibm.com>
References: <1404839947-1086-1-git-send-email-mdroth@linux.vnet.ibm.com>
Subject: [Qemu-devel] [PATCH 009/156] tap: avoid deadlocking rx
To: qemu-devel@nongnu.org
Cc: qemu-stable@nongnu.org

From: Stefan Hajnoczi

The net subsystem has a control flow mechanism so peer NetClientStates can tell each other to stop sending packets. This is used to stop monitoring the tap file descriptor for incoming packets if the guest rx ring has no spare buffers.

There is a corner case when tap_can_send() is true at the beginning of an event loop iteration but becomes false before the tap_send() fd handler is invoked.

tap_send() will read the packet from the tap file descriptor and attempt to send it. The net queue will hold on to the packet and return 0, indicating that further I/O is not possible. tap then stops monitoring the file descriptor for reads.
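The corner case can be made concrete with a minimal C sketch of the pre-patch loop shape. Everything here (TapModel, model_can_send(), model_read_packet(), model_send_async(), model_fd_watched(), old_tap_send()) is a hypothetical stand-in, not QEMU code: the peer's readiness is one flag, the net queue is a counter, and "monitoring the fd" is reduced to read_poll ANDed with the can-send check. It also assumes that a read returning <= 0 ends the loop, as a non-blocking read with nothing pending would.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the pre-patch tap_send() flow.
 * None of these names exist in QEMU. */
typedef struct {
    int pending_on_fd;  /* packets waiting to be read from the tap fd */
    int queued;         /* packets held back by the net queue */
    bool peer_can_recv; /* guest rx ring has spare buffers */
    bool read_poll;     /* tap wants the fd monitored for reads */
} TapModel;

/* Stand-in for qemu_can_send_packet(): true if the peer can receive. */
static bool model_can_send(const TapModel *m)
{
    return m->peer_can_recv;
}

/* Stand-in for tap_read_packet(): returns a packet length, or -1 when
 * nothing is pending (like a non-blocking read failing with EAGAIN). */
static int model_read_packet(TapModel *m)
{
    if (m->pending_on_fd == 0) {
        return -1;
    }
    m->pending_on_fd--;
    return 64;
}

/* Stand-in for qemu_send_packet_async(): if the peer cannot receive,
 * the queue holds on to the packet and 0 is returned. */
static int model_send_async(TapModel *m, int len)
{
    assert(len > 0);
    if (!m->peer_can_recv) {
        m->queued++;
        return 0;
    }
    return len;
}

/* The event loop only watches the fd while read polling is enabled
 * and the can-send check passes at the start of the iteration. */
static bool model_fd_watched(const TapModel *m)
{
    return m->read_poll && model_can_send(m);
}

/* Old loop shape: a packet is read before the peer is checked. */
static void old_tap_send(TapModel *m)
{
    int size;

    do {
        size = model_read_packet(m);
        if (size <= 0) {
            break;
        }
        size = model_send_async(m, size);
        if (size == 0) {
            m->read_poll = false; /* now waits for the peer to flush */
        }
    } while (size > 0 && model_can_send(m));
}
```

In this model, if peer_can_recv flips to false after model_fd_watched() returned true but before old_tap_send() runs, one packet ends up in the queue and read_poll is cleared, so the fd stays unwatched even once the peer can receive again, until something re-enables polling.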
This is unlike the normal case where tap_can_send() is the same before and during the event loop iteration. The event loop would simply not monitor the file descriptor if tap_can_send() returns false. Upon the next iteration it would check tap_can_send() again and begin monitoring if we can send.

The deadlock happens because tap_send() explicitly disabled read_poll. This is done with the expectation that the peer will call qemu_net_queue_flush(). But hw/net/virtio-net.c does not monitor vm_running transitions and issue the flush. Hence we're left with a broken tap device.

Cc: qemu-stable@nongnu.org
Reported-by: Neil Skrypuch
Tested-by: Neil Skrypuch
Signed-off-by: Stefan Hajnoczi
(cherry picked from commit 68e5ec64009812dbaa03ed9cfded9344986f5304)
Signed-off-by: Michael Roth
---
 net/tap.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 39c1cda..6b87a73 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -190,7 +190,7 @@ static void tap_send(void *opaque)
     TAPState *s = opaque;
     int size;
 
-    do {
+    while (qemu_can_send_packet(&s->nc)) {
         uint8_t *buf = s->buf;
 
         size = tap_read_packet(s->fd, s->buf, sizeof(s->buf));
@@ -206,8 +206,11 @@ static void tap_send(void *opaque)
         size = qemu_send_packet_async(&s->nc, buf, size, tap_send_completed);
         if (size == 0) {
             tap_read_poll(s, false);
+            break;
+        } else if (size < 0) {
+            break;
         }
-    } while (size > 0 && qemu_can_send_packet(&s->nc));
+    }
 }
 
 bool tap_has_ufo(NetClientState *nc)
-- 
1.9.1
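The patched loop can be modelled the same way. This sketch is again hypothetical and self-contained (TapModel and the model_* helpers are illustrative stand-ins, not QEMU functions); only the shape of new_tap_send() follows the diff. The `size <= 0` check after the read is an assumption, because that part of tap_send() lies outside the diff context. Checking model_can_send() before every read means that if the peer stops accepting packets between the event loop's check and the handler, nothing is read and read_poll stays enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the patched tap_send() flow.
 * None of these names exist in QEMU. */
typedef struct {
    int pending_on_fd;  /* packets waiting to be read from the tap fd */
    int queued;         /* packets held back by the net queue */
    int delivered;      /* packets accepted by the peer */
    bool peer_can_recv; /* guest rx ring has spare buffers */
    bool read_poll;     /* tap wants the fd monitored for reads */
} TapModel;

/* Stand-in for qemu_can_send_packet(). */
static bool model_can_send(const TapModel *m)
{
    return m->peer_can_recv;
}

/* Stand-in for tap_read_packet(): -1 models "nothing to read". */
static int model_read_packet(TapModel *m)
{
    if (m->pending_on_fd == 0) {
        return -1;
    }
    m->pending_on_fd--;
    return 64;
}

/* Stand-in for qemu_send_packet_async(): queue and return 0 if the
 * peer cannot receive, otherwise deliver and return the length. */
static int model_send_async(TapModel *m, int len)
{
    assert(len > 0);
    if (!m->peer_can_recv) {
        m->queued++;
        return 0;
    }
    m->delivered++;
    return len;
}

/* fd is watched only if polling is on and the can-send check passes. */
static bool model_fd_watched(const TapModel *m)
{
    return m->read_poll && model_can_send(m);
}

/* Patched loop shape: the peer is checked before every read. */
static void new_tap_send(TapModel *m)
{
    while (model_can_send(m)) {
        int size = model_read_packet(m);
        if (size <= 0) {
            break;
        }
        size = model_send_async(m, size);
        if (size == 0) {
            m->read_poll = false;
            break;
        } else if (size < 0) {
            break; /* mirrors the patch; this model never returns < 0 */
        }
    }
}
```

With the check at the top of the loop, a late change in the peer's state just leaves the packet on the fd, and the next iteration's can-send check decides whether to watch it again. The `size == 0` branch still disables polling and relies on a later flush, but in this model it can no longer be reached merely because the peer's state changed before the handler ran.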