From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1O1g0v-0004Vf-Jn for qemu-devel@nongnu.org; Tue, 13 Apr 2010 09:18:49 -0400 Received: from [140.186.70.92] (port=47655 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1O1g0r-0004Gx-Ge for qemu-devel@nongnu.org; Tue, 13 Apr 2010 09:18:48 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1O1fxt-0002kG-Py for qemu-devel@nongnu.org; Tue, 13 Apr 2010 09:15:43 -0400 Received: from goliath.siemens.de ([192.35.17.28]:16218) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O1fxt-0002jw-Eo for qemu-devel@nongnu.org; Tue, 13 Apr 2010 09:15:41 -0400 Message-ID: <4BC46E77.8020706@siemens.com> Date: Tue, 13 Apr 2010 15:15:35 +0200 From: Jan Kiszka MIME-Version: 1.0 Subject: Re: [Qemu-devel] How to lock-up your tap-based VM network References: <4BC34D95.7050804@siemens.com> <201004122107.19425.paul@codesourcery.com> <4BC46169.7020204@siemens.com> <201004131403.19402.paul@codesourcery.com> In-Reply-To: <201004131403.19402.paul@codesourcery.com> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paul Brook Cc: "qemu-devel@nongnu.org" Paul Brook wrote: >> Paul Brook wrote: >>>> A major reason for this deadlock could likely be removed by shutting >>>> down the tap (if peered) or dropping packets in user space (in case of >>>> vlan) when a NIC is stopped or otherwise shut down. Currently most (if >>>> not all) NIC models seem to signal both "queue full" and "RX disabled" >>>> via !can_receive(). >>> No. A disabled device should return true from can_recieve, then discard >>> the packets in its receive callback. Failure to do so is a bug in the >>> device. It looks like the virtio-net device may be buggy. >> That's not a virtio-only issue. In fact, we ran into this over pcnet, >> and a quick check of other popular PCI NIC models (except for rtl8139) >> revealed the same picture: They only report can_receive if their >> receiver unit is up and ready (some also include the queue state, but >> that's an "add-on"). > > If so these are also buggy. > >> I think it's clear why: "can_receive" strongly suggests that a suspended >> receiver should make the model return false. If we want to keep this >> handler, it should be refactored to something like "queue_full". > > I don't see a need to refactor anything. You just need to fix the devices > that incorrectly return false when their RX engine is disabled. "can_receive" is a misleading name. NIC model developers will continue to make mistakes when implementing this function. And I don't think we want such implementations all around: static int rtl8139_can_receive(VLANClientState *nc) { RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque; int avail; /* Receive (drop) packets if card is disabled. */ if (!s->clock_enabled) return 1; if (!rtl8139_receiver_enabled(s)) return 1; if (rtl8139_cp_receiver_enabled(s)) { /* ??? Flow control not implemented in c+ mode. This is a hack to work around slirp deficiencies anyway. */ return 1; } else { avail = MOD2(s->RxBufferSize + s->RxBufPtr - s->RxBufAddr, s->RxBufferSize); return (avail == 0 || avail >= 1514); } } Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux