From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4BC46169.7020204@siemens.com>
Date: Tue, 13 Apr 2010 14:19:53 +0200
From: Jan Kiszka
Subject: Re: [Qemu-devel] How to lock-up your tap-based VM network
References: <4BC34D95.7050804@siemens.com> <201004122107.19425.paul@codesourcery.com>
In-Reply-To: <201004122107.19425.paul@codesourcery.com>
List-Id: qemu-devel.nongnu.org
To: Paul Brook
Cc: "qemu-devel@nongnu.org"

Paul Brook wrote:
>> A major reason for this deadlock could likely be removed by shutting
>> down the tap (if peered) or dropping packets in user space (in case of
>> vlan) when a NIC is stopped or otherwise shut down. Currently most (if
>> not all) NIC models seem to signal both "queue full" and "RX disabled"
>> via !can_receive().
>
> No. A disabled device should return true from can_receive, then discard the
> packets in its receive callback. Failure to do so is a bug in the device. It
> looks like the virtio-net device may be buggy.

That's not a virtio-only issue.
In fact, we ran into this with pcnet, and a quick check of the other
popular PCI NIC models (except for rtl8139) revealed the same picture:
they only return true from can_receive if their receiver unit is up and
ready (some also factor in the queue state, but that's an "add-on").

I think it's clear why: the name "can_receive" strongly suggests that a
model with a suspended receiver should return false. If we want to keep
this handler, it should be refactored to something like "queue_full".

But before starting any refactoring endeavor: do we have a consensus on
the direction? Refactor can_receive to queue_full? Or even drop it?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
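
To make the two semantics under discussion concrete, here is a minimal,
standalone C sketch. It is not QEMU code: struct toy_nic, TOY_RX_SLOTS and
all toy_* functions are hypothetical names invented for illustration, and
the real handler signatures differ. It only contrasts a check that conflates
"RX disabled" with "queue full" against a check that reports back-pressure
alone, leaving a disabled receiver to discard packets in its receive path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RX_SLOTS 4 /* hypothetical receive ring size */

/* Hypothetical NIC model state; not a QEMU structure. */
struct toy_nic {
    bool rx_enabled;   /* guest has switched the receiver on */
    size_t rx_used;    /* occupied receive slots */
    size_t rx_dropped; /* packets discarded while RX was off */
};

/*
 * Conflated check: "RX disabled" and "queue full" both yield false,
 * so a backend would hold on to packets for a stopped NIC.
 */
static bool toy_can_receive_conflated(const struct toy_nic *nic)
{
    return nic->rx_enabled && nic->rx_used < TOY_RX_SLOTS;
}

/*
 * Back-pressure-only check: true only when an enabled receiver has
 * no free slot. A disabled receiver never reports "full".
 */
static bool toy_queue_full(const struct toy_nic *nic)
{
    return nic->rx_enabled && nic->rx_used >= TOY_RX_SLOTS;
}

/*
 * Receive path: returns true if the packet was consumed (stored or
 * discarded), false if the caller should keep it queued and retry.
 */
static bool toy_receive(struct toy_nic *nic)
{
    assert(nic != NULL);
    if (!nic->rx_enabled) {
        nic->rx_dropped++; /* discard instead of stalling the sender */
        return true;
    }
    if (toy_queue_full(nic)) {
        return false;      /* genuine back-pressure */
    }
    nic->rx_used++;
    return true;
}
```

With a zero-initialized toy_nic (receiver off), toy_can_receive_conflated()
returns false while toy_queue_full() also returns false, so under the second
scheme the packet is handed to toy_receive() and dropped rather than left
pending in the backend.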