From: Paul Brook
Subject: Re: [Qemu-devel] How to lock-up your tap-based VM network
Date: Tue, 13 Apr 2010 14:03:19 +0100
References: <4BC34D95.7050804@siemens.com> <201004122107.19425.paul@codesourcery.com> <4BC46169.7020204@siemens.com>
In-Reply-To: <4BC46169.7020204@siemens.com>
Message-Id: <201004131403.19402.paul@codesourcery.com>
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org
Cc: Jan Kiszka

> Paul Brook wrote:
> >> A major reason for this deadlock could likely be removed by shutting
> >> down the tap (if peered) or dropping packets in user space (in case of
> >> vlan) when a NIC is stopped or otherwise shut down. Currently most (if
> >> not all) NIC models seem to signal both "queue full" and "RX disabled"
> >> via !can_receive().
> >
> > No. A disabled device should return true from can_receive, then discard
> > the packets in its receive callback. Failure to do so is a bug in the
> > device. It looks like the virtio-net device may be buggy.
>
> That's not a virtio-only issue. In fact, we ran into this over pcnet,
> and a quick check of other popular PCI NIC models (except for rtl8139)
> revealed the same picture: They only report can_receive if their
> receiver unit is up and ready (some also include the queue state, but
> that's an "add-on").

If so, these are also buggy.

> I think it's clear why: "can_receive" strongly suggests that a suspended
> receiver should make the model return false. If we want to keep this
> handler, it should be refactored to something like "queue_full".

I don't see a need to refactor anything. You just need to fix the devices
that incorrectly return false when their RX engine is disabled.

Paul
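
For illustration, the behaviour described above could look roughly like the sketch below. This is a toy model, not code from any QEMU device: ToyNic, toy_nic_can_receive() and toy_nic_receive() are hypothetical names, and the return convention (bytes consumed, 0 for "not accepted") is an assumption made for the example. The point is only that "RX disabled" and "queue full" are handled in different places.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical NIC model state; not QEMU's actual data structures. */
typedef struct {
    bool rx_enabled;       /* guest has switched the receiver on */
    size_t rx_free_slots;  /* free receive buffers ("queue" space) */
    size_t delivered;      /* packets handed to the guest */
    size_t dropped;        /* packets discarded while RX was disabled */
} ToyNic;

/*
 * Return false only for "queue full", i.e. an enabled receiver that has
 * no free buffers. A disabled receiver returns true so that the backend
 * keeps handing over packets instead of queueing them indefinitely.
 */
bool toy_nic_can_receive(const ToyNic *nic)
{
    if (!nic->rx_enabled) {
        return true;
    }
    return nic->rx_free_slots > 0;
}

/*
 * Receive callback. Returns the number of bytes consumed (assumed
 * convention for this sketch). While RX is disabled the packet is
 * discarded but still reported as consumed, so the sender does not retry.
 */
size_t toy_nic_receive(ToyNic *nic, const unsigned char *buf, size_t len)
{
    (void)buf; /* payload is irrelevant to this sketch */

    if (!nic->rx_enabled) {
        nic->dropped++;
        return len;
    }
    if (nic->rx_free_slots == 0) {
        return 0; /* caller should have checked toy_nic_can_receive() */
    }
    nic->rx_free_slots--;
    nic->delivered++;
    return len;
}
```

With this split, a stopped NIC never causes packets to back up in the tap: they are accepted and dropped. Only a running NIC with no free buffers reports that it cannot receive.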