Message-ID: <4BC46209.2090404@siemens.com>
Date: Tue, 13 Apr 2010 14:22:33 +0200
From: Jan Kiszka
Subject: Re: [Qemu-devel] How to lock-up your tap-based VM network
References: <4BC34D95.7050804@siemens.com> <201004122107.19425.paul@codesourcery.com> <20100412214947.GC6148@shareable.org>
In-Reply-To: <20100412214947.GC6148@shareable.org>
To: Jamie Lokier
Cc: Paul Brook, "qemu-devel@nongnu.org"

Jamie Lokier wrote:
> Paul Brook wrote:
>>> A major reason for this deadlock could likely be removed by shutting
>>> down the tap (if peered) or dropping packets in user space (in case of
>>> vlan) when a NIC is stopped or otherwise shut down. Currently most (if
>>> not all) NIC models seem to signal both "queue full" and "RX disabled"
>>> via !can_receive().
>> No. A disabled device should return true from can_receive, then discard the
>> packets in its receive callback. Failure to do so is a bug in the device. It
>> looks like the virtio-net device may be buggy.
>
> I agree - or alternatively signal that there's no point sending it
> packets and they should be dropped without bothering to construct them.
>
> But anyway, this flow control mechanism is buggy - what if instead of
> an interface down, you just have a *slow* guest? That should not push
> back so much that it makes other guests networking with each other
> slow down.

Indeed. So, instead of the current scheme that tries to stop the sender
when some receiver overflows, we must turn it upside down so that this
stop can _never_ happen. We may keep a sufficiently long queue to reduce
the risk of packet drops, but we can't prevent drops in all cases anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
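
To illustrate the scheme argued for in this thread, here is a minimal sketch in C. It is not QEMU code: NicModel, nic_can_receive(), nic_receive(), nic_guest_pop() and the RXQ_SLOTS depth are hypothetical names and values chosen for illustration only. The assumption is that a receiver never pushes back on the sender; a stopped NIC or a full queue simply counts the packet as dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RXQ_SLOTS 4      /* hypothetical queue depth, kept tiny for the example */
#define PKT_MAX   1514   /* Ethernet frame size without FCS */

typedef struct {
    unsigned char data[PKT_MAX];
    size_t len;
} Packet;

typedef struct {
    Packet slots[RXQ_SLOTS];
    size_t head;            /* index of the oldest queued packet */
    size_t count;           /* number of packets currently queued */
    bool rx_enabled;        /* false while the guest NIC is stopped */
    unsigned long dropped;  /* packets discarded instead of delivered */
} NicModel;

void nic_init(NicModel *nic, bool rx_enabled)
{
    assert(nic != NULL);
    memset(nic, 0, sizeof(*nic));
    nic->rx_enabled = rx_enabled;
}

/* Always true: neither "RX disabled" nor "queue full" stops the sender. */
bool nic_can_receive(const NicModel *nic)
{
    (void)nic;
    return true;
}

/*
 * Receive callback. Always reports the packet as consumed (returns len),
 * so the sender never has to wait; undeliverable packets are dropped here.
 */
size_t nic_receive(NicModel *nic, const unsigned char *buf, size_t len)
{
    if (!nic->rx_enabled || len > PKT_MAX || nic->count == RXQ_SLOTS) {
        nic->dropped++;
        return len;
    }
    Packet *p = &nic->slots[(nic->head + nic->count) % RXQ_SLOTS];
    memcpy(p->data, buf, len);
    p->len = len;
    nic->count++;
    return len;
}

/* Guest side: fetch the oldest queued packet; false if the queue is empty. */
bool nic_guest_pop(NicModel *nic, Packet *out)
{
    if (nic->count == 0) {
        return false;
    }
    *out = nic->slots[nic->head];
    nic->head = (nic->head + 1) % RXQ_SLOTS;
    nic->count--;
    return true;
}
```

With this shape, a slow or stopped guest only raises its own drop counter; other guests on the same vlan or tap are not stalled. A longer queue lowers the drop rate but, as noted above, cannot eliminate drops.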