From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tom Lendacky
Subject: Re: network shutdown under heavy load
Date: Tue, 19 Jan 2010 15:29:01 -0600
Message-ID: <201001191529.01519.tahm@linux.vnet.ibm.com>
References: <4B265E84.3070008@binaryfreedom.info>
 <201001131313.56223.tahm@linux.vnet.ibm.com>
 <20100113215228.GB2666@sequoia.sous-sol.org>
Mime-Version: 1.0
Content-Type: Text/Plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity, Herbert Xu, rek2, kvm@vger.kernel.org, markmc@redhat.com
To: Chris Wright
Return-path: Received: from e35.co.us.ibm.com ([32.97.110.153]:53395
 "EHLO e35.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1754894Ab0ASV3Q (ORCPT ); Tue, 19 Jan 2010 16:29:16 -0500
Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com
 [9.17.195.106]) by e35.co.us.ibm.com (8.14.3/8.13.1) with ESMTP id
 o0JLFSLB015716 for ; Tue, 19 Jan 2010 14:15:28 -0700
Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com
 [9.17.195.167]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0)
 with ESMTP id o0JLT4du225888 for ; Tue, 19 Jan 2010 14:29:04 -0700
Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by
 d03av01.boulder.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id
 o0JLT3CP009826 for ; Tue, 19 Jan 2010 14:29:04 -0700
In-Reply-To: <20100113215228.GB2666@sequoia.sous-sol.org>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Wednesday 13 January 2010 03:52:28 pm Chris Wright wrote:
> (Mark cc'd, sound familiar?)
>
> * Tom Lendacky (tahm@linux.vnet.ibm.com) wrote:
> > On Sunday 10 January 2010 06:38:54 am Avi Kivity wrote:
> > > On 01/10/2010 02:35 PM, Herbert Xu wrote:
> > > > On Sun, Jan 10, 2010 at 02:30:12PM +0200, Avi Kivity wrote:
> > > >> This isn't in 2.6.27.y. Herbert, can you send it there?
> > > >
> > > > It appears that now that TX is fixed we have a similar problem
> > > > with RX. Once I figure that one out I'll send them together.
> > I've been experiencing the network shutdown issue also. I've been
> > running netperf tests across 10GbE adapters with Qemu 0.12.1.2, RHEL5.4
> > guests and 2.6.32 kernel (from kvm.git) guests. I instrumented Qemu to
> > print out some network statistics. It appears that at some point in the
> > netperf test the receiving guest ends up with the 10GbE device's
> > "receive_disabled" variable in its VLANClientState structure stuck at 1.
> > From looking at the code, the virtio-net driver in the guest should
> > eventually cause qemu_flush_queued_packets() in net.c to run and clear
> > the "receive_disabled" variable, but that's not happening. I don't see
> > these issues when I have a lot of debug settings active in the guest
> > kernel, which results in very low/poor network performance - maybe some
> > kind of race condition?

Ok, here's an update. After realizing that none of the ethtool offload
options were enabled in my guest, I found that I needed to be using the
-netdev option on the qemu command line. Once I did that, some ethtool
offload options were enabled and the deadlock did not appear when I ran
networking tests between guests on different machines. However, the
deadlock did appear when I ran networking tests between guests on the
same machine.

Tom

> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html