From: Mark McLoughlin
Subject: Re: virtio_net hang
Date: Fri, 14 Nov 2008 18:26:44 +0000
Message-ID: <1226687204.9332.113.camel@blaa>
References: <20081113122709.GB14254@easter-eggs.com> <1226589153.19068.7.camel@blaa> <20081113152452.GI14254@easter-eggs.com> <20081114092339.GC11961@easter-eggs.com>
In-Reply-To: <20081114092339.GC11961@easter-eggs.com>
To: Emmanuel Lacour
Cc: kvm@vger.kernel.org

On Fri, 2008-11-14 at 10:23 +0100, Emmanuel Lacour wrote:
> On Thu, Nov 13, 2008 at 04:24:52PM +0100, Emmanuel Lacour wrote:
> > On Thu, Nov 13, 2008 at 03:12:33PM +0000, Mark McLoughlin wrote:
> > > The fact that re-loading the virtio_net driver fixes things up makes me
> > > suspect you've found a bug in the virtio_net driver, rather than e.g. a
> > > bug in the kvm-userspace side.
> > >
> > > To try and narrow down what's happening, when the interface has hung,
> > > try:
> > >
> > >   - tcpdump on both eth0 in the guest and the tap device on the host
> > >     (tap5 in your example)
> > >
> >
> On eth0 I see echo requests, but _no_ echo replies
> On tap5 I see echo requests _and_ echo replies

Okay, so the guest isn't receiving packets.

> > >   - look for anything unusual in the stats for both those interfaces,
> > >     e.g. /proc/net/dev, netstat -s etc.
> > >
> >
> Comparing with other guest without problems, the only difference is that this
> tap (and only this one) reports "overruns":
>
> tap5      Link encap:Ethernet  HWaddr 00:FF:AD:53:76:25
>           inet6 addr: fe80::2ff:adff:fe53:7625/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:717737621 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:636626720 errors:0 dropped:0 overruns:317 carrier:0
>           collisions:0 txqueuelen:500
>           RX bytes:368973099756 (343.6 GiB)  TX bytes:217917073227 (202.9 GiB)
>
> overruns seems to happen just when there is "hang", it doesn't seems to
> increase when network is working properly.

Right, the tap device tx queue is full because kvm-userspace isn't
reading packets from it.

This could be because kvm-userspace has just stopped noticing that
there's data available from the tapfd, or because virtio_net in the guest
has stopped noticing that packets are available in the ring.

One thing you could easily check is whether running:

  ip link set eth0 down
  ip link set eth0 up

in the guest is sufficient to fix it. If it is, then it points to a guest
driver issue.

Cheers,
Mark.
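
To confirm that the tap's TX overruns only climb while the guest is hung,
the counter can be sampled on the host from /proc/net/dev. The following is
only a rough sketch: the helper names are hypothetical, and it assumes that
the "overruns" figure printed by ifconfig corresponds to the "fifo" column
of /proc/net/dev, which is my understanding of how net-tools reports it.

```python
import time
from typing import Dict

# Column order of the receive and transmit halves of /proc/net/dev.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed")


def parse_proc_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse the contents of /proc/net/dev into {iface: {counter: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the two header lines contain no colon
        fields = rest.split()
        if len(fields) != 16 or not all(f.isdigit() for f in fields):
            continue  # not a counter line in the layout assumed here
        values = [int(f) for f in fields]
        counters = {f"rx_{k}": v for k, v in zip(RX_FIELDS, values[:8])}
        counters.update({f"tx_{k}": v for k, v in zip(TX_FIELDS, values[8:])})
        stats[name.strip()] = counters
    return stats


def tx_overruns(text: str, iface: str) -> int:
    """Return the TX fifo counter (assumed to be ifconfig's TX overruns)."""
    return parse_proc_net_dev(text)[iface]["tx_fifo"]


def watch_tx_overruns(iface: str, interval: float = 1.0,
                      path: str = "/proc/net/dev") -> None:
    """Print a timestamped line whenever the TX overrun counter changes."""
    last = None
    while True:
        with open(path) as f:
            current = tx_overruns(f.read(), iface)
        if last is not None and current != last:
            print(f"{time.strftime('%H:%M:%S')} {iface} TX overruns {last} -> {current}")
        last = current
        time.sleep(interval)
```

Running watch_tx_overruns("tap5") on the host while the guest is pinged
should show whether the counter moves only during a hang, and whether it
stops moving once eth0 has been brought down and up again in the guest.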