From mboxrd@z Thu Jan 1 00:00:00 1970
From: Emmanuel Lacour
Subject: Re: virtio_net hang
Date: Thu, 20 Nov 2008 12:36:50 +0100
Message-ID: <20081120113650.GE3717@easter-eggs.com>
References: <20081113122709.GB14254@easter-eggs.com>
	<1226589153.19068.7.camel@blaa>
	<20081113152452.GI14254@easter-eggs.com>
	<20081114092339.GC11961@easter-eggs.com>
	<1226687204.9332.113.camel@blaa>
	<20081118183756.GP1897@easter-eggs.com>
	<1227100432.3698.47.camel@blaa>
	<1227121389.3698.136.camel@blaa>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: kvm@vger.kernel.org
Return-path:
Received: from roxane.home-dn.net ([88.191.11.98]:55224 "EHLO
	roxane.home-dn.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755032AbYKTLgx (ORCPT );
	Thu, 20 Nov 2008 06:36:53 -0500
Received: from localhost (localhost.localdomain [127.0.0.1])
	by roxane.home-dn.net (Postfix) with ESMTP id 829712C058
	for ; Thu, 20 Nov 2008 12:36:52 +0100 (CET)
Received: from datura.easter-eggs.fr (unknown [IPv6:2001:7a8:115a:1:214:22ff:feb4:f4ea])
	by roxane.home-dn.net (Postfix) with ESMTP id 592722C057
	for ; Thu, 20 Nov 2008 12:36:51 +0100 (CET)
Content-Disposition: inline
In-Reply-To: <1227121389.3698.136.camel@blaa>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Wed, Nov 19, 2008 at 07:03:09PM +0000, Mark McLoughlin wrote:
>
> I had a look at Emmanuel's strace log and it shows that qemu isn't
> selecting on the tapfd, presumably because virtio_net_can_receive() sees
> that we've exhausted all available receive buffers.
>
> When qemu does poll the tapfd (after an ifdown/ifup in the guest), there
> are a load of packets waiting in the queue and things proceed as normal.
>
> That still jives with the theory that we're somehow getting into a state
> where NAPI polling is de-scheduled while guest rx interrupts are also
> disabled.
>
> > Is it possible for you to try a newer guest kernel?
>
> If you can try a newer kernel, or even try some debugging patches, that
> would help a lot.
>

The difficulty is that I cannot always reproduce the bug. But another
interesting thing is that I switched to e1000 and had another lockup
after that, with the same symptoms :(

As I answered a few minutes ago, I will try 2.6.27.6 in the guest
today and let you know about the first problem I encounter, if any ;)
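
For illustration only, the stall described in the quoted text can be modeled with a toy event loop: the host side stops reading the tap queue once the guest has no free receive buffers, and the backlog drains only after buffers are posted again. This is a sketch of the theory, not qemu or kernel code; `ToyNic`, `can_receive`, `poll_once` and `refill` are hypothetical names, and the buffer counts are arbitrary.

```python
from collections import deque


class ToyNic:
    """Toy model: the host reads the tap queue only while the guest has rx buffers."""

    def __init__(self, rx_buffers):
        self.rx_buffers = rx_buffers  # free receive buffers posted by the guest
        self.tap_queue = deque()      # packets waiting on the host side (the "tapfd")
        self.delivered = []           # packets handed to the guest

    def can_receive(self):
        # Loosely mirrors the role of virtio_net_can_receive() (assumption).
        return self.rx_buffers > 0

    def poll_once(self):
        """One loop iteration: skip the tap entirely if no buffer is free."""
        if not self.can_receive() or not self.tap_queue:
            return False
        self.delivered.append(self.tap_queue.popleft())
        self.rx_buffers -= 1
        return True

    def refill(self, n):
        """Guest posts n fresh buffers (what a healthy rx path or ifdown/ifup would do)."""
        self.rx_buffers += n


nic = ToyNic(rx_buffers=2)
for i in range(5):
    nic.tap_queue.append(f"pkt{i}")

# Guest never refills: the loop stalls with packets still queued on the tap.
while nic.poll_once():
    pass
stalled_backlog = len(nic.tap_queue)

# Buffers are posted again: the backlog drains and traffic resumes.
nic.refill(4)
while nic.poll_once():
    pass
```

In this model the hang is not packet loss: the packets sit in the queue until something makes the guest post buffers again, which matches the observation that traffic resumes after an ifdown/ifup.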