From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20120403081313.GD1283@arachsys.com>
References: <20120402153722.GA30499@arachsys.com> <20120403071328.GB27304@stefanha-thinkpad.localdomain> <20120403081313.GD1283@arachsys.com>
Date: Tue, 3 Apr 2012 09:26:49 +0100
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0
To: Chris Webb
Cc: qemu-devel@nongnu.org

On Tue, Apr 3, 2012 at 9:13 AM, Chris Webb wrote:
> Stefan Hajnoczi writes:
>> On Mon, Apr 02, 2012 at 04:37:23PM +0100, Chris Webb wrote:
>> It sounds like this is not the issue, but are you sure the bridge has
>> forwarding delay set to 0 or Spanning Tree Protocol disabled?  With STP
>> enabled no traffic will be forwarded by the bridge for a configured
>> timeout, and depending on the timing of your VM bootup you could see
>> weird things.  You can check with brctl showstp br0.
>
> No STP enabled, but the networking is permanently broken on these guests in
> any case, not just slow to get started.
> Usually they've been sat there for
> half an hour or more by the time I get back to the stopped reboot loop, and
> I left one broken over a weekend without it fixing itself. The network is
> statically configured, so if it were down temporarily and came back, pings
> would then start working fine.

In a case like this it might be most effective to catch a VM in the bad
state and then go in with gdb to see what is broken.

The basic approach would be putting breakpoints on the e1000 device
model's transmit/receive paths to see if the guest is giving us packets
and whether the tap device is transmitting/receiving.

If guest and host appear to be working then QEMU's e1000 model must be
in a bad state and it's a question of looking at the tx/rx rings and
other hardware emulation state to figure out what went wrong.

Have you tried unloading the e1000 kernel module inside the guest and
then modprobing it again? Does this "fix" the issue?

Stefan
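
For the "looking at the tx/rx rings" step above, a rough sketch of how the ring registers might be interpreted once they have been read out of the stuck VM by hand (for example in gdb). The helper names are hypothetical and nothing here is part of QEMU. It assumes the usual e1000 conventions: 16-byte descriptors, a ring length register in bytes, and the device owning the descriptors from head up to, but not including, tail.

```python
# Sketch only: interpret e1000 ring registers taken from a stuck VM.
# The values are assumed to have been read out manually; this code does
# not talk to QEMU, and the function names are made up for illustration.

DESC_SIZE = 16  # bytes per e1000 descriptor


def ring_state(head, tail, length_bytes):
    """Return (num_descriptors, device_owned) for a head/tail/length triple.

    device_owned counts the descriptors from head up to (not including)
    tail, i.e. the ones the guest driver has handed to the device.
    """
    if length_bytes == 0 or length_bytes % DESC_SIZE:
        raise ValueError("ring length must be a non-zero multiple of 16")
    n = length_bytes // DESC_SIZE
    if not (0 <= head < n and 0 <= tail < n):
        raise ValueError("head/tail outside ring")
    return n, (tail - head) % n


def describe_tx(tdh, tdt, tdlen):
    """Summarise the transmit ring from TDH, TDT and TDLEN."""
    n, pending = ring_state(tdh, tdt, tdlen)
    if pending == 0:
        return "tx ring idle: guest has queued nothing (TDH == TDT)"
    return f"tx ring: {pending} of {n} descriptors waiting to be sent"


def describe_rx(rdh, rdt, rdlen):
    """Summarise the receive ring from RDH, RDT and RDLEN."""
    n, free = ring_state(rdh, rdt, rdlen)
    if free == 0:
        return "rx ring: no free buffers (RDH == RDT), device cannot receive"
    return f"rx ring: {free} of {n} buffers available to the device"


if __name__ == "__main__":
    # Made-up register values, purely to show the calls.
    print(describe_tx(5, 8, 4096))
    print(describe_rx(7, 7, 4096))
```

A transmit ring that stays non-empty while nothing reaches the tap device, or a receive ring sitting at RDH == RDT indefinitely, would each point at a different side of the problem.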