From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nikola Ciprich
Subject: 8139cp problems - steps to reproduce
Date: Mon, 8 Sep 2008 09:57:59 +0200
Message-ID: <20080908075759.GA27882@develbox.linuxbox.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: nikola.ciprich@linuxbox.cz, lfarkas@lfarkas.org
To: KVM list, qemu-devel
Return-path:
Received: from gwu.lbox.cz ([62.245.111.132]:57217 "EHLO gwu.lbox.cz"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1750835AbYIHH6I (ORCPT ); Mon, 8 Sep 2008 03:58:08 -0400
Content-Disposition: inline
Sender: kvm-owner@vger.kernel.org
List-ID:

Hello Avi and everybody,

(and sorry in advance for cross-posting).

As already reported, some people (including me :)) have problems with the
network getting stuck from time to time in KVM guests. According to
http://qemu-forum.ipi.fi/viewtopic.php?f=4&t=4563&start=0&st=0&sk=t&sd=a&sid=fcf252234991e017919ca7d0eb3799a3
the problem may not be KVM-specific.

I can confirm that the problem seems to occur after transmitting a few
gigabytes of data, so it can be reproduced simply by starting a KVM guest,
mounting an NFS share in it, and then running this in a shell loop:

    dd if=/mnt/nfs/bigimage.iso of=/dev/zero

After some runs (in my case usually tens of GB), the problem occurs:

[ 2159.614496] NETDEV WATCHDOG: eth0: transmit timed out
[ 2159.614537] eth0: Transmit timeout, status d 2b 15 80ff

The status " d 2b 15 80ff" is always the same on all test machines, which
according to 8139cp.c means:

    Command register    = d
    C+ command register = 2b
    Interrupt status    = 15
    Interrupt mask      = 80ff

The individual bits are explained in the 8139cp comments; unfortunately this
didn't make me any smarter :(. The only thing I tried was disabling rx/tx
checksumming for the interface (this was needed for Xen domUs as well), but
it didn't help.

It is important to note that the problem is easily reproducible this way for
x86_32 guests (I'm using an x86_64 host).
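
The reproduction loop described above could be sketched roughly as follows.
This is only an illustration: the helper name run_dd_loop, the run count, the
block size and the use of /dev/null as the sink are assumptions, not part of
the original test; only the image path comes from the dd command above.

```shell
#!/bin/sh
# Sketch of the reproduction loop. run_dd_loop is a hypothetical helper name.
# Usage: run_dd_loop SOURCE_FILE RUNS
run_dd_loop() {
    src=$1
    runs=$2
    i=1
    while [ "$i" -le "$runs" ]; do
        # Timestamp each pass so it can be matched against the
        # NETDEV WATCHDOG line in the guest's dmesg output.
        echo "run $i of $runs started at $(date '+%H:%M:%S')"
        # Read the whole file (over NFS in the real test) and discard the data.
        dd if="$src" of=/dev/null bs=1024k 2>/dev/null || {
            echo "dd failed on run $i" >&2
            return 1
        }
        i=$((i + 1))
    done
    echo "completed $runs runs without a dd error"
}

# Example (inside the guest, with the NFS share mounted):
#   run_dd_loop /mnt/nfs/bigimage.iso 100
```

On a hard-mounted NFS share a stuck network will most likely make dd block
rather than fail, so the last "run N" line printed is the one to compare with
the watchdog timestamp.
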
For x86_64 guests, the problem is actually much WORSE, as it usually leaves
the host machine in a totally unusable state (it replies to pings, but that's
all; there are no messages in the logs after reboot, etc.). I'll try to
investigate it further.

Another important note: the problem is certainly NOT system-load related; it
occurs even when the machine is idle (apart from the load caused by the
network dd).

I'm using kvm-74 now, with a 2.6.26 host and a 2.6.24 guest, and bridged
networking. I'll try using the e1000 driver, but I think that 8139cp is
currently considered the most stable choice, right?

So does somebody have an idea where the problem could be? Of course I'll be
glad to (try to) help with debugging...

Thanks a lot in advance!

nik

-- 
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------