From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KcbdU-0005VO-SY for qemu-devel@nongnu.org;
	Mon, 08 Sep 2008 03:58:12 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KcbdR-0005QA-Dh for qemu-devel@nongnu.org;
	Mon, 08 Sep 2008 03:58:12 -0400
Received: from [199.232.76.173] (port=43229 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KcbdR-0005Po-5r for qemu-devel@nongnu.org;
	Mon, 08 Sep 2008 03:58:09 -0400
Received: from mx20.gnu.org ([199.232.41.8]:45551)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from )
	id 1KcbdQ-00073z-JR for qemu-devel@nongnu.org;
	Mon, 08 Sep 2008 03:58:08 -0400
Received: from gwu.lbox.cz ([62.245.111.132])
	by mx20.gnu.org with esmtp (Exim 4.60) (envelope-from )
	id 1KcbdN-0001X1-VE for qemu-devel@nongnu.org;
	Mon, 08 Sep 2008 03:58:06 -0400
Date: Mon, 8 Sep 2008 09:57:59 +0200
From: Nikola Ciprich
Message-ID: <20080908075759.GA27882@develbox.linuxbox.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: [Qemu-devel] 8139cp problems - steps to reproduce
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: KVM list , qemu-devel
Cc: nikola.ciprich@linuxbox.cz, lfarkas@lfarkas.org

Hello Avi and everybody,

(and in advance, sorry for cross-posting).

As has already been reported, some people (including me :)) have problems
with the network getting stuck from time to time in KVM guests. According to
http://qemu-forum.ipi.fi/viewtopic.php?f=4&t=4563&start=0&st=0&sk=t&sd=a&sid=fcf252234991e017919ca7d0eb3799a3
the problem may not be KVM-specific.
I can confirm that the problem seems to occur after transmitting a few
gigabytes of data, so it can be reproduced simply by starting a KVM guest,
mounting an NFS share in it, and then running this in a shell loop:

    dd if=/mnt/nfs/bigimage.iso of=/dev/zero

After some runs (in my case usually tens of GB), the problem occurs:

    [ 2159.614496] NETDEV WATCHDOG: eth0: transmit timed out
    [ 2159.614537] eth0: Transmit timeout, status d 2b 15 80ff

The status " d 2b 15 80ff" is always the same on all test machines, which
according to 8139cp.c means:

    Command register    = d
    C+ command register = 2b
    Interrupt status    = 15
    Interrupt mask      = 80ff

The individual bits are explained in the 8139cp comments; unfortunately this
didn't make me any wiser :(. The only thing I tried was disabling rx/tx
checksumming for the interface (this was needed for Xen domUs as well), but
it didn't help.

It is important to note that the problem is easily reproducible this way for
x86_32 guests (I'm using an x86_64 host). For x86_64 guests, the problem is
actually much WORSE, as it usually gets the host machine into a totally
unusable state (it replies to pings, but that's all; there are no messages in
the logs after reboot, etc.). I'll try to investigate it further.

Another important note: the problem is certainly NOT related to system load.
It occurs even when the machine is idle (apart from the load caused by the
network dd).

I'm using kvm-74 now, with a 2.6.26 host and a 2.6.24 guest, and bridged
networking. I'll try using the e1000 driver, but I think that 8139cp is ATM
considered the most stable choice, right?

So does anybody have an idea where the problem could be? Of course I'll be
glad to (try to) help with debugging...

Thanks a lot in advance!
nik

-- 
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------
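
For reference, the four status words in the "Transmit timeout" line can be
decoded with a short script. This is only a minimal sketch: the bit names are
assumed to follow the enum definitions in drivers/net/8139cp.c of kernels from
this period and should be checked against the actual source, and decode() and
parse_status() are hypothetical helpers, not part of any driver or tool.

```python
# Bit names below are ASSUMED from the enums in drivers/net/8139cp.c;
# verify them against the kernel tree you are actually running.
CMD_BITS = {4: "CmdReset", 3: "RxOn", 2: "TxOn"}
CPCMD_BITS = {6: "RxVlanOn", 5: "RxChkSum", 4: "PCIDAC",
              3: "PCIMulRW", 1: "CpRxOn", 0: "CpTxOn"}
INTR_BITS = {15: "PciErr", 14: "TimerIntr", 13: "LenChg", 8: "SWInt",
             7: "TxEmpty", 6: "RxFIFOOvr", 5: "LinkChg", 4: "RxEmpty",
             3: "TxErr", 2: "TxOK", 1: "RxErr", 0: "RxOK"}


def decode(value, names):
    """Return the names of the set bits, highest bit first.

    Bits without a known name are reported as 'bitN'.
    """
    return [names.get(bit, f"bit{bit}")
            for bit in range(15, -1, -1)
            if value & (1 << bit)]


def parse_status(line):
    """Extract and decode the four hex words after 'status' in a log line."""
    fields = line.split("status", 1)[1].split()
    cmd, cpcmd, isr, imr = (int(f, 16) for f in fields[:4])
    return {
        "Cmd": decode(cmd, CMD_BITS),
        "CpCmd": decode(cpcmd, CPCMD_BITS),
        "IntrStatus": decode(isr, INTR_BITS),
        "IntrMask": decode(imr, INTR_BITS),
    }


if __name__ == "__main__":
    log = "[ 2159.614537] eth0: Transmit timeout, status d 2b 15 80ff"
    for register, bits in parse_status(log).items():
        print(f"{register:10s} {', '.join(bits)}")
```

Under these assumed names, the interrupt status 15 corresponds to RxEmpty,
TxOK and RxOK being set; what that implies for the hang would still need to
be checked against the driver and the emulated device.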