From: Chris Webb <chris@arachsys.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0
Date: Tue, 3 Apr 2012 13:42:18 +0100
Message-ID: <20120403124217.GN1283@arachsys.com>
In-Reply-To: <CAJSP0QV53ceKOMvUhvsTsb5wOwKQYmLRdqnze8oeE0DhtiFh0w@mail.gmail.com>
Stefan Hajnoczi <stefanha@gmail.com> writes:
> In a case like this it might be most effective to catch a VM in the
> bad state and then go in with gdb to see what is broken. The basic
> approach would be putting breakpoints on the e1000 device model's
> transmit/receive paths to see if the guest is giving us packets and
> whether the tap device is transmitting/receiving. If guest and host
> appear to be working then QEMU's e1000 model must be in a bad state
> and it's a question of looking at the tx/rx rings and other hardware
> emulation state to figure out what went wrong.
Hi Stefan. I tried setting a breakpoint on start_xmit, but qemu blew up
when I hit it:
(gdb) break /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c:start_xmit
Function "start_xmit" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) break /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c:528
Breakpoint 1 at 0x46dcd6: file /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c, line 528.
(gdb) cont
Continuing.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
The program no longer exists.
I assume this is some subtlety with breakpointing threaded code?
However, along these lines, I note that the guest appears to have received
some packets, though its RX count is stuck at 1993 bytes. The TX count marches
upwards as I ping outbound from the guest.
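The same "one counter stuck, the other advancing" check can be made from the host. The following is a minimal sketch, assuming a Linux host that exposes per-interface byte counters under /sys/class/net. The helper names iface_bytes and counter_stuck and the STATS_ROOT override are hypothetical, introduced only for illustration. Bear in mind that directions are mirrored on a tap device: what the host transmits on tap1 is what the guest receives.

```shell
#!/bin/sh
# Hypothetical helper: print "rx_bytes tx_bytes" for one interface.
# STATS_ROOT is an assumption added so the function can be pointed at a
# scratch directory; by default it reads the real sysfs counters.
iface_bytes() {
    dir="${STATS_ROOT:-/sys/class/net}/$1/statistics"
    printf '%s %s\n' "$(cat "$dir/rx_bytes")" "$(cat "$dir/tx_bytes")"
}

# Hypothetical helper: succeed if a counter did not move between two samples.
counter_stuck() {
    [ "$1" -eq "$2" ]
}

# Example use (not run here): sample tap1 twice, five seconds apart.
#   set -- $(iface_bytes tap1); rx1=$1; tx1=$2
#   sleep 5
#   set -- $(iface_bytes tap1); rx2=$1; tx2=$2
#   counter_stuck "$tx1" "$tx2" && echo "nothing is being sent towards the guest"
```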
If I attach a tcpdump to tap1 on the host, I see the ARP requests going out and
apparently no reply:
0024# tcpdump -i tap1
tcpdump: WARNING: tap1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap1, link-type EN10MB (Ethernet), capture size 65535 bytes
12:08:35.654992 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:36.654976 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:37.654975 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:38.670933 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:39.670922 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:40.670908 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
Looking on br0, I do seem to see the replies:
12:12:53.509471 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:53.509914 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:54.509455 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:54.509875 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:55.509447 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:55.509878 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:56.525424 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:56.525854 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:57.525408 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:57.525837 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
but they never get to tap1 despite STP being disabled and no bridge port
filtering:
# ebtables -L
Bridge table: filter
Bridge chain: INPUT, entries: 0, policy: ACCEPT
Bridge chain: FORWARD, entries: 0, policy: ACCEPT
Bridge chain: OUTPUT, entries: 0, policy: ACCEPT
# brctl show br0
bridge name     bridge id               STP enabled     interfaces
br0             8000.002590224ffa       no              eth0
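Since the replies are visible on br0 but not on tap1, it may also be worth confirming directly that tap1 is a port of br0. The following is a sketch only, assuming a Linux host where each bridge lists its ports under /sys/class/net/<bridge>/brif. The function name is_bridge_port and the NET_ROOT override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if IFACE is currently a port of BRIDGE.
# NET_ROOT is an assumption added so the check can be tried against a
# scratch directory; by default it inspects the real sysfs tree.
is_bridge_port() {
    bridge=$1
    iface=$2
    [ -e "${NET_ROOT:-/sys/class/net}/$bridge/brif/$iface" ]
}

# Example use (not run here):
#   is_bridge_port br0 tap1 || echo "tap1 is not attached to br0"
```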
This looks uncannily like a kernel problem, doesn't it? However, remove the
-usbdevice tablet, and it goes away, which is truly weird! I've just done a
hundred successful reboots without it once again to confirm to myself that I'm
definitely not imagining that behaviour.
> Have you tried unloading the e1000 kernel module inside the guest and
> then modprobing it again? Does this "fix" the issue?
Hadn't thought of that, but no, it apparently has no effect. It's still broken
after I rmmod it, modprobe it again, and reconfigure the networking.
Cheers,
Chris.