* PV on HVM network stops
From: Ryan Harper @ 2007-05-09 15:56 UTC
To: xen-devel
I've been running some tests using PV drivers in a Linux HVM domain and
have been unable to determine why, after some period of time, the network
connection just stops working. I'm using the default bridging setup.
I've seen this on xen-unstable changeset 15017 and all the way back to
14280. Guest and host are PAE. Any pointers on where to start
debugging this? Nothing interesting shows up in xm dmesg, dmesg in the
guest, any of the logs, or any of the networking configuration output.
I've not been able to recreate this using just PV domains.
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253 T/L: 678-9253
ryanh@us.ibm.com
* Re: PV on HVM network stops
From: Ryan Harper @ 2007-05-09 17:18 UTC
To: xen-devel
* Ryan Harper <ryanh@us.ibm.com> [2007-05-09 11:07]:
> I've been running some tests using PV drivers in a linux HVM domain and
> have been unable to determine why after some period of time the network
> connection just stops working. I'm using the default bridging setup,
> I've seen this on xen-unstable changeset 15017, and all the way back to
> 14280. Guest and Host are pae. Any pointers on where to start
> debugging this? Nothing interesting shows up in xm dmesg, dmesg in the
> guest, none of the logs, nor any of the networking configuration output.
>
If I pause the domain and then unpause, networking comes back. Does
this help narrow down where I should be looking for debugging this
issue?
* Re: PV on HVM network stops
From: Ryan Harper @ 2007-05-09 19:31 UTC
To: xen-devel
* Ryan Harper <ryanh@us.ibm.com> [2007-05-09 12:42]:
> * Ryan Harper <ryanh@us.ibm.com> [2007-05-09 11:07]:
> > I've been running some tests using PV drivers in a linux HVM domain and
> > have been unable to determine why after some period of time the network
> > connection just stops working. I'm using the default bridging setup,
> > I've seen this on xen-unstable changeset 15017, and all the way back to
> > 14280. Guest and Host are pae. Any pointers on where to start
> > debugging this? Nothing interesting shows up in xm dmesg, dmesg in the
> > guest, none of the logs, nor any of the networking configuration output.
> >
>
> If I pause the domain and then unpause, networking comes back. Does
> this help narrow down where I should be looking for debugging this
> issue?
Actually, what works more reliably is to run ifdown vifX.0 and then
ifconfig vifX.0 0, which brings it back up. We get bridge topology state
changes, and then network traffic resumes.
Using tcpdump, I can see traffic arrive in the domain, but no traffic
leaves the guest.
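A minimal sketch of the vif-cycling workaround described above, assuming it is run in dom0 and that the backend interface follows the vif<domid>.<devid> naming. The helper names (vif_name, cycle_commands, cycle_vif) are hypothetical and only wrap the two commands quoted in the message; by default the sketch prints the commands instead of running them.

```python
import subprocess


def vif_name(domid, devid=0):
    """Name of the backend interface in dom0, e.g. vif5.0 for domain 5, device 0."""
    return "vif%d.%d" % (domid, devid)


def cycle_commands(vif):
    """The two commands from the message, in order: take the vif down, bring it back up."""
    return [["ifdown", vif], ["ifconfig", vif, "0"]]


def cycle_vif(domid, devid=0, dry_run=True):
    """Print (dry_run=True) or execute (dry_run=False) the cycling commands.

    Executing requires root in dom0 and the ifdown/ifconfig tools on PATH.
    """
    cmds = cycle_commands(vif_name(domid, devid))
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)
    return cmds
```

Calling cycle_vif(domid) with the guest's domain ID shows what would be run; pass dry_run=False only once the printed commands look right for the setup at hand.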
* RE: PV on HVM network stops
From: Ian Pratt @ 2007-05-09 22:59 UTC
To: Ryan Harper, xen-devel
> > If I pause the domain and then unpause, networking comes back. Does
> > this help narrow down where I should be looking for debugging this
> > issue?
>
> Actually, what works more reliably is to ifdown vifX.0; and then
> ifconfig vifX.0 0, which brings it back up, we get bridge topology
> state changes, and then network traffic resumes.
Presumably taking the guest interface down makes no difference? (Not
sure you can unload the module, but have you tried?)
> Using tcpdump, I can see traffic arrive in the domain, but no traffic
> leaves the guest.
So, packets seem to be received by the guest, but if you tcpdump the
associated vifX.0 you don't see anything (whereas a tcpdump in the guest
indicates packets are being sent).
One way to debug this would be to add a dom0 sysrq key handler to dump
the producer/consumer pointers, or otherwise export them via sysfs. Does
cat /proc/interrupts show rx interrupts on the vif?
Ian
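The /proc/interrupts check suggested above can be sketched as follows. This is a rough illustration, not a tested tool: it assumes the vif name appears as its own whitespace-separated token on the relevant interrupt line, and the function names are made up for this example. The idea is to take two snapshots a few seconds apart while traffic is being sent and see whether the vif's interrupt count moves.

```python
def vif_irq_counts(text, vif):
    """Sum the per-CPU counts of every /proc/interrupts line that names the vif.

    Returns a dict mapping the IRQ label (text before the colon) to the total count.
    """
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU header line and anything that is not "<irq>: ..."
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        if vif not in fields[1:]:
            continue
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break  # per-CPU counters end where the description starts
            total += int(field)
        totals[fields[0].rstrip(":")] = total
    return totals


def stalled_irqs(before, after):
    """IRQ labels whose count did not change between two snapshots."""
    return sorted(irq for irq in after if after[irq] == before.get(irq, 0))


def snapshot(vif, path="/proc/interrupts"):
    """Read the live file in dom0 and return the counts for the given vif."""
    with open(path) as f:
        return vif_irq_counts(f.read(), vif)
```

If stalled_irqs reports the vif's line while the guest is supposedly transmitting, no events are reaching the backend; if the count keeps climbing, the backend is being notified and the problem lies elsewhere.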
* Re: PV on HVM network stops
From: Ryan Harper @ 2007-05-09 23:14 UTC
To: Ian Pratt; +Cc: Ryan Harper, xen-devel
* Ian Pratt <Ian.Pratt@cl.cam.ac.uk> [2007-05-09 18:00]:
> > > If I pause the domain and then unpause, networking comes back. Does
> > > this help narrow down where I should be looking for debugging this
> > > issue?
> >
> > Actually, what works more reliably is to ifdown vifX.0; and then
> > ifconfig vifX.0 0, which brings it back up, we get bridge topology
> > state changes, and then network traffic resumes.
>
> Presumably taking the guest interface down makes no difference? (Not
> sure you can unload the module, but have you tried?)
I tried. It doesn't completely work; I'll get the dmesg output again
for future reference. Reloading the module didn't help, as it set the
device MAC address to all zeros.
>
> > Using tcpdump, I can see traffic arrive in the domain, but no traffic
> > leaves the guest.
>
> So, packets seem to be received by the guest, but if you tcpdump the
> associated vifX.0 you don't see anything (whereas a tcpdump in the guest
> indicates packets are being sent).
tcpdump on vifX.0 shows traffic on the bridge: ARPs for the guest IP.
tcpdump in the guest showed it getting the ARPs, but no reply, i.e. no
outgoing traffic.
I've worked around this issue by cycling the vif in the host.
What I am seeing now is that sometimes the guest just doesn't seem to be
making progress: it accumulates no CPU time. xm console to the guest
hangs, and any new processes don't seem to execute. For example, I can
have a console session connected and watch networking die, cycle the vif,
and see pings start working again, yet running ps in the guest just
blocks. xm list shows the guest in the blocked state. At this point, the
guest is pretty much dead even though it will continue to process ICMP
packets.
There isn't much output in the qemu-dm log file, but I'll toss that in
here to see if it rings any bells:
domid: 5
qemu: the number of cpus is 1
Watching /local/domain/5/logdirty/next-active
qemu_map_cache_init nr_buckets = 4000
shared page at pfn 1ffff
buffered io page at pfn 1fffd
Time offset set 0
xs_read(): vncpasswd get error. /vm/73c84d4e-220c-5e88-5cf4-2786f4ce5a44/vncpasswd.
char device redirected to /dev/pts/3
I/O request not ready: 0, ptr: 0, port: 0, data: 0, count: 0, size: 0
Triggered log-dirty buffer switch
xs_write(/vm/73c84d4e-220c-5e88-5cf4-2786f4ce5a44/rtc/timeoffset, rtc/timeoffset): write error
More details:
Host 32-bit PAE, guest 32-bit, 1 vcpu, 512 MB RAM
I've tried running with acpi=0 apic=0 and with acpi=1 apic=1, but saw no change in behavior.
>
> One way to debug this would be to add a dom0 sysrq key handler to dump
> the producer consumer pointers, or otherwise export them via sysfs. Does
> cat /proc/interrupts show rx interrupts on the vif?
I'll give these a spin.
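Once the producer/consumer pointers are available (via the proposed sysrq handler or sysfs export, neither of which exists yet in this thread), they could be interpreted along these lines. This is a sketch under the assumption that the indices are free-running unsigned 32-bit counters, as in Xen's shared-ring protocol; the function and argument names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def pending(prod, cons):
    """Entries produced but not yet consumed, allowing for 32-bit wraparound."""
    return (prod - cons) & MASK32


def describe_ring(name, req_prod, req_cons, rsp_prod, rsp_cons):
    """Summarize one ring (e.g. "tx" or "rx") from a dump of its four indices.

    req_prod/req_cons: requests queued by the frontend / consumed by the backend.
    rsp_prod/rsp_cons: responses queued by the backend / consumed by the frontend.
    """
    notes = []
    unread_req = pending(req_prod, req_cons)
    unread_rsp = pending(rsp_prod, rsp_cons)
    if unread_req:
        notes.append("%s: backend has %d unconsumed request(s)" % (name, unread_req))
    if unread_rsp:
        notes.append("%s: frontend has %d unconsumed response(s)" % (name, unread_rsp))
    if not notes:
        notes.append("%s: ring idle" % name)
    return notes
```

Read this way, a tx ring that stays at "backend has N unconsumed request(s)" would point at a lost notification toward dom0, while a tx ring that stays idle during the stall would suggest the guest is not queuing packets at all. Both readings are interpretations to be checked against real dumps.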
* RE: PV on HVM network stops
From: Ian Pratt @ 2007-05-09 23:44 UTC
To: Ryan Harper, Ian Pratt; +Cc: xen-devel
> > Presumably taking the guest interface down makes no difference? (Not
> > sure you can unload the module, but have you tried?)
>
> I tried. It doesn't completely work, I'll get the dmesg output again
> for future reference. Reloading the module didn't help as it set the
> device mac add to all nulls.
It should be able to read the MAC from xenstore. This must be a bug.
> > > Using tcpdump, I can see traffic arrive in the domain, but no
> traffic
> > > leaves the guest.
> >
> > So, packets seem to be received by the guest, but if you tcpdump the
> > associated vifX.0 you don't see anything (whereas a tcpdump in the
> guest
> > indicates packets are being sent).
>
> tcpdump on vifX.0 shows traffic on the bridge, arps for the guest ip.
> tcpdump in the guest showed it getting the arps, but no reply. ie, no
> outgoing traffic.
Hang on, you mean within the guest you don't see it sending a reply? If
true, that must be a guest issue and it's hard to see how doing anything
in dom0 will help.
> I've worked around this issue by cycling the vif in the host.
>
> What I am seeing now is that sometimes the guest just doesn't seem to
> be making progress, no cpu time. xm console the guest hangs any new
> processes don't seem to execute. For example, I can have a console
> session connected and watch networking die, cycle the vif, pings start
> working again, and running ps in the guest just blocks. xm list shows
> the guest in the block state. At this point, the guest is pretty much
> dead even though it will continue to process ICMP packets.
That sounds like a symptom of the block devices being wedged. Are you
using a PV block device or emulated IDE?
Ian
* Re: PV on HVM network stops
From: Ryan Harper @ 2007-05-09 23:51 UTC
To: Ian Pratt; +Cc: Ryan Harper, xen-devel
* Ian Pratt <Ian.Pratt@cl.cam.ac.uk> [2007-05-09 18:46]:
> > > Presumably taking the guest interface down makes no difference? (Not
> > > sure you can unload the module, but have you tried?)
> >
> > I tried. It doesn't completely work, I'll get the dmesg output again
> > for future reference. Reloading the module didn't help as it set the
> > device mac add to all nulls.
>
> It should be able to read the MAC from xenstore. This must be a bug.
>
> > > > Using tcpdump, I can see traffic arrive in the domain, but no
> > traffic
> > > > leaves the guest.
> > >
> > > So, packets seem to be received by the guest, but if you tcpdump the
> > > associated vifX.0 you don't see anything (whereas a tcpdump in the
> > guest
> > > indicates packets are being sent).
> >
> > tcpdump on vifX.0 shows traffic on the bridge, arps for the guest ip.
> > tcpdump in the guest showed it getting the arps, but no reply. ie, no
> > outgoing traffic.
>
> Hang on, you mean within the guest you don't see it sending a reply? If
> true, that must be a guest issue and its hard to see how doing anything
> in dom0 will help.
I'm not sure the networking is the real problem. The trouble is that in
many cases when the networking chokes, the console is hosed as well,
which makes it rather difficult to capture tcpdump output from within
the guest. The next time networking is down but the console is up, I'll
confirm the tcpdump from the guest.
>
>
> > I've worked around this issue by cycling the vif in the host.
> >
> > What I am seeing now is that sometimes the guest just doesn't seem to
> > be making progress, no cpu time. xm console the guest hangs any new
> > processes don't seem to execute. For example, I can have a console
> > session connected and watch networking die, cycle the vif, pings start
> > working again, and running ps in the guest just blocks. xm list shows
> > the guest in the block state. At this point, the guest is pretty much
> > dead even though it will continue to process ICMP packets.
>
> That sounds like a symptom of the block devices being wedged. Are you
> using a PV block device or emulated IDE?
Using PV block.