* help with bugs
@ 2005-08-04 15:04 Ian Pratt
2005-08-04 19:53 ` Anthony Liguori
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Ian Pratt @ 2005-08-04 15:04 UTC (permalink / raw)
To: xen-devel
I'd like to appeal for some help tracking down a couple of bugs that
we're struggling to reproduce:
BUG62 eth0 -> veth0 in network script can lose network
BUG130 time running fast bug
BUG76 shared irq's fail under high load
These are all pretty serious and it would be good to get them fixed before
3.0-testing-r1.
If you can make them exhibit frequently on your system it would be
useful to know.
Thanks,
Ian
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: help with bugs
2005-08-04 15:04 help with bugs Ian Pratt
@ 2005-08-04 19:53 ` Anthony Liguori
2005-08-04 20:18 ` Sean Dague
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Anthony Liguori @ 2005-08-04 19:53 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
Ian Pratt wrote:
> BUG76 shared irq's fail under high load
>
>
I can repro this very easily. If there's anything I can do to help
track this down let me know.
Regards,
Anthony Liguori
>These are all pretty serious and it would be good to get fixed before
>3.0-testing-r1
>
>If you can make them exhibit frequently on your system it would be
>useful to know.
>
>Thanks,
>Ian
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@lists.xensource.com
>http://lists.xensource.com/xen-devel
>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: help with bugs
2005-08-04 15:04 help with bugs Ian Pratt
2005-08-04 19:53 ` Anthony Liguori
@ 2005-08-04 20:18 ` Sean Dague
2005-08-04 20:49 ` Nivedita Singhvi
2005-08-04 20:49 ` David F Barrera
2005-08-05 8:29 ` Gerd Knorr
3 siblings, 1 reply; 12+ messages in thread
From: Sean Dague @ 2005-08-04 20:18 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 1354 bytes --]
On Thu, Aug 04, 2005 at 04:04:34PM +0100, Ian Pratt wrote:
>
>
> I'd like to appeal for some help tracking down a couple of bugs that
> we're struggling to reproduce:
>
> BUG62 eth0 -> veth0 in network script can lose network
I can make this bug come and go at will based on which of the 2 network
interfaces are part of the bridge. I added that information into the
bugzilla bug, hopefully that helps.
> BUG130 time running fast bug
> BUG76 shared irq's fail under high load
>
> These are all pretty serious and it would be good to get fixed before
> 3.0-testing-r1
>
> If you can make them exhibit frequently on your system it would be
> useful to know.
>
> Thanks,
> Ian
>
--
__________________________________________________________________
Sean Dague Mid-Hudson Valley
sean at dague dot net Linux Users Group
http://dague.net http://mhvlug.org
There is no silver bullet. Plus, werewolves make better neighbors
than zombies, and they tend to keep the vampire population down.
__________________________________________________________________
[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: help with bugs
2005-08-04 20:18 ` Sean Dague
@ 2005-08-04 20:49 ` Nivedita Singhvi
2005-08-04 20:54 ` Jerone Young
0 siblings, 1 reply; 12+ messages in thread
From: Nivedita Singhvi @ 2005-08-04 20:49 UTC (permalink / raw)
To: Sean Dague; +Cc: Ian Pratt, xen-devel
Sean Dague wrote:
> On Thu, Aug 04, 2005 at 04:04:34PM +0100, Ian Pratt wrote:
>
>>
>>I'd like to appeal for some help tracking down a couple of bugs that
>>we're struggling to reproduce:
>>
>> BUG62 eth0 -> veth0 in network script can lose network
Yep, working on this one. --Nivedita
I didn't see your original mail, so I'm not sure if you listed
any others, but I was working on 103, which seems to have gone
away in current code (we're trying to narrow it to the
NET GRANT code; not yet confirmed).
> I can make this bug come and go at will based on which of the 2 network
> interfaces are part of the bridge. I added that information into the
> bugzilla bug, hopefully that helps.
>> BUG130 time running fast bug
>> BUG76 shared irq's fail under high load
>>
thanks,
Nivedita
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: help with bugs
2005-08-04 20:49 ` Nivedita Singhvi
@ 2005-08-04 20:54 ` Jerone Young
2005-08-04 20:59 ` Nivedita Singhvi
0 siblings, 1 reply; 12+ messages in thread
From: Jerone Young @ 2005-08-04 20:54 UTC (permalink / raw)
To: Nivedita Singhvi; +Cc: Ian Pratt, xen-devel, Sean Dague
On Thu, 2005-08-04 at 13:49 -0700, Nivedita Singhvi wrote:
> Sean Dague wrote:
>
> > On Thu, Aug 04, 2005 at 04:04:34PM +0100, Ian Pratt wrote:
> >
> >>
> >>I'd like to appeal for some help tracking down a couple of bugs that
> >>we're struggling to reproduce:
> >>
> >> BUG62 eth0 -> veth0 in network script can lose network
>
> Yep, working on this one. --Nivedita
>
> I didn't see your original mail, so not sure if you listed
> any others - but was working on 103 which seems to have gone
> away in current code (we're trying to narrow it to the
> NET GRANT code; not yet confirmed).
Niv, the NETDEV GRANT code is not enabled in the x86-64 Dom0 kernel. So
that should not be the cause.
>
> > I can make this bug come and go at will based on which of the 2 network
> > interfaces are part of the bridge. I added that information into the
> > bugzilla bug, hopefully that helps.
>
>
> >> BUG130 time running fast bug
> >> BUG76 shared irq's fail under high load
> >>
>
> thanks,
> Nivedita
--
Jerone Young
IBM Linux Technology Center
jyoung5@us.ibm.com
512-838-1157 (T/L: 678-1157)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: help with bugs
2005-08-04 20:54 ` Jerone Young
@ 2005-08-04 20:59 ` Nivedita Singhvi
0 siblings, 0 replies; 12+ messages in thread
From: Nivedita Singhvi @ 2005-08-04 20:59 UTC (permalink / raw)
To: Jerone Young; +Cc: Ian Pratt, xen-devel, Sean Dague
Jerone Young wrote:
>>any others - but was working on 103 which seems to have gone
>>away in current code (we're trying to narrow it to the
>>NET GRANT code; not yet confirmed).
>
>
> Niv, the NETDEV GRANT code is not enabled in the x86-64 Dom0 kernel. So
> that should not be the cause.
It was accidentally enabled, we had thought, actually.
But yes, it's possibly unrelated.
thanks,
Nivedita
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: help with bugs
2005-08-04 15:04 help with bugs Ian Pratt
2005-08-04 19:53 ` Anthony Liguori
2005-08-04 20:18 ` Sean Dague
@ 2005-08-04 20:49 ` David F Barrera
2005-08-05 8:29 ` Gerd Knorr
3 siblings, 0 replies; 12+ messages in thread
From: David F Barrera @ 2005-08-04 20:49 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
Ian Pratt wrote:
>I'd like to appeal for some help tracking down a couple of bugs that
>we're struggling to reproduce:
>
> BUG62 eth0 -> veth0 in network script can lose network
>
>
I've been able to reproduce this problem frequently on SLES 9 SP2 based
platforms, x86 and x86_64. It seems that when I first reported the
problem it happened very infrequently, and I could not reliably
reproduce it. Now it seems to be happening all the time on my SLES 9
boxes; curiously, it does not seem to happen on my FC/RH boxes.
> BUG130 time running fast bug
> BUG76 shared irq's fail under high load
>
>These are all pretty serious and it would be good to get fixed before
>3.0-testing-r1
>
>If you can make them exhibit frequently on your system it would be
>useful to know.
>
>Thanks,
>Ian
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: help with bugs
2005-08-04 15:04 help with bugs Ian Pratt
` (2 preceding siblings ...)
2005-08-04 20:49 ` David F Barrera
@ 2005-08-05 8:29 ` Gerd Knorr
3 siblings, 0 replies; 12+ messages in thread
From: Gerd Knorr @ 2005-08-05 8:29 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
"Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> writes:
> I'd like to appeal for some help tracking down a couple of bugs that
> we're struggling to reproduce:
>
> BUG62 eth0 -> veth0 in network script can lose network
I think the only sane way to fix this is to let the distribution tools
configure the network. That's a bit harder to set up, but works more
reliably. Also the "if{up|down} <interface>" commands and the like
will work as usual then. Especially in case eth0 is configured via
dhcp the ip address copying is a bad idea. Unfortunately it isn't very
well documented how all this works, especially the new veth0 thing.
IMHO it would be good if the network start script checked whether any
bridges are already present in the system and didn't touch the network
setup if that is the case. That should catch both network setup being
already done by the distro start scripts or by an earlier network
setup script run (when xend is restarted).
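The check suggested above could be sketched roughly as follows. This is
only an illustration, not the actual Xen network script: the function
name bridges_present is made up, and it assumes a kernel that exposes a
bridge/ directory under /sys/class/net/<dev> for each bridge device.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if at least one bridge device exists.
# Assumption: every bridge shows up as <root>/<dev>/bridge in sysfs.
# $1 optionally overrides the sysfs root (handy for testing).
bridges_present() {
    sysfs_root="${1:-/sys/class/net}"
    for dev in "$sysfs_root"/*; do
        # An unmatched glob stays literal and simply fails this test.
        if [ -d "$dev/bridge" ]; then
            return 0
        fi
    done
    return 1
}
```

A start script could call "bridges_present && exit 0" before touching
eth0, so that a setup done by the distro scripts or by an earlier run
is left alone.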
The setup I'm running looks like this (classic 2.x setup, no
veth0/vif0.0 used), in boot.local:
ip link set eth0 name hw-eth0
brctl addbr eth0
brctl addif eth0 hw-eth0
ip link set hw-eth0 up
ip link set eth0 up
Then let the network scripts set up eth0 (now a bridge) as usual and
tell xend that "eth0" is the bridge device it should add the vif
interfaces to.
cheers,
Gerd
--
panic("it works"); /* avoid being flooded with debug messages */
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Re: help with bugs
@ 2005-08-04 22:00 Ian Pratt
0 siblings, 0 replies; 12+ messages in thread
From: Ian Pratt @ 2005-08-04 22:00 UTC (permalink / raw)
To: Nivedita Singhvi, Sean Dague; +Cc: xen-devel
> I didn't see your original mail, so not sure if you listed
> any others - but was working on 103 which seems to have gone
> away in current code (we're trying to narrow it to the NET
> GRANT code; not yet confirmed).
NET_GRANT is currently disabled by default because bugs were revealed
when we tried to enable it. Steve Hand is looking into this.
Best,
Ian
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Re: help with bugs
@ 2005-08-04 23:48 Ian Pratt
2005-08-05 12:23 ` Sean Dague
0 siblings, 1 reply; 12+ messages in thread
From: Ian Pratt @ 2005-08-04 23:48 UTC (permalink / raw)
To: Sean Dague, xen-devel
> On Thu, Aug 04, 2005 at 10:48:50PM +0100, Ian Pratt wrote:
> > > > BUG62 eth0 -> veth0 in network script can lose network
> > > I can make this bug come and go at will based on which of the
> > > 2 network interfaces are part of the bridge. I added that
> > > information into the bugzilla bug, hopefully that helps.
> >
> > Are you changing the default 'netdev' at the top of the network
> > script?
>
> No actually, I guess I'm not even clear why veth0 exists, as
> everything works quite nicely for me without it functioning.
If you're running services in dom0 that are used by other domains you
are liable to get head-of-line blocking or even deadlock of the domU's
networking unless you use veth0: all of the domU's skb's could end up
getting queued in dom0 socket buffers.
veth0 avoids this by copying packets destined for dom0 and giving the
buffer back to the domU.
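As a rough illustration of the buffer accounting described above, here
is a toy model in shell. It is not Xen code: the function name, the
"direct"/"copy" mode names and the buffer counts are all made up, and
it only counts how many packets a domU can hand to a dom0 receiver that
never reads before the domU runs out of transmit buffers.

```shell
#!/bin/sh
# Toy model only; nothing here is a real Xen interface.
# $1: "direct" (dom0 keeps the domU buffer until the app reads it)
#     or "copy" (veth0-style: packet is copied, buffer goes straight back)
# $2: number of transmit buffers the domU owns
# $3: number of packets the domU tries to send
packets_sent_before_stall() {
    mode=$1
    free=$2
    attempts=$3
    sent=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        i=$((i + 1))
        # No free buffer left: the domU's networking is stuck.
        [ "$free" -gt 0 ] || break
        free=$((free - 1))
        sent=$((sent + 1))
        # With a copy, the buffer is returned to the domU immediately;
        # otherwise it stays queued in the stalled dom0 socket.
        if [ "$mode" = "copy" ]; then
            free=$((free + 1))
        fi
    done
    echo "$sent"
}
```

With 8 buffers and 100 attempts, "direct" stops after 8 packets while
"copy" gets all 100 through, which is the difference veth0 is meant to
make.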
Ian
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Re: help with bugs
2005-08-04 23:48 Ian Pratt
@ 2005-08-05 12:23 ` Sean Dague
0 siblings, 0 replies; 12+ messages in thread
From: Sean Dague @ 2005-08-05 12:23 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 1692 bytes --]
On Fri, Aug 05, 2005 at 12:48:33AM +0100, Ian Pratt wrote:
> > On Thu, Aug 04, 2005 at 10:48:50PM +0100, Ian Pratt wrote:
> > > > > BUG62 eth0 -> veth0 in network script can lose network
> > > > I can make this bug come and go at will based on which of the
> > > > 2 network interfaces are part of the bridge. I added that
> > > > information into the bugzilla bug, hopefully that helps.
> > >
> > > Are you changing the default 'netdev' at the top of the network
> > > script?
> >
> > No actually, I guess I'm not even clear why veth0 exists, as
> > everything works quite nicely for me without it functioning.
>
> If you're running services in dom0 that are used by other domains you
> are liable to get head-of-line blocking or even deadlock of the domU's
> networking unless you use veth0: all of the domU's skb's could end up
> getting queued in dom0 socket buffers.
>
> veth0 avoids this by copying packets destined for dom0 and giving the
> buffer back to the domU.
Is there a test case for this? I've been running some services in dom0 and
apparently running without veth0 for quite some time. It would be good to
have a test that shows this problem.
-Sean
--
__________________________________________________________________
Sean Dague Mid-Hudson Valley
sean at dague dot net Linux Users Group
http://dague.net http://mhvlug.org
There is no silver bullet. Plus, werewolves make better neighbors
than zombies, and they tend to keep the vampire population down.
__________________________________________________________________
[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Re: help with bugs
@ 2005-08-05 14:09 Ian Pratt
0 siblings, 0 replies; 12+ messages in thread
From: Ian Pratt @ 2005-08-05 14:09 UTC (permalink / raw)
To: Sean Dague; +Cc: xen-devel
> > If you're running services in dom0 that are used by other domains you
> > are liable to get head-of-line blocking or even deadlock of the domU's
> > networking unless you use veth0: all of the domU's skb's could end up
> > getting queued in dom0 socket buffers.
> >
> > veth0 avoids this by copying packets destined for dom0 and giving the
> > buffer back to the domU.
>
> Is there a test case for this? I've been running some
> services in dom0 and apparently running without veth0 for
> quite some time. It would be good to have a test that shows
> this problem.
I haven't tried this, but I suspect it would work:
* run a ttcp receiver in dom0 with a very large socket buffer size
* connect to it with a ttcp transmitter in domU (again, large sock
buffer)
* with data in flight, ^Z the dom0 receiver
I'd expect to see the networking of the domU (and potentially other
domU's) start to run very slowly or even grind to a halt. You may need
multiple parallel tcp connections to trigger this. Using UDP makes it
happen much more easily.
If you're using veth0 you shouldn't have the problem.
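The steps above might be scripted along these lines. This is untested,
like the procedure itself. The buffer size, port, buffer count and
function names are placeholders, and it assumes the classic ttcp
options (-r/-t, -s, -u, -b, -n, -p). The script only prints the
commands, since the two halves run in different domains.

```shell
#!/bin/sh
# Sketch only: every value below is a guess, not a recommendation.
SOCKBUF=1048576   # "very large" socket buffer size in bytes
PORT=5001         # placeholder port
NBUFS=1000000     # enough source buffers to keep data in flight

# Options shared by both ends; $1 is "tcp" (default) or "udp".
ttcp_flags() {
    if [ "${1:-tcp}" = "udp" ]; then
        echo "-u -s -b $SOCKBUF -p $PORT"
    else
        echo "-s -b $SOCKBUF -p $PORT"
    fi
}

# Step 1, in dom0: receiver that sinks incoming data.
receiver_cmd() {
    echo "ttcp -r $(ttcp_flags "${1:-tcp}")"
}

# Step 2, in domU: transmitter; $1 is the dom0 address, $2 the protocol.
transmitter_cmd() {
    echo "ttcp -t $(ttcp_flags "${2:-tcp}") -n $NBUFS $1"
}

# Step 3, in dom0: stop the receiver (pid in $1) with data in flight.
# SIGSTOP is used here as a scriptable stand-in for typing ^Z.
stall_cmd() {
    echo "kill -STOP $1"
}
```

Running several transmitters in parallel, or passing "udp" to both
ends, follows the suggestions above for making the stall easier to
trigger.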
[There are plans for making the backend buffer management more dynamic
that would mitigate the effect on other domU's, but this wouldn't
completely obviate the need for veth0 as a single domU could still end
up with all of its buffers being held by dom0.
There's a partial fix for TCP (not UDP) whereby we have dom0 release
its mapping of the domU buffer as soon as it's sent the TCP ACK, rather
than when it finally frees the skb when the client reads it.
Possibly the cleanest option would be to add a hook in the local receive
path that would enable us to copy&unmap any packets destined for local
delivery. Anyhow, not for 3.0.0 ...]
Ian
^ permalink raw reply [flat|nested] 12+ messages in thread