* RE: (repeatable) cross-domain networking failure
@ 2005-01-17 23:14 Ian Pratt
  2005-01-18  2:06 ` Adam Heath
  2005-01-18 11:05 ` Keir Fraser
  0 siblings, 2 replies; 17+ messages in thread
From: Ian Pratt @ 2005-01-17 23:14 UTC (permalink / raw)
  To: Ian Pratt, mukesh agrawal, Keir Fraser; +Cc: xen-devel, Nivedita Singhvi


OK, I have a good handle on the problem with UDP hangs into user-space
of domain 0.

It's down to message size: if the UDP payload size is less than 24
bytes, the buffer is not freed properly. Bizarre, but it explains why
our regression tests weren't picking it up as they all use larger
message sizes.

Anyhow, now we can reproduce, a fix should be forthcoming.

Ian
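
For anyone wanting to check their own setup against the size condition described above, here is a minimal sketch of a sender with a configurable payload size. It is not the traffic generator used in the original report; the function names and the 4-byte sequence-number layout are assumptions made purely for illustration.

```python
import socket
import struct

THRESHOLD = 24  # bytes; the UDP payload size named in the message above


def make_payload(seq: int, size: int) -> bytes:
    """Return exactly `size` bytes: a 4-byte sequence number, zero-padded."""
    if size < 4:
        raise ValueError("size must leave room for the 4-byte sequence number")
    return struct.pack("!I", seq) + b"\x00" * (size - 4)


def send_datagrams(host: str, port: int, size: int, count: int) -> int:
    """Send `count` UDP datagrams of `size` payload bytes; return bytes sent."""
    total = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            total += sock.sendto(make_payload(seq, size), (host, port))
    return total
```

Calling `send_datagrams("192.168.0.1", 9000, THRESHOLD - 1, 200)` from D1 would exercise the small-payload case, and `THRESHOLD` or larger the case the regression tests already covered. The address is D0's from the original report; port 9000 is an arbitrary placeholder.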


-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt

* RE: (repeatable) cross-domain networking failure
@ 2005-01-20 22:08 Ian Pratt
  0 siblings, 0 replies; 17+ messages in thread
From: Ian Pratt @ 2005-01-20 22:08 UTC (permalink / raw)
  To: Adam Heath, Keir Fraser; +Cc: xen-devel

> > > It's down to message size: if the UDP payload size is less than 24
> > > bytes, the buffer is not freed properly. Bizarre, but it explains why
> > > our regression tests weren't picking it up as they all use larger
> > > message sizes.
> > >
> > > Anyhow, now we can reproduce, a fix should be forthcoming.
> > >
> > > Ian
> >
> > This bug is now (hopefully) fixed in the testing and unstable trees.
> 
> Does this bug exist in the stable(2.0) tree?

Yes - it will be fixed in 2.0.4. It was pretty obscure (having been in
there ever since 1.3) so we're not rushing headlong into doing a new
release.

Ian



* RE: (repeatable) cross-domain networking failure
@ 2005-01-16 22:52 Ian Pratt
  2005-01-16 22:57 ` mukesh agrawal
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Pratt @ 2005-01-16 22:52 UTC (permalink / raw)
  To: mukesh agrawal, Keir Fraser; +Cc: xen-devel, Nivedita Singhvi

> What (specific source files or documentation) would you suggest starting
> at, to see an example of how the destruction is supposed to be done? I
> guess the TCP receive code works properly, so maybe I should compare that
> to the UDP code?

Have you modified the config of your kernel at all? Can you reproduce
with one of the kernels compiled by us?

To debug this, I'd start off by instrumenting calls to skb_dequeue in
netback's net_rx_action, along with calls to skb_free and __kfree_skb in
skbuff.c

Ian
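
Once the dequeue and free paths are instrumented, the output can be checked for buffers that were dequeued but never freed. The sketch below assumes a hypothetical log format of one event per line, `DEQ <address>` or `FREE <address>`, written by whatever printk calls the reader adds. Nothing in netback or skbuff.c emits this format by default.

```python
from collections import Counter


def unfreed_buffers(log_lines):
    """Return addresses that were dequeued more times than they were freed."""
    outstanding = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore unrelated kernel messages
        event, addr = parts
        if event == "DEQ":
            outstanding[addr] += 1
        elif event == "FREE":
            outstanding[addr] -= 1
    return sorted(addr for addr, n in outstanding.items() if n > 0)
```

Counting per address rather than keeping a set means that an address which is freed and later reused for a new buffer is still tracked correctly.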



* (repeatable) cross-domain networking failure
@ 2005-01-15 16:40 mukesh agrawal
  2005-01-15 17:04 ` Keir Fraser
  2005-01-15 21:14 ` Nivedita Singhvi
  0 siblings, 2 replies; 17+ messages in thread
From: mukesh agrawal @ 2005-01-15 16:40 UTC (permalink / raw)
  To: xen-devel


Summary:

After sending some UDP traffic between two xen domains (Domain 0 and 
Domain 1) the networking between the domains fails. This failure is 100% 
repeatable.

In more detail:

I have two xen domains. They run the kernels from the 2.0.3 release. (I've run 
into the same problem with 2.0.1 as well.) Domain 0 has 5 physical ethernet 
interfaces, and a virtual interface to Domain 1. Domain 1 has just the virtual 
interface to Domain 0.

D0 is configured with IP address 192.168.0.1, and D1 with 192.168.1.1. The 
netmask is set to 255.255.0.0.
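
As a quick sanity check on the addressing, the following sketch uses Python's standard `ipaddress` module to confirm that both addresses fall in the same network under that netmask, so neither domain needs a gateway to reach the other. The helper name `same_subnet` is made up for this example.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """Return True if addr_b lies in the network that addr_a/netmask defines."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    return ipaddress.ip_address(addr_b) in network
```

With the values above, `same_subnet("192.168.0.1", "192.168.1.1", "255.255.0.0")` is true, whereas a netmask of 255.255.255.0 would put the two domains on different networks.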

When I bring up D1, I can ping D1 from D0, ssh into D1, etc.

I then start a UDP server in D0, and a traffic generator in D1. After the 
traffic generator sends its 128th packet, networking between the domains 
fails. The 128th packet is received successfully by the UDP server, but no 
later traffic arrives in D0. This includes UDP, TCP, ICMP, and ARP.
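
A receiver along the following lines would make the stall point easy to read off. This is only a sketch and not the UDP server used in the report: the function name, the timeout, and the 2048-byte receive size are all assumptions.

```python
import socket


def count_until_stall(sock, timeout=2.0, max_datagrams=None):
    """Count datagrams received on `sock` until none arrives for `timeout` seconds."""
    sock.settimeout(timeout)
    received = 0
    while max_datagrams is None or received < max_datagrams:
        try:
            sock.recvfrom(2048)
        except socket.timeout:
            break  # nothing arrived within the timeout: treat as a stall
        received += 1
    return received
```

In D0 one would bind a `socket.SOCK_DGRAM` socket to the chosen port and pass it in. If the behaviour matches the description above, the returned count would be 128 even though the generator keeps sending.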

Looking at the interrupt counts in /proc/interrupts, I see that D0 no longer 
receives packets sent by D1. D1, however, does receive packets sent by D0. (To 
be clear, D0->D1 traffic is ICMP ping requests, unrelated to the UDP traffic. 
There is no UDP traffic sent from D0 to D1.)
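
The interrupt-count comparison can be automated by taking two snapshots of /proc/interrupts and listing the lines whose counters did not move. The sketch below assumes the usual layout (a header row of CPU names, then one `label: counts... description` row per interrupt). The helper names are invented for this example.

```python
def parse_interrupts(text):
    """Map each interrupt label to its total count summed over all CPUs."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def stalled_irqs(before, after):
    """Return labels whose count was nonzero but did not increase between snapshots."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    return sorted(k for k in b if b[k] > 0 and a.get(k) == b[k])
```

Reading the file twice a few seconds apart while D1 is still transmitting, and passing both strings to `stalled_irqs`, should single out the interrupt line for the virtual interface if D0 has stopped receiving on it.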

(I suspect the stuff in this paragraph doesn't matter, but include it for 
completeness.) Eventually, D0's ARP cache entry for D1 expires. D0 ARPs for D1, 
and D1 replies. But D0 never receives these replies. And eventually, D1 stops 
replying to the ARPs entirely. (D1's sending behavior is observed via tcpdump 
running in the console connection to D1.)

Note that the networking failure only occurs if the UDP packets are delivered 
to a user-level process in D0. In particular, UDP traffic to D0's kernel NFS 
server does not induce the failure. Nor does traffic sent to D0 for which there 
is no user process to accept the packets. And neither does traffic which is 
forwarded on to other hosts via NAT. (I haven't tested the regular forwarding 
case.)

Also, for what it's worth, Domain 0's network connectivity on its other 
interfaces (which are connected to the world at large) is unaffected.

Looking through the mailing list archive, I saw a prior bug that seemed 
similar, but involved IP fragmentation. That is not the case here, as the UDP 
packets sent by D1 are small (<100 bytes).
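
The arithmetic behind ruling out fragmentation is simple enough to sketch. The MTU of 1500 bytes is an assumption (the usual Ethernet default), since the report does not state the MTU of the virtual interface. The header sizes are the standard 20 bytes for an IPv4 header without options and 8 bytes for UDP.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes


def needs_fragmentation(payload_len: int, mtu: int = 1500) -> bool:
    """Return True if a UDP payload of this size would exceed one IP packet."""
    return payload_len + UDP_HEADER + IPV4_HEADER > mtu
```

Under those assumptions, payloads up to 1472 bytes fit in a single packet, so datagrams under 100 bytes are nowhere near the limit.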

Any suggestions for debugging this?

Thanks,
mukesh



