* (repeatable) cross-domain networking failure
@ 2005-01-15 1:38 mukesh agrawal
0 siblings, 0 replies; 18+ messages in thread
From: mukesh agrawal @ 2005-01-15 1:38 UTC (permalink / raw)
To: xen-devel
Summary:
I'm running into a situation where, after sending some UDP traffic between
two xen domains (Domain 0 and Domain 1) the networking between the
domains fails. This failure is 100% repeatable.
In more detail:
I have two xen domains. They run the kernels from the 2.0.3 release. (I've
run into the same problem with 2.0.1 as well.) Domain 0 has 5 physical
ethernet interfaces, and a virtual interface to Domain 1. Domain 1 has
just the virtual interface to Domain 0.
D0 is configured with IP address 192.168.0.1, and D1 with 192.168.1.1. The
netmask is set to 255.255.0.0.
When I bring up D1, I can ping D1 from D0, ssh into D1, etc.
I then start a UDP server in D0, and a traffic generator in D1. After the
traffic generator sends its 128th packet, networking between the domains
fails. The 128th packet is received successfully by the UDP server, but no
later traffic arrives in D0. This includes UDP, TCP, ICMP, and ARP.
Looking at the interrupt counts in /proc/interrupts, I see that D0 no
longer receives packets sent by D1. D1, however, does receive packets sent
by D0. (To be clear, D0->D1 traffic is ICMP ping requests, unrelated to
the UDP traffic. There is not UDP traffic sent from D0 to D1.)
(I suspect the stuff in this paragraph doesn't matter, but include it for
completeness.) Eventually, D0's ARP cache entry for D1 expires. D0 ARPs
for D1, and D1 replies. But D0 never receives these replies. And
eventually, D1 stops replying to the ARPs entirely. (D1's sending behavior
is observed via tcpdump running in the console connection to D1.)
Note that the networking failure only occurs if the UDP packets are
delivered to a user-level process in D0. In particular, UDP traffic to
D0's kernel NFS server does not induce the failure. Nor does traffic sent
to D0 for which there is no user process to accept the packets. And
neither does traffic which is forwarded on to other hosts via NAT. (I
haven't tested the regular forwarding case.)
Also, for what it's worth, Domain 0's network connectivity on its other
interfaces (which are connected to the world at large) is unaffected.
Looking through the mailing list archive, I saw a prior bug that seemed
similar, but involved IP fragmentation. That is not the case here, as the
UDP packets sent by D1 are small (<100 bytes).
Any suggestions for debugging this?
Thanks,
mukesh
-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
^ permalink raw reply [flat|nested] 18+ messages in thread
* (repeatable) cross-domain networking failure
@ 2005-01-15 16:40 mukesh agrawal
2005-01-15 17:04 ` Keir Fraser
2005-01-15 21:14 ` Nivedita Singhvi
0 siblings, 2 replies; 18+ messages in thread
From: mukesh agrawal @ 2005-01-15 16:40 UTC (permalink / raw)
To: xen-devel
Summary:
After sending some UDP traffic between two xen domains (Domain 0 and
Domain 1) the networking between the domains fails. This failure is 100%
repeatable.
In more detail:
I have two xen domains. They run the kernels from the 2.0.3 release. (I've run
into the same problem with 2.0.1 as well.) Domain 0 has 5 physical ethernet
interfaces, and a virtual interface to Domain 1. Domain 1 has just the virtual
interface to Domain 0.
D0 is configured with IP address 192.168.0.1, and D1 with 192.168.1.1. The
netmask is set to 255.255.0.0.
When I bring up D1, I can ping D1 from D0, ssh into D1, etc.
I then start a UDP server in D0, and a traffic generator in D1. After the
traffic generator sends its 128th packet, networking between the domains
fails. The 128th packet is received successfully by the UDP server, but no
later traffic arrives in D0. This includes UDP, TCP, ICMP, and ARP.
Looking at the interrupt counts in /proc/interrupts, I see that D0 no longer
receives packets sent by D1. D1, however, does receive packets sent by D0. (To
be clear, D0->D1 traffic is ICMP ping requests, unrelated to the UDP traffic.
There is not UDP traffic sent from D0 to D1.)
(I suspect the stuff in this paragraph doesn't matter, but include it for
completeness.) Eventually, D0's ARP cache entry for D1 expires. D0 ARPs for D1,
and D1 replies. But D0 never receives these replies. And eventually, D1 stops
replying to the ARPs entirely. (D1's sending behavior is observed via tcpdump
running in the console connection to D1.)
Note that the networking failure only occurs if the UDP packets are delivered
to a user-level process in D0. In particular, UDP traffic to D0's kernel NFS
server does not induce the failure. Nor does traffic sent to D0 for which there
is no user process to accept the packets. And neither does traffic which is
forwarded on to other hosts via NAT. (I haven't tested the regular forwarding
case.)
Also, for what it's worth, Domain 0's network connectivity on its other
interfaces (which are connected to the world at large) is unaffected.
Looking through the mailing list archive, I saw a prior bug that seemed
similar, but involved IP fragmentation. That is not the case here, as the UDP
packets sent by D1 are small (<100 bytes).
Any suggestions for debugging this?
Thanks,
mukesh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: (repeatable) cross-domain networking failure
2005-01-15 16:40 (repeatable) cross-domain networking failure mukesh agrawal
@ 2005-01-15 17:04 ` Keir Fraser
[not found] ` <e15e04f9050115091893409f1@mail.gmail.com>
2005-01-15 21:14 ` Nivedita Singhvi
1 sibling, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2005-01-15 17:04 UTC (permalink / raw)
To: mukesh agrawal; +Cc: xen-devel
Maybe add some tracing to the backend driver -- it's possible the
backend isn't sending responses for those packets back to domU, and so
things seize up for a while. If no responses are being generated it is
because the backend thinks the packets are still in flight, so there
would be some bug-hunting to find out why that is.
-- Keir
>
> Summary:
>
> After sending some UDP traffic between two xen domains (Domain 0 and
> Domain 1) the networking between the domains fails. This failure is 100%
> repeatable.
>
> In more detail:
>
> I have two xen domains. They run the kernels from the 2.0.3 release. (I've run
> into the same problem with 2.0.1 as well.) Domain 0 has 5 physical ethernet
> interfaces, and a virtual interface to Domain 1. Domain 1 has just the virtual
> interface to Domain 0.
>
> D0 is configured with IP address 192.168.0.1, and D1 with 192.168.1.1. The
> netmask is set to 255.255.0.0.
>
> When I bring up D1, I can ping D1 from D0, ssh into D1, etc.
>
> I then start a UDP server in D0, and a traffic generator in D1. After the
> traffic generator sends its 128th packet, networking between the domains
> fails. The 128th packet is received successfully by the UDP server, but no
> later traffic arrives in D0. This includes UDP, TCP, ICMP, and ARP.
>
> Looking at the interrupt counts in /proc/interrupts, I see that D0 no longer
> receives packets sent by D1. D1, however, does receive packets sent by D0. (To
> be clear, D0->D1 traffic is ICMP ping requests, unrelated to the UDP traffic.
> There is not UDP traffic sent from D0 to D1.)
>
> (I suspect the stuff in this paragraph doesn't matter, but include it for
> completeness.) Eventually, D0's ARP cache entry for D1 expires. D0 ARPs for D1,
> and D1 replies. But D0 never receives these replies. And eventually, D1 stops
> replying to the ARPs entirely. (D1's sending behavior is observed via tcpdump
> running in the console connection to D1.)
>
> Note that the networking failure only occurs if the UDP packets are delivered
> to a user-level process in D0. In particular, UDP traffic to D0's kernel NFS
> server does not induce the failure. Nor does traffic sent to D0 for which there
> is no user process to accept the packets. And neither does traffic which is
> forwarded on to other hosts via NAT. (I haven't tested the regular forwarding
> case.)
>
> Also, for what it's worth, Domain 0's network connectivity on its other
> interfaces (which are connected to the world at large) is unaffected.
>
> Looking through the mailing list archive, I saw a prior bug that seemed
> similar, but involved IP fragmentation. That is not the case here, as the UDP
> packets sent by D1 are small (<100 bytes).
>
> Any suggestions for debugging this?
>
> Thanks,
> mukesh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Fwd: (repeatable) cross-domain networking failure
[not found] ` <e15e04f9050115091893409f1@mail.gmail.com>
@ 2005-01-15 17:26 ` mukesh agrawal
0 siblings, 0 replies; 18+ messages in thread
From: mukesh agrawal @ 2005-01-15 17:26 UTC (permalink / raw)
To: xen-devel
> Maybe add some tracing to the backend driver -- it's possible the
> backend isn't sending responses for those packets back to domU, and so
> things seize up for a while. If no responses are being generated it is
> because the backend thinks the packets are still in flight, so there
> would be some bug-hunting to find out why that is.
I'm not at all familiar with the details of the networking implementation,
so please bear with my questions. (Feel free to point me at existing
documentation on the details that I may have overlooked.)
1. When you say "the backend", is there just one backend (running,
perhaps, in dom0)? Or is there a backend in each domain?
2. When you talk about responses not being generated, are you referring to
the ICMP and ARP traffic? (For the UDP traffic, there isn't expected to
be any packet sent from dom0 back to domU.)
Thanks,
mukesh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: (repeatable) cross-domain networking failure
2005-01-15 16:40 (repeatable) cross-domain networking failure mukesh agrawal
2005-01-15 17:04 ` Keir Fraser
@ 2005-01-15 21:14 ` Nivedita Singhvi
[not found] ` <e15e04f905011611313312b9f4@mail.gmail.com>
1 sibling, 1 reply; 18+ messages in thread
From: Nivedita Singhvi @ 2005-01-15 21:14 UTC (permalink / raw)
To: mukesh agrawal; +Cc: xen-devel
mukesh agrawal wrote:
>
> Summary:
>
> After sending some UDP traffic between two xen domains (Domain 0 and
> Domain 1) the networking between the domains fails. This failure is 100%
> repeatable.
I don't have boxes at the moment and can't reproduce till
Monday, but can you show us the output of netstat -uan and
netstat -s on both domains? Is there stuff in the receive
or send queues? And was all the UDP traffic going to the
same port? That is, was there any successful UDP traffic to
another endpoint?
> I then start a UDP server in D0, and a traffic generator in D1. After
> the traffic generator sends its 128-th packet, networking between the
> domains fails. The 128th packet is received successfully by the UDP
> server, but no later traffic arrives in D0. This includes UDP, TCP,
> ICMP, and ARP.
What does ifconfig on dom0 show?
Are there any error messages in /var/log/messages?
> Looking at the interrupt counts in /proc/interrupts, I see that D0 no
> longer receives packets sent by D1. D1, however, does receive packets
> sent by D0. (To be clear, D0->D1 traffic is ICMP ping requests,
> unrelated to the UDP traffic. There is not UDP traffic sent from D0 to D1.)
Is there any other successful traffic from D0 -> D1 (tcp?)
thanks,
Nivedita
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: (repeatable) cross-domain networking failure
[not found] ` <e15e04f905011611313312b9f4@mail.gmail.com>
@ 2005-01-16 20:49 ` mukesh agrawal
2005-01-16 21:09 ` Keir Fraser
0 siblings, 1 reply; 18+ messages in thread
From: mukesh agrawal @ 2005-01-16 20:49 UTC (permalink / raw)
To: xen-devel, Nivedita Singhvi
[-- Attachment #1: Type: TEXT/PLAIN, Size: 5289 bytes --]
Nivedita Singhvi <niv@us.ibm.com> wrote:
> I don't have boxes at the moment and can't reproduce till
> Monday, but can you show us the output of netstat -uan and
> netstat -s on both domains? Is there stuff in the receive
> or send queues?
The detailed output of netstat follows. But there is nothing in either the
send queue on domU or the receive queue on dom0. (The
UDP server in question is running on port 2000.)
On dom0:
$ netstat -uan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 0.0.0.0:1024            0.0.0.0:*
udp        0      0 0.0.0.0:2049            0.0.0.0:*
udp        0      0 0.0.0.0:514             0.0.0.0:*
udp        0      0 0.0.0.0:1027            0.0.0.0:*
udp        0      0 155.98.36.34:1028       155.98.32.70:8509       ESTABLISHED
udp        0      0 0.0.0.0:775             0.0.0.0:*
udp        0      0 0.0.0.0:653             0.0.0.0:*
udp        0      0 192.168.0.1:2000        192.168.1.1:1024        ESTABLISHED
udp        0      0 224.4.0.1:2917          0.0.0.0:*
udp        0      0 224.4.0.1:2917          0.0.0.0:*
udp        0      0 224.4.0.1:2917          0.0.0.0:*
udp        0      0 0.0.0.0:111             0.0.0.0:*
udp        0      0 0.0.0.0:759             0.0.0.0:*
On domU:
# netstat -uan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 192.168.1.1:1024        192.168.0.1:2000        ESTABLISHED
The netstat -s output is a bit long, so I've attached those, instead of
including them inline.
> And was all the udp traffic going to the same port? i.e. any successful
> udp traffic to another endpoint?
All the traffic was going to port 2000. Trying to send UDP traffic from
domU to a different port in dom0 (after the networking failure) does not
succeed. (If you're asking if traffic could be sent to multiple ports
while the networking is functional, I believe the answer is yes, but I would
need to double check.)
> What does ifconfig on dom0 show?
> Are there any error messages in /var/log/messages?
$ ifconfig vif1.0
vif1.0    Link encap:Ethernet  HWaddr AA:00:01:7B:92:C2
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:134 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5884 (5.7 Kb)  TX bytes:676 (676.0 b)
$ sudo tail /var/log/messages
Jan 16 19:34:09 node1 ntpd[993]: kernel time sync disabled 0041
Jan 16 19:35:15 node1 ntpd[993]: kernel time sync enabled 0001
Jan 16 19:39:29 node1 ntpd[993]: synchronized to 155.98.33.74, stratum=2
Jan 16 19:49:07 node1 ntpd[993]: time correction of -18001 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Jan 16 19:59:15 node1 sshd(pam_unix)[1457]: session opened for user mukesh by (uid=30245)
Jan 16 19:59:18 node1 sshd(pam_unix)[1486]: session opened for user mukesh by (uid=30245)
Jan 16 19:59:30 node1 sshd(pam_unix)[1517]: session opened for user mukesh by (uid=30245)
Jan 16 20:09:29 node1 modprobe: modprobe: Can't open dependencies file /lib/modules/2.4.27-xen0/modules.dep (No such file or directory)
Jan 16 20:09:44 node1 last message repeated 2 times
Jan 16 20:16:02 node1 kernel: device vif1.0 entered promiscuous mode
>> Looking at the interrupt counts in /proc/interrupts, I see that D0 no
>> longer receives packets sent by D1. D1, however, does receive packets
>> sent by D0. (To be clear, D0->D1 traffic is ICMP ping requests,
>> unrelated to the UDP traffic. There is not UDP traffic sent from D0 to D1.)
>
> Is there any other successful traffic from D0 -> D1 (tcp?)
All traffic from D0->D1 succeeds, even after the network stops
working. This includes ICMP, UDP, and TCP. (Sorry if my comment about
"There is not UDP traffic sent from D0 to D1" was confusing. What I meant
was that I wasn't sending any UDP traffic from D0 to D1, not that such
traffic fails.)
This is subject to the limitation mentioned in my first message. Namely,
that dom0's ARP cache entry for domU eventually times out. At that point,
dom0 attempts to ARP for domU's MAC. domU sees this, and replies (as seen
by tcpdump on domU). But dom0 never gets the ARP replies, so eventually
D0->D1 traffic fails as well. (E.g. "telnet 192.168.1.1" returns "No route
to host".)
Also, let me add some more detail to my original report:
1. The networking fails after the 128th UDP packet received in dom0, even
if I restart domU. Specifically:
- If I send one UDP packet from domU to dom0, shut down domU, and
start a fresh domU, then I can only send 127 (rather than
128) UDP packets from the new domU before networking fails.
- If I shut down domU after the networking failure, and start a
new domU, networking between the new domU and dom0 does not
work.
2. The server run in dom0 is
     nc -l -u -p 2000
3. The traffic generator run in domU is
     i=0; while true; do
         ((++i)); echo $i
         echo $i | nc -u -w 1 192.168.0.1 2000
     done &
thanks,
mukesh
[-- Attachment #2: netstat -s for domain0 --]
[-- Type: TEXT/plain, Size: 2109 bytes --]
$ netstat -s
Ip:
177642 total packets received
0 forwarded
0 incoming packets discarded
177538 incoming packets delivered
98742 requests sent out
Icmp:
0 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
0 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
Tcp:
86 active connections openings
10 passive connection openings
0 failed connection attempts
0 connection resets received
20 connections established
177122 segments received
98563 segments send out
0 segments retransmited
0 bad segments received.
0 resets sent
Udp:
290 packets received
0 packets to unknown port received.
0 packet receive errors
179 packets sent
TcpExt:
ArpFilter: 0
65 TCP sockets finished time wait in fast timer
1011 delayed acks sent
1 delayed acks further delayed because of locked socket
94 packets directly queued to recvmsg prequeue.
1644 packets directly received from backlog
4038 packets directly received from prequeue
170004 packets header predicted
83 packets header predicted and directly queued to user
TCPPureAcks: 2720
TCPHPAcks: 1773
TCPRenoRecovery: 0
TCPSackRecovery: 0
TCPSACKReneging: 0
TCPFACKReorder: 0
TCPSACKReorder: 0
TCPRenoReorder: 0
TCPTSReorder: 0
TCPFullUndo: 0
TCPPartialUndo: 0
TCPDSACKUndo: 0
TCPLossUndo: 0
TCPLoss: 0
TCPLostRetransmit: 0
TCPRenoFailures: 0
TCPSackFailures: 0
TCPLossFailures: 0
TCPFastRetrans: 0
TCPForwardRetrans: 0
TCPSlowStartRetrans: 0
TCPTimeouts: 0
TCPRenoRecoveryFail: 0
TCPSackRecoveryFail: 0
TCPSchedulerFailed: 0
TCPRcvCollapsed: 0
TCPDSACKOldSent: 0
TCPDSACKOfoSent: 0
TCPDSACKRecv: 0
TCPDSACKOfoRecv: 0
TCPAbortOnSyn: 0
TCPAbortOnData: 0
TCPAbortOnClose: 0
TCPAbortOnMemory: 0
TCPAbortOnTimeout: 0
TCPAbortOnLinger: 0
TCPAbortFailed: 0
TCPMemoryPressures: 0
[-- Attachment #3: netstat -s for domain1 --]
[-- Type: TEXT/plain, Size: 1717 bytes --]
# netstat -s
Ip:
4 total packets received
0 forwarded
0 incoming packets discarded
4 incoming packets delivered
275 requests sent out
Icmp:
0 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
0 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
Tcp:
0 active connections openings
0 passive connection openings
0 failed connection attempts
0 connection resets received
0 connections established
0 segments received
0 segments send out
0 segments retransmited
0 bad segments received.
0 resets sent
Udp:
4 packets received
0 packets to unknown port received.
0 packet receive errors
275 packets sent
TcpExt:
ArpFilter: 0
0 packets header predicted
TCPPureAcks: 0
TCPHPAcks: 0
TCPRenoRecovery: 0
TCPSackRecovery: 0
TCPSACKReneging: 0
TCPFACKReorder: 0
TCPSACKReorder: 0
TCPRenoReorder: 0
TCPTSReorder: 0
TCPFullUndo: 0
TCPPartialUndo: 0
TCPDSACKUndo: 0
TCPLossUndo: 0
TCPLoss: 0
TCPLostRetransmit: 0
TCPRenoFailures: 0
TCPSackFailures: 0
TCPLossFailures: 0
TCPFastRetrans: 0
TCPForwardRetrans: 0
TCPSlowStartRetrans: 0
TCPTimeouts: 0
TCPRenoRecoveryFail: 0
TCPSackRecoveryFail: 0
TCPSchedulerFailed: 0
TCPRcvCollapsed: 0
TCPDSACKOldSent: 0
TCPDSACKOfoSent: 0
TCPDSACKRecv: 0
TCPDSACKOfoRecv: 0
TCPAbortOnSyn: 0
TCPAbortOnData: 0
TCPAbortOnClose: 0
TCPAbortOnMemory: 0
TCPAbortOnTimeout: 0
TCPAbortOnLinger: 0
TCPAbortFailed: 0
TCPMemoryPressures: 0
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: (repeatable) cross-domain networking failure
2005-01-16 20:49 ` mukesh agrawal
@ 2005-01-16 21:09 ` Keir Fraser
2005-01-16 21:56 ` mukesh agrawal
0 siblings, 1 reply; 18+ messages in thread
From: Keir Fraser @ 2005-01-16 21:09 UTC (permalink / raw)
To: mukesh agrawal; +Cc: xen-devel, Nivedita Singhvi
> Also, let me add some more detail to my original report:
>
> 1. The networking fails after the 128th UDP packet received in dom0, even
> if I restart domU. Specifically:
>
> - If I send one UDP packet from domU to dom0, shut down domU, and
> start a fresh domU, then I can only send 127 (rather than
> 128) UDP packets from the new domU before networking fails.
>
> - If I shut down domU after the networking failure, and start a
> new domU, networking between the new domU and dom0 does not
> work.
>
This corroborates my initial guess that the backend driver (in DOM0) is
sending the packets into the DOM0 networking layer, and never hearing
back when the packet is freed. Normally this would trigger a response
to be sent back to the domU and resources in the backend driver would
get freed up. This isn't happening and you eventually hit a limit on
the number of packets that the driver will simultaneously put in
flight.
Either those UDP packets are queued up somewhere in the DOM0 network
stack, or the destructor callback is not getting called for some
reason or has got overwritten(!).
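The accounting described above can be illustrated with a toy model in POSIX sh. This is a sketch under stated assumptions, not the netback code: the names `MAX_IN_FLIGHT`, `send_from_domU` and `count_until_stall` are hypothetical, the limit of 128 is chosen only because that is where the reported failure occurs, and the model takes the "destructor never called" branch of the two possibilities above.

```sh
#!/bin/sh
# Toy model of the backend accounting described above; a sketch, not netback.
# Hypothetical assumptions:
#   - MAX_IN_FLIGHT is 128 only because that is where the reported failure occurs;
#   - a slot is freed (and a response sent to domU) only when the destructor runs;
#   - in the failing case the destructor is never called (LEAKY = yes).
MAX_IN_FLIGHT=128
in_flight=0   # kept on the dom0 side, so restarting domU does not reset it

# send_from_domU LEAKY: exit 0 if the backend accepted the packet, 1 if stalled.
send_from_domU() {
    if [ "$in_flight" -ge "$MAX_IN_FLIGHT" ]; then
        return 1                        # no free slot: nothing more reaches dom0
    fi
    in_flight=$((in_flight + 1))        # packet handed to the dom0 network stack
    if [ "$1" = no ]; then
        in_flight=$((in_flight - 1))    # destructor ran: slot freed, response sent
    fi
    return 0
}

# count_until_stall LEAKY: print how many packets get through before a stall.
count_until_stall() {
    sent=0
    while send_from_domU "$1"; do
        sent=$((sent + 1))
        if [ "$sent" -gt 10000 ]; then
            break                       # guard: the non-leaky case never stalls
        fi
    done
    echo "$sent"
}
```

On a fresh counter, `count_until_stall yes` prints 128. Calling `send_from_domU yes` once and then `count_until_stall yes` prints 127, mirroring the restart observation reported earlier in the thread: the counter lives on the dom0 side, so replacing domU does not reset it. With `no` (destructor always runs) the model never stalls and stops only at its guard.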
-- Keir
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: (repeatable) cross-domain networking failure
2005-01-16 21:09 ` Keir Fraser
@ 2005-01-16 21:56 ` mukesh agrawal
0 siblings, 0 replies; 18+ messages in thread
From: mukesh agrawal @ 2005-01-16 21:56 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel, Nivedita Singhvi
On Sun, 16 Jan 2005, Keir Fraser wrote:
> This corroborates my initial guess that the backend driver (in DOM0) is
> sending the packets into the DOM0 networking layer, and never hearing
> back when the packet is freed. Normally this would trigger a response
> to be sent back to the domU and resources in the backend driver would
> get freed up. This isn't happening and you eventually hit a limit on
> the number of packets that the driver will simultaneously put in
> flight.
When you say "resources in the backend driver would get freed up", that's
the domU (sender) backend driver?
> Either those UDP packets are queued up somewhere in the DOM0 network
> stack, or the destructor callback is not getting called for some
> reason or has got overwritten(!).
Well, the packets aren't stuck in the dom0 network stack... They get
delivered all the way up to the application just fine (nc in the trivial
test case). So I think it must be the latter: after the UDP packet is
delivered to the application, the destructor callback is never invoked.
Further, this seems to be specific to the receive path for packets
delivered to userspace (since traffic to the kernel NFS server doesn't
seem to trigger it, nor traffic to closed ports).
What (specific source files or documentation) would you suggest starting
at, to see an example of how the destruction is supposed to be done? I
guess the TCP receive code works properly, so maybe I should compare that
to the UDP code?
Thanks,
mukesh
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: (repeatable) cross-domain networking failure
@ 2005-01-16 22:52 Ian Pratt
2005-01-16 22:57 ` mukesh agrawal
0 siblings, 1 reply; 18+ messages in thread
From: Ian Pratt @ 2005-01-16 22:52 UTC (permalink / raw)
To: mukesh agrawal, Keir Fraser; +Cc: xen-devel, Nivedita Singhvi
> What (specific source files or documentation) would you suggest starting
> at, to see an example of how the destruction is supposed to be done? I
> guess the TCP receive code works properly, so maybe I should compare that
> to the UDP code?
Have you modified the config of your kernel at all? Can you reproduce
with one of the kernels compiled by us?
To debug this, I'd start off by instrumenting calls to skb_dequeue in
netback's net_rx_action, along with calls to skb_free and __kfree_skb in
skbuff.c
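One way to act on this suggestion: have the added instrumentation log one line per event, then check the log offline for buffers that are handed into the stack but never released. A minimal POSIX sh/awk sketch; the `DEQ <addr>` / `FREE <addr>` line format is hypothetical (it is whatever the added printk calls would emit), as is the helper name `unfreed_bufs`.

```sh
#!/bin/sh
# unfreed_bufs: read instrumentation lines of the (hypothetical) form
#   DEQ  <addr>   buffer dequeued by the backend and passed into the dom0 stack
#   FREE <addr>   buffer released (e.g. logged from __kfree_skb)
# and print, sorted, every address that was dequeued but never freed.
# Events are processed in order, so an address reused after a FREE is tracked
# again; FREE lines for buffers that were never dequeued are harmless.
unfreed_bufs() {
    awk '
        $1 == "DEQ"  { live[$2] = 1 }
        $1 == "FREE" { delete live[$2] }
        END          { for (a in live) print a }
    ' | sort
}
```

For example, `dmesg | unfreed_bufs` after the stall, assuming the tag is the first field of each logged line. If the destructor is indeed being skipped, the list should grow by one address per stuck packet.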
Ian
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: (repeatable) cross-domain networking failure
2005-01-16 22:52 Ian Pratt
@ 2005-01-16 22:57 ` mukesh agrawal
0 siblings, 0 replies; 18+ messages in thread
From: mukesh agrawal @ 2005-01-16 22:57 UTC (permalink / raw)
To: Ian Pratt; +Cc: Keir Fraser, xen-devel, Nivedita Singhvi
On Sun, 16 Jan 2005, Ian Pratt wrote:
> Have you modified the config of your kernel at all? Can you reproduce
> with one of the kernels compiled by us?
Yep. I've experienced these hangs with the kernels and hypervisor from
the Xen 2.0.3 release.
> To debug this, I'd start off by instrumenting calls to skb_dequeue in
> netback's net_rx_action, along with calls to skb_free and __kfree_skb in
> skbuff.c
Ok, will do.
Thanks,
mukesh
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: (repeatable) cross-domain networking failure
@ 2005-01-17 23:14 Ian Pratt
2005-01-18 2:06 ` Adam Heath
2005-01-18 11:05 ` Keir Fraser
0 siblings, 2 replies; 18+ messages in thread
From: Ian Pratt @ 2005-01-17 23:14 UTC (permalink / raw)
To: Ian Pratt, mukesh agrawal, Keir Fraser; +Cc: xen-devel, Nivedita Singhvi
OK, I have a good handle on the problem with UDP hangs into user-space
of domain 0.
It's down to message size: if the UDP payload size is less than 24
bytes, the buffer is not freed properly. Bizarre, but it explains why
our regression tests weren't picking it up as they all use larger
message sizes.
Anyhow, now we can reproduce, a fix should be forthcoming.
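The workload reported in this thread sits entirely below that threshold: the traffic generator sends `echo $i`, that is, the decimal counter plus a newline, assuming nc sends each short line as one datagram. A minimal POSIX sh sketch (the helper names `payload_len` and `below_threshold` are hypothetical) that computes those payload sizes and compares them with the 24-byte figure:

```sh
#!/bin/sh
# payload_len N: bytes sent by `echo $i | nc -u ...` for counter value N
# (the digits plus the trailing newline; printf is used here for portability).
payload_len() {
    printf '%s\n' "$1" | wc -c | tr -d ' '
}

# below_threshold N: exit 0 if packet N's payload is under 24 bytes.
below_threshold() {
    [ "$(payload_len "$1")" -lt 24 ]
}

# Sample counters from the first 128 packets: 2, 2, 3, 3, 4 and 4 bytes.
for i in 1 9 10 99 100 128; do
    echo "packet $i: $(payload_len "$i") bytes"
done
```

Every packet up to the 128th carries at most 4 bytes of payload, far under 24, which is consistent with every one of them hitting the bug.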
Ian
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: (repeatable) cross-domain networking failure
2005-01-17 23:14 Ian Pratt
@ 2005-01-18 2:06 ` Adam Heath
2005-01-18 11:05 ` Keir Fraser
1 sibling, 0 replies; 18+ messages in thread
From: Adam Heath @ 2005-01-18 2:06 UTC (permalink / raw)
To: Ian Pratt
Cc: mukesh agrawal, Keir Fraser, xen-devel@lists.sourceforge.net,
Nivedita Singhvi
On Mon, 17 Jan 2005, Ian Pratt wrote:
>
> OK, I have a good handle on the problem with UDP hangs into user-space
> of domain 0.
>
> It's down to message size: if the UDP payload size is less than 24
> bytes, the buffer is not freed properly. Bizarre, but it explains why
> our regression tests weren't picking it up as they all use larger
> message sizes.
>
> Anyhow, now we can reproduce, a fix should be forthcoming.
Is it possible for an NFS request/response to be less than 24 bytes in size?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: (repeatable) cross-domain networking failure
2005-01-17 23:14 Ian Pratt
2005-01-18 2:06 ` Adam Heath
@ 2005-01-18 11:05 ` Keir Fraser
2005-01-18 11:28 ` Keir Fraser
2005-01-19 23:17 ` mukesh agrawal
1 sibling, 2 replies; 18+ messages in thread
From: Keir Fraser @ 2005-01-18 11:05 UTC (permalink / raw)
To: Ian Pratt; +Cc: mukesh agrawal, Keir Fraser, xen-devel, Nivedita Singhvi
>
> OK, I have a good handle on the problem with UDP hangs into user-space
> of domain 0.
>
> It's down to message size: if the UDP payload size is less than 24
> bytes, the buffer is not freed properly. Bizarre, but it explains why
> our regression tests weren't picking it up as they all use larger
> message sizes.
>
> Anyhow, now we can reproduce, a fix should be forthcoming.
>
> Ian
This bug is now (hopefully) fixed in the testing and unstable trees.
-- Keir
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: (repeatable) cross-domain networking failure
2005-01-18 11:05 ` Keir Fraser
@ 2005-01-18 11:28 ` Keir Fraser
2005-01-18 16:04 ` Nivedita Singhvi
2005-01-20 19:11 ` Adam Heath
2005-01-19 23:17 ` mukesh agrawal
1 sibling, 2 replies; 18+ messages in thread
From: Keir Fraser @ 2005-01-18 11:28 UTC (permalink / raw)
To: xen-devel
> OK, I have a good handle on the problem with UDP hangs into user-space
> of domain 0.
>
> It's down to message size: if the UDP payload size is less than 24
> bytes, the buffer is not freed properly. Bizarre, but it explains why
> our regression tests weren't picking it up as they all use larger
> message sizes.
>
> Anyhow, now we can reproduce, a fix should be forthcoming.
>
> Ian
This bug is now (hopefully) fixed in the testing and unstable trees.
-- Keir
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: (repeatable) cross-domain networking failure
2005-01-18 11:28 ` Keir Fraser
@ 2005-01-18 16:04 ` Nivedita Singhvi
2005-01-20 19:11 ` Adam Heath
1 sibling, 0 replies; 18+ messages in thread
From: Nivedita Singhvi @ 2005-01-18 16:04 UTC
To: Keir Fraser; +Cc: xen-devel
Keir Fraser wrote:
>>Anyhow, now we can reproduce, a fix should be forthcoming.
>>
>>Ian
>
>
> This bug is now (hopefully) fixed in the testing and unstable trees.
Many thanks, Ian and Keir!
I know this was mentioned recently on a thread, but I can't
remember or locate it: are your regression tests publicly
available? I'm currently helping some engineers put together
automated testing for this internally. The small-message test
(netperf with the message size going from, say, 1 byte in
steps to over ~64K) is very handy indeed; it has often
exposed problems. We'd be glad to throw some tests at you
as well.
thanks,
Nivedita
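A payload-size schedule along the lines Nivedita describes could be generated as below. This is a sketch only and does not use netperf's own option syntax. message_sizes is a hypothetical helper, and dense_up_to sets how many small sizes are covered exhaustively. The default max_size of 65507 is the largest UDP payload that fits in a single IPv4 datagram (65535 minus the 20-byte IP and 8-byte UDP headers). Covering every size up to 64 bytes ensures that 23 and 24 bytes, the boundary in this bug, are both exercised.

```python
def message_sizes(max_size: int = 65507, dense_up_to: int = 64) -> list[int]:
    """Hypothetical sweep schedule: every size from 1 to dense_up_to, then each
    power of two above that together with its neighbours (n-1, n+1) to probe
    off-by-one boundaries, and finally max_size itself."""
    sizes = set(range(1, min(dense_up_to, max_size) + 1))
    n = 1
    while n <= max_size:
        for s in (n - 1, n, n + 1):
            if dense_up_to < s <= max_size:   # skip sizes already covered densely
                sizes.add(s)
        n *= 2
    sizes.add(max_size)
    return sorted(sizes)
```

For example, message_sizes(max_size=10, dense_up_to=4) yields [1, 2, 3, 4, 5, 7, 8, 9, 10]. Each size would then be handed to netperf or to a custom sender.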
* Re: (repeatable) cross-domain networking failure
2005-01-18 11:05 ` Keir Fraser
2005-01-18 11:28 ` Keir Fraser
@ 2005-01-19 23:17 ` mukesh agrawal
1 sibling, 0 replies; 18+ messages in thread
From: mukesh agrawal @ 2005-01-19 23:17 UTC
To: Keir Fraser; +Cc: Ian Pratt, xen-devel
On Tue, 18 Jan 2005, Keir Fraser wrote:
> This bug is now (hopefully) fixed in the testing and unstable trees.
Yep, works for me now.
Thanks!
-------------------------------------------------------
This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
Tool for open source databases. Create drag-&-drop reports. Save time
by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
Download a FREE copy at http://www.intelliview.com/go/osdn_nl
* Re: (repeatable) cross-domain networking failure
2005-01-18 11:28 ` Keir Fraser
2005-01-18 16:04 ` Nivedita Singhvi
@ 2005-01-20 19:11 ` Adam Heath
1 sibling, 0 replies; 18+ messages in thread
From: Adam Heath @ 2005-01-20 19:11 UTC
To: Keir Fraser; +Cc: xen-devel@lists.sourceforge.net
On Tue, 18 Jan 2005, Keir Fraser wrote:
>
> > OK, I have a good handle on the problem with UDP hangs into user-space
> > of domain 0.
> >
> > It's down to message size: if the UDP payload size is less than 24
> > bytes, the buffer is not freed properly. Bizarre, but it explains why
> > our regression tests weren't picking it up as they all use larger
> > message sizes.
> >
> > Anyhow, now we can reproduce, a fix should be forthcoming.
> >
> > Ian
>
> This bug is now (hopefully) fixed in the testing and unstable trees.
Does this bug exist in the stable (2.0) tree?
* RE: (repeatable) cross-domain networking failure
@ 2005-01-20 22:08 Ian Pratt
0 siblings, 0 replies; 18+ messages in thread
From: Ian Pratt @ 2005-01-20 22:08 UTC
To: Adam Heath, Keir Fraser; +Cc: xen-devel
> > > It's down to message size: if the UDP payload size is less than 24
> > > bytes, the buffer is not freed properly. Bizarre, but it explains why
> > > our regression tests weren't picking it up as they all use larger
> > > message sizes.
> > >
> > > Anyhow, now we can reproduce, a fix should be forthcoming.
> > >
> > > Ian
> >
> > This bug is now (hopefully) fixed in the testing and unstable trees.
>
> Does this bug exist in the stable (2.0) tree?
Yes, it will be fixed in 2.0.4. It was pretty obscure (it has been in
there ever since 1.3), so we're not rushing headlong into a new
release.
Ian
Thread overview: 18+ messages
2005-01-15 16:40 (repeatable) cross-domain networking failure mukesh agrawal
2005-01-15 17:04 ` Keir Fraser
[not found] ` <e15e04f9050115091893409f1@mail.gmail.com>
2005-01-15 17:26 ` Fwd: " mukesh agrawal
2005-01-15 21:14 ` Nivedita Singhvi
[not found] ` <e15e04f905011611313312b9f4@mail.gmail.com>
2005-01-16 20:49 ` mukesh agrawal
2005-01-16 21:09 ` Keir Fraser
2005-01-16 21:56 ` mukesh agrawal
-- strict thread matches above, loose matches on Subject: below --
2005-01-20 22:08 Ian Pratt
2005-01-17 23:14 Ian Pratt
2005-01-18 2:06 ` Adam Heath
2005-01-18 11:05 ` Keir Fraser
2005-01-18 11:28 ` Keir Fraser
2005-01-18 16:04 ` Nivedita Singhvi
2005-01-20 19:11 ` Adam Heath
2005-01-19 23:17 ` mukesh agrawal
2005-01-16 22:52 Ian Pratt
2005-01-16 22:57 ` mukesh agrawal
2005-01-15 1:38 mukesh agrawal