Very slow domU network performance

* Very slow domU network performance - Moved to xen-devel
From: Matt Ayres @ 2006-04-05 17:11 UTC (permalink / raw)
  To: Winston Chang; +Cc: soltesz, xen-devel@lists.xensource.com

Winston Chang wrote:
>>> I ran the test with the latest xen-unstable build.  The results are 
>>> the same.
>>> When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent  domU CPU 
>>> starvation, network performance was good.  The numbers in this case 
>>> are the same as in my other message where I detail the results using 
>>> the week-old xen build -- it could handle 90Mb/s with no datagram 
>>> loss.  So it looks like the checksum patches had no effect on this 
>>> phenomenon; the only thing that mattered was the scheduling.
>>
>> What was the previous weight of domain 0?  What is the weight assigned 
>> to the domU's and do the domU's have bursting enabled?
> 
> I'm not really sure of the answer to either of these questions.  The
> weight is whatever the default is with Fedora Core 5 and xen-unstable.
> I don't know anything about bursting. How do you find out?
> 

I'd like to be corrected if I am wrong, but the last number (weight) is 
set to 0 for all domains by default.  By giving it a value of 1 you are 
giving dom0 more CPU. The second-to-last number is a boolean that 
decides whether a domain is hard-locked to its weight or can burst 
using idle CPU cycles.  The three before that are generally set to 0, and 
the first number identifies the domain.  Personally, I do not know of a 
way to read back the current weights. It is documented in the Xen 
distribution tgz.
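
A minimal sketch of how the six arguments in 'xm sched-sedf 0 0 0 0 1 1' 
map onto the description above. The parser and its type are hypothetical 
helpers for illustration only. The field names period, slice and latency 
are an assumption taken from the xm usage text as I recall it; the 
description above only says those three are generally 0.

```python
from typing import NamedTuple


class SedfArgs(NamedTuple):
    """Hypothetical container for the arguments of 'xm sched-sedf'."""
    domain: str      # first number: which domain (0 is dom0)
    period: int      # assumed name; generally 0
    slice_: int      # assumed name; generally 0
    latency: int     # assumed name; generally 0
    extratime: bool  # second-to-last: may the domain burst into idle CPU?
    weight: int      # last number: the domain's weight


def parse_sched_sedf(command: str) -> SedfArgs:
    """Split an 'xm sched-sedf ...' command line into labelled fields."""
    parts = command.split()
    if parts[:2] != ["xm", "sched-sedf"] or len(parts) != 8:
        raise ValueError("expected: xm sched-sedf followed by 6 arguments")
    dom, period, slice_, latency, extra, weight = parts[2:]
    return SedfArgs(dom, int(period), int(slice_), int(latency),
                    bool(int(extra)), int(weight))


if __name__ == "__main__":
    print(parse_sched_sedf("xm sched-sedf 0 0 0 0 1 1"))
```

Read this way, the command from the quoted message leaves the three 
middle values at 0, allows dom0 to burst, and gives it a weight of 1.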

I ran my own tests. I have dom0 with a weight of 512 (double its memory 
allocation) and each VM also has a weight equal to its memory 
allocation.  My dom0 can transfer at 10MB/s+ over the LAN, but domUs, 
with 100% CPU used on the host, could only transfer over the LAN at a 
peak of 800KB/s.  When I gave dom0 a weight of 1, domU transfers 
decreased to a peak of 100KB/s over the "LAN" (quoted because, due to 
proxy ARP, the host acts as a router).

The problem occurs in both bridged and routed mode.

I have to believe the problem is in the hypervisor itself and that 
scheduling and CPU usage greatly affect it.  Network bandwidth should 
not be limited unless that is wanted (e.g. by using the rate vif 
parameter).

Stephen Soltesz has experienced the same problem and has some graphs to 
back it up.  Stephen, will you share at least that one CPU + iperf graph 
with the community and perhaps elaborate on your weight configuration 
(if any)?

Thank you,
Matt Ayres

* Re: Very slow domU network performance - Moved to xen-devel
From: Winston Chang @ 2006-04-06  6:40 UTC (permalink / raw)
  To: Matt Ayres; +Cc: soltesz, xen-devel@lists.xensource.com

On Apr 5, 2006, at 1:11 PM, Matt Ayres wrote:
> Winston Chang wrote:
>>>> I ran the test with the latest xen-unstable build.  The results  
>>>> are the same.
>>>> When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent  domU CPU  
>>>> starvation, network performance was good.  The numbers in this  
>>>> case are the same as in my other message where I detail the  
>>>> results using the week-old xen build -- it could handle 90Mb/s  
>>>> with no datagram loss.  So it looks like the checksum patches  
>>>> had no effect on this phenomenon; the only thing that mattered  
>>>> was the scheduling.
>>>
>>> What was the previous weight of domain 0?  What is the weight  
>>> assigned to the domU's and do the domU's have bursting enabled?
>> I'm not really sure of the answer to either of these questions.
>> The weight is whatever the default is with Fedora Core 5 and
>> xen-unstable.  I don't know anything about bursting. How do you
>> find out?
>
> I'd like to be corrected if I am wrong, but the last number
> (weight) is set to 0 for all domains by default.  By giving it a
> value of 1 you are giving dom0 more CPU. The second-to-last number
> is a boolean that decides whether a domain is hard-locked to its
> weight or can burst using idle CPU cycles.  The three before that
> are generally set to 0, and the first number identifies the domain.
> Personally, I do not know of a way to read back the current
> weights. It is documented in the Xen distribution tgz.

I can tell you the symptoms I had: whenever a process in dom0 grabs  
100% of the CPU, the domU console freezes.  After a little while, the  
domU console says "BUG: soft lockup detected on CPU#0!"  So I believe  
that with my default settings, dom0 always gets first priority, and  
domU gets the leftovers.

For those who are just seeing this (the thread started on xen-users): 
I had very poor UDP performance using iperf with domU as the server 
and dom0 as the client.  I had 99.98% packet loss when running at 
90Mb/s in this case, until I changed the scheduling as above.  Then 
packet loss dropped to 0.  In the reverse direction there was never a 
problem.
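
To put the 99.98% figure in perspective, here is a rough calculation. 
It assumes 1470-byte UDP datagrams (to my knowledge iperf's default; 
the size actually used is not stated here) and that 90Mb/s means 
90 * 10^6 bits per second.

```python
# Assumptions (not stated in the thread): 1470-byte datagrams, and
# "90Mb/s" meaning 90 * 10**6 bits of payload per second.
RATE_BITS_PER_S = 90 * 10**6
DATAGRAM_BYTES = 1470
LOSS_FRACTION = 0.9998  # the 99.98% packet loss reported above


def datagrams_per_second(rate_bits_per_s: float, datagram_bytes: int) -> float:
    """Number of datagrams needed per second to sustain the given bit rate."""
    return rate_bits_per_s / (datagram_bytes * 8)


sent = datagrams_per_second(RATE_BITS_PER_S, DATAGRAM_BYTES)
delivered = sent * (1 - LOSS_FRACTION)
print(f"sent: {sent:.0f} datagrams/s, delivered: {delivered:.2f} datagrams/s")
```

Under these assumptions domU was processing only one or two datagrams 
per second out of several thousand sent.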

For more details, see the original thread here:
http://lists.xensource.com/archives/html/xen-users/2006-04/msg00096.html

It's possible that iperf is partially at fault here.  (I used version  
1.7.0 since 2.0.2 wouldn't compile on my iBook.)   I noticed that it  
takes 100% of CPU time when it's used as a UDP client, even when  
running at lower speeds -- I saw this at 4Mb/s.  I would wager that  
it uses a while loop to delay between sending datagrams.  Since iperf  
always wants all the CPU cycles and because domU has last priority in  
my default scheduling config, domU just wouldn't get enough CPU time  
to process the incoming datagrams.
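
A minimal sketch of the difference I am guessing at. It is not iperf's 
actual code; the two functions are hypothetical and only compare how 
much CPU time a spinning delay loop uses against a sleeping one when 
pacing the same number of sends.

```python
import time


def busy_wait_pacing(n_sends: int, interval_s: float) -> float:
    """Pace sends by spinning until each send time; return CPU seconds used."""
    cpu_start = time.process_time()
    next_send = time.perf_counter()
    for _ in range(n_sends):
        while time.perf_counter() < next_send:
            pass  # spin: the process keeps the CPU while it waits
        # a datagram would be sent here
        next_send += interval_s
    return time.process_time() - cpu_start


def sleep_pacing(n_sends: int, interval_s: float) -> float:
    """Pace sends by sleeping until each send time; return CPU seconds used."""
    cpu_start = time.process_time()
    next_send = time.perf_counter()
    for _ in range(n_sends):
        delay = next_send - time.perf_counter()
        if delay > 0:
            time.sleep(delay)  # yield the CPU to other processes or domains
        # a datagram would be sent here
        next_send += interval_s
    return time.process_time() - cpu_start


if __name__ == "__main__":
    print("busy-wait CPU seconds:", busy_wait_pacing(20, 0.01))
    print("sleep     CPU seconds:", sleep_pacing(20, 0.01))
```

Both loops take about the same wall-clock time, but the spinning one 
stays runnable the whole time, which is what would starve a 
lower-priority domU on a single CPU.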

A more general note about using iperf:
It seems to me that as long as iperf uses 100% of the CPU, it is not  
a good tool for testing dom0-domU or domU-domU network performance.   
This sort of timing loop would be fine for network tests using "real"  
hosts, but not ones in which CPU resources are shared and network I/O  
is CPU-bound, as is the case here.

I would guess that this would not occur on SMP machines (and maybe 
hyperthreaded ones also), since iperf's timing loop would use up only 
one CPU.

The other network issue I had was very slow TCP performance when domU 
was the iperf server and an external machine was the iperf client.  I 
got 2Mb/s in this case, but about 90Mb/s in the other direction (on 
100Mbit ethernet).  This problem disappeared when I made the 
scheduling change above.

This issue is _not_ explained by iperf hogging the CPU as I 
mentioned above.  No user-level process in dom0 should be involved; 
dom0 just does some low-level networking.  But if the cause of this 
TCP problem is that dom0 is taking all the CPU resources, then that 
would suggest that something in the xen networking/bridging code is 
using 100% of the CPU time just to do bridging for the incoming data. 
Does this indicate a problem in the networking code?

Again, the TCP slowness does not occur in the reverse direction, when 
domU is sending to an external machine.  My guess is that, like the 
iperf UDP issue above, this problem would not occur on SMP machines.

--Winston

> I ran my own tests. I have dom0 with a weight of 512 (double its
> memory allocation) and each VM also has a weight equal to its
> memory allocation.  My dom0 can transfer at 10MB/s+ over the LAN,
> but domUs, with 100% CPU used on the host, could only transfer over
> the LAN at a peak of 800KB/s.  When I gave dom0 a weight of 1, domU
> transfers decreased to a peak of 100KB/s over the "LAN" (quoted
> because, due to proxy ARP, the host acts as a router).
>
> The problem occurs in both bridged and routed mode.
>
> I have to believe the problem is in the hypervisor itself and that
> scheduling and CPU usage greatly affect it.  Network bandwidth
> should not be limited unless that is wanted (e.g. by using the rate
> vif parameter).
>
> Stephen Soltesz has experienced the same problem and has some
> graphs to back it up.  Stephen, will you share at least that one
> CPU + iperf graph with the community and perhaps elaborate on your
> weight configuration (if any)?
>
> Thank you,
> Matt Ayres
