n/w performance degradation

* n/w performance degradation
@ 2005-12-05 23:11 Diwaker Gupta
  2005-12-05 23:23 ` Keir Fraser
  2005-12-06  0:09 ` Nivedita Singhvi
  0 siblings, 2 replies; 14+ messages in thread
From: Diwaker Gupta @ 2005-12-05 23:11 UTC (permalink / raw)
  To: xen-devel

Hi folks,

I was going to post this over the weekend but didn't get around to it,
and look, in the meantime Xen 3.0.0 was released! Good work all around,
but I have some performance problems to report :)

I'm running Changeset 00c349d5b40d269da4fec9510f1dd7c6bb3b3327. This
is a dual CPU machine but I'm currently running with noht, nosmp. All
tests are done using iperf -- both endpoints are sitting on the same
switch on our cluster. The machines have Broadcom BCM5704 NICs.

N/W performance from dom0 seems fine (though I used to get 930+ until
a few days back):

[  6]  0.0-20.0 sec  1.95 GBytes    835 Mbits/sec

However, from a VM, the throughput is really bad:

[  5]  0.0-20.0 sec  1.05 GBytes    450 Mbits/sec

The above numbers are using the BVT scheduler. With the SEDF
scheduler, the numbers are even worse (a VM can't get more than
300Mbps in my tests). I can post concrete figures if people are
interested. I'm _not_ running pipelined netback.

Is anyone else observing such performance problems?

Diwaker
--
Web/Blog/Gallery: http://floatingsun.net

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* Re: n/w performance degradation
  2005-12-05 23:11 n/w performance degradation Diwaker Gupta
@ 2005-12-05 23:23 ` Keir Fraser
  2005-12-06  0:04   ` Diwaker Gupta
  2005-12-06  0:09 ` Nivedita Singhvi
  1 sibling, 1 reply; 14+ messages in thread
From: Keir Fraser @ 2005-12-05 23:23 UTC (permalink / raw)
  To: Diwaker Gupta; +Cc: xen-devel

On 5 Dec 2005, at 23:11, Diwaker Gupta wrote:

> N/W performance from dom0 seems fine (though I used to get 930+ until
> a few days back):
>
> [  6]  0.0-20.0 sec  1.95 GBytes    835 Mbits/sec
>
> However, from a VM, the throughput is really bad:
>
> [  5]  0.0-20.0 sec  1.05 GBytes    450 Mbits/sec
>
> The above numbers are using the BVT scheduler. With the SEDF
> scheduler, the numbers are even worse (a VM can't get more than
> 300Mbps in my tests). I can post concrete figures if people are
> interested. I'm _not_ running pipelined netback.

We used to be able to saturate GigE with a single CPU, although 
admittedly burning quite a bit more CPU than using dom0 as the 
endpoint. I guess things have got out of tune, but there are a bunch of 
things we could do to encourage I/O batching ('x packets or y 
milliseconds' style receive batching, and transmitting batches of 
packets every x milliseconds or when the domU goes idle). This, 
together with scheduler tuning, should definitely get the performance 
back, although it's a balancing act with one CPU to ensure no stage of
the I/O processing pipeline gets starved.
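
A minimal sketch of the 'x packets or y milliseconds' batching idea
described above. This is not Xen's netback/netfront code: the class
name Batcher, its parameters, and the flush callback are all
hypothetical, chosen only to illustrate the policy.

```python
import time


class Batcher:
    """Hand packets to `flush` once `max_packets` have accumulated or
    the oldest pending packet is `max_delay_s` old, whichever is first."""

    def __init__(self, flush, max_packets=32, max_delay_s=0.001,
                 clock=time.monotonic):
        self.flush = flush              # hypothetical delivery callback
        self.max_packets = max_packets  # the 'x packets' limit
        self.max_delay_s = max_delay_s  # the 'y milliseconds' limit
        self.clock = clock              # injectable so it can be tested
        self.pending = []
        self.first_arrival = None

    def add(self, packet):
        if not self.pending:
            # Start the delay timer when the batch gets its first packet.
            self.first_arrival = self.clock()
        self.pending.append(packet)
        self.poll()

    def poll(self):
        """Call from a periodic timer so the time limit is enforced
        even when no further packets arrive."""
        if not self.pending:
            return
        too_many = len(self.pending) >= self.max_packets
        too_old = self.clock() - self.first_arrival >= self.max_delay_s
        if too_many or too_old:
            batch, self.pending = self.pending, []
            self.first_arrival = None
            self.flush(batch)
```

Larger limits mean fewer domain switches per packet at the cost of
added latency, which is the balancing act mentioned above.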

  -- Keir

* Re: n/w performance degradation
  2005-12-05 23:23 ` Keir Fraser
@ 2005-12-06  0:04   ` Diwaker Gupta
  2005-12-06 11:20     ` Keir Fraser
  0 siblings, 1 reply; 14+ messages in thread
From: Diwaker Gupta @ 2005-12-06  0:04 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

> We used to be able to saturate GigE with a single CPU, although

Same here.

> admittedly burning quite a bit more CPU than using dom0 as the
> endpoint. I guess things have got out of tune, but there are a bunch of
> things we could do to encourage I/O batching ('x packets or y
> milliseconds' style receive batching, and transmitting batches of
> packets every x milliseconds or when the domU goes idle). This,
> together with scheduler tuning, should definitely get the performance
> back, although it's a balancing act with one CPU to ensure no stage of
> the I/O processing pipeline gets starved.

Look forward to it. What's the deal with the pipelined backend? What's
the target scenario there?

Meanwhile, though I agree that SMPs and hyperthreaded processors are
becoming the norm, it still doesn't solve this problem. Even on an SMP
machine, I can have dom0 co-located with a VM on the same CPU, and I'm
not sure how different that would be from the current situation.

On a related note, has anyone been working on the IDD stuff? Is it
possible to wrap up a device driver in its own domain? The last time 
I tried, it basically wasn't possible, but I'd really be interested in
helping out any which way to get it working.

Diwaker
--
Web/Blog/Gallery: http://floatingsun.net

* Re: n/w performance degradation
  2005-12-06  0:04   ` Diwaker Gupta
@ 2005-12-06 11:20     ` Keir Fraser
  0 siblings, 0 replies; 14+ messages in thread
From: Keir Fraser @ 2005-12-06 11:20 UTC (permalink / raw)
  To: Diwaker Gupta; +Cc: xen-devel

On 6 Dec 2005, at 00:04, Diwaker Gupta wrote:

> Look forward to it. What's the deal with the pipelined backend? What's
> the target scenario there?

It's a form of request-notification avoidance. It allows the frontend 
to avoid sending a notification (virtual ipi) to the backend if any 
previous requests are still in flight (no response received). In some 
situations, pipelining can mean you basically need to send no 
notifications, because the backend will pull down new requests as it 
responds to old ones.

Thing is, it doesn't work for some forms of packet processing in
domain0. For example, if you send fragmented IP datagrams and they are 
reassembled in domain0 there may be a dependency between packets. Hence 
you need to send notifications even if buffers are in flight because 
you won't get a response until the datagram is reassembled and 
forwarded.
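
A minimal sketch of the notification-avoidance rule described above:
the frontend only sends a notification when nothing is in flight. The
class and method names are hypothetical and only model the counting;
they are not the real netfront/netback interfaces.

```python
from collections import deque


class PipelinedFrontend:
    """Toy model: count how many notifications a frontend would send."""

    def __init__(self):
        self.ring = deque()          # stands in for the shared request ring
        self.in_flight = 0           # requests with no response yet
        self.notifications_sent = 0

    def submit(self, request):
        self.ring.append(request)
        if self.in_flight == 0:
            # Backend may be idle, so it has to be woken up.
            self.notifications_sent += 1
        # Otherwise assume the backend will pull this request down
        # when it responds to an earlier one.
        self.in_flight += 1

    def complete_one(self):
        """Model the backend responding to the oldest request."""
        request = self.ring.popleft()
        self.in_flight -= 1
        return request


fe = PipelinedFrontend()
fe.submit(0)
for i in range(1, 100):
    fe.submit(i)        # a new request arrives while one is outstanding
    fe.complete_one()   # the backend responds to the oldest one
print(fe.notifications_sent)
```

The model assumes every request eventually gets a response without
further input. That is exactly the assumption that breaks in the
fragment-reassembly case: if the backend holds its response until a
later packet arrives, skipping the notification leaves both sides
waiting.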

> Meanwhile, though I agree that SMPs and hyperthreaded processors are
> becoming the norm, it still doesn't solve this problem. Even on an SMP
> machine, I can have dom0 co-located with a VM on the same CPU, and I'm
> not sure how different that would be from the current situation.

With enough hardware contexts, it'll soon become sensible to dedicate a 
context to domain0 or your IDD. It's certainly a very sensible use of a 
hyperthread (almost certainly a better use than multiprocessing within 
a guest, if you do a significant amount of i/o).

> On a related note, has anyone been working on the IDD stuff? Is it
> possible to wrap up a device driver in its own domain? The last time
> I tried, it basically wasn't possible, but I'd really be interested in
> helping out any which way to get it working.

Needs some PCI virtualization in Xen (managed by platform code in 
domain0). Nothing too tricky I think.

  -- Keir

* Re: n/w performance degradation
  2005-12-05 23:11 n/w performance degradation Diwaker Gupta
  2005-12-05 23:23 ` Keir Fraser
@ 2005-12-06  0:09 ` Nivedita Singhvi
  2005-12-06  1:32   ` Diwaker Gupta
  1 sibling, 1 reply; 14+ messages in thread
From: Nivedita Singhvi @ 2005-12-06  0:09 UTC (permalink / raw)
  To: Diwaker Gupta; +Cc: xen-devel

Diwaker Gupta wrote:

> Hi folks,
> 
> I was going to post this over the weekend but didn't get around to it,
> and look, in the meantime Xen 3.0.0 was released! Good work all around,
> but I have some performance problems to report :)
> 
> I'm running Changeset 00c349d5b40d269da4fec9510f1dd7c6bb3b3327. This
> is a dual CPU machine but I'm currently running with noht, nosmp. All
> tests are done using iperf -- both endpoints are sitting on the same
> switch on our cluster. The machines have Broadcom BCM5704 NICs.
> 
> N/W performance from dom0 seems fine (though I used to get 930+ until
> a few days back):
> 
> [  6]  0.0-20.0 sec  1.95 GBytes    835 Mbits/sec
> 
> However, from a VM, the throughput is really bad:
> 
> [  5]  0.0-20.0 sec  1.05 GBytes    450 Mbits/sec
> 
> The above numbers are using the BVT scheduler. With the SEDF
> scheduler, the numbers are even worse (a VM can't get more than
> 300Mbps in my tests). I can post concrete figures if people are
> interested. I'm _not_ running pipelined netback.
> 
> Is anyone else observing such performance problems?

Diwaker,

Can you run oprofile and obtain any kind of breakdown
on that, if possible? Also, are you dedicating individual
CPUs to dom0 and guest? (Avoiding much of the context
switching on I/O). What are your memory allocations? How much
of a bump do you get if you increase memory?

thanks,
Nivedita

* Re: n/w performance degradation
  2005-12-06  0:09 ` Nivedita Singhvi
@ 2005-12-06  1:32   ` Diwaker Gupta
  2005-12-06  2:15     ` Nivedita Singhvi
  0 siblings, 1 reply; 14+ messages in thread
From: Diwaker Gupta @ 2005-12-06  1:32 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: xen-devel

> Diwaker,
>
> Can you run oprofile and obtain any kind of breakdown
> on that, if possible?

Not today. Unfortunately I'm going to be offline for the next couple
of weeks. Anyhow, I'll try to see if I can post some more data in the
interim.

> Also, are you dedicating individual
> CPUs to dom0 and guest? (Avoiding much of the context
> switching on I/O)

Like I mentioned in my email, everything is on the same CPU right now
(nosmp, noht)

> What are your memory allocations? How much
> of a bump do you get if you increase memory?

Currently, both dom0 and the vm have 128MB. I rebooted with dom0
having 512MB and VM with 256 MB. Here are the numbers:

dom0:
[  5]  0.0-10.0 sec    987 MBytes    828 Mbits/sec

VM:
[  5]  0.0-10.0 sec    938 MBytes    787 Mbits/sec

So getting slightly better. I haven't run these with SEDF though,
above are using BVT.

--
Web/Blog/Gallery: http://floatingsun.net

* Re: n/w performance degradation
  2005-12-06  1:32   ` Diwaker Gupta
@ 2005-12-06  2:15     ` Nivedita Singhvi
  0 siblings, 0 replies; 14+ messages in thread
From: Nivedita Singhvi @ 2005-12-06  2:15 UTC (permalink / raw)
  To: Diwaker Gupta; +Cc: xen-devel

Diwaker Gupta wrote:

> Not today. Unfortunately I'm going to be offline for the next couple
> of weeks. Anyhow, I'll try to see if I can post some more data in the
> interim.

No problem, thanks for running your tests again..

>>What are your memory allocations? How much
>>of a bump do you get if you increase memory?
> 
> 
> Currently, both dom0 and the vm have 128MB. I rebooted with dom0
> having 512MB and VM with 256 MB. Here are the numbers:
> 
> dom0:
> [  5]  0.0-10.0 sec    987 MBytes    828 Mbits/sec
> 
> VM:
> [  5]  0.0-10.0 sec    938 MBytes    787 Mbits/sec
> 
> So getting slightly better. I haven't run these with SEDF though,
> above are using BVT.

Yep, the bigger difference in Xen 3.0 is that it's
just more sensitive to memory - when I tested
earlier in the summer most of the difference
between dom0 and domU could be gained back by
350MB dom0 and 512MB domUs doing heavy network
traffic.

So you have:

128MB domU - 450 Mbits/sec
256MB domU - 787 Mbits/sec

Which is roughly the same amount off the linear that I
recall..

thanks,
Nivedita

* RE: n/w performance degradation
@ 2005-12-05 23:49 Ian Pratt
  2005-12-06  0:06 ` Diwaker Gupta
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Pratt @ 2005-12-05 23:49 UTC (permalink / raw)
  To: Diwaker Gupta, xen-devel

> I'm running Changeset 
> 00c349d5b40d269da4fec9510f1dd7c6bb3b3327. This is a dual CPU 
> machine but I'm currently running with noht, nosmp. All tests 
> are done using iperf -- both endpoints are sitting on the 
> same switch on our cluster. The machines have Broadcom BCM5704 NICs.
> 
> N/W performance from dom0 seems fine (though I used to get 
> 930+ until a few days back):
> 
> [  6]  0.0-20.0 sec  1.95 GBytes    835 Mbits/sec
> 
> However, from a VM, the throughput is really bad:
> 
> [  5]  0.0-20.0 sec  1.05 GBytes    450 Mbits/sec
> 
> The above numbers are using the BVT scheduler. With the SEDF 
> scheduler, the numbers are even worse (a VM can't get more 
> than 300Mbps in my tests). I can post concrete figures if 
> people are interested. I'm _not_ running pipelined netback.
> 
> Is anyone else observing such performance problems?

We haven't really done much tuning for the single CPU case recently as
the vast majority of platforms that Xen is used on are either
hyperthreaded, dual core or SMP.

The main focus of the 3.0.0 release has been correctness rather than
performance tuning. We plan to do some tweaking over the coming weeks to
address this. We used to get 900Mb/s with a single CPU core, and there's
absolutely no reason why we shouldn't do so again -- in fact, we should
do better in terms of CPU usage than 2.0 as we now have checksum
offload.

Now we have great performance monitoring tools like xen-oprofile,
xenperf, xenmon etc. it should be quite straightforward to optimize
things.  Let's just wait until we've dealt with any critical bugs arising
from the release...

Ian

* Re: n/w performance degradation
  2005-12-05 23:49 Ian Pratt
@ 2005-12-06  0:06 ` Diwaker Gupta
  0 siblings, 0 replies; 14+ messages in thread
From: Diwaker Gupta @ 2005-12-06  0:06 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel

> The main focus of the 3.0.0 release has been correctness rather than
> performance tuning. We plan to do some tweaking over the coming weeks to
> address this. We used to get 900Mb/s with a single CPU core, and there's
> absolutely no reason why we shouldn't do so again -- in fact, we should
> do better in terms of CPU usage than 2.0 as we now have checksum
> offload.

I completely agree. Now that 3.0.0 is out of the door, it should be
easier to focus on performance, rather than functionality and
correctness.

> Now we have great performance monitoring tools like xen-oprofile,
> xenperf, xenmon etc. it should be quite straightforward to optimize
> things.  Let's just wait until we've dealt with any critical bugs arising
> from the release...

Sounds good, thanks.

Diwaker
--
Web/Blog/Gallery: http://floatingsun.net

* RE: n/w performance degradation
@ 2005-12-06  1:42 Ian Pratt
  2005-12-06  2:07 ` Diwaker Gupta
  2005-12-06  2:26 ` Nivedita Singhvi
  0 siblings, 2 replies; 14+ messages in thread
From: Ian Pratt @ 2005-12-06  1:42 UTC (permalink / raw)
  To: Diwaker Gupta, Nivedita Singhvi; +Cc: xen-devel

> > What are your memory allocations? How much of a bump do you get
> > if you increase memory?
> 
> Currently, both dom0 and the vm have 128MB. I rebooted with 
> dom0 having 512MB and VM with 256 MB. Here are the numbers:
> 
> dom0:
> [  5]  0.0-10.0 sec    987 MBytes    828 Mbits/sec
> 
> VM:
> [  5]  0.0-10.0 sec    938 MBytes    787 Mbits/sec
> 
> So getting slightly better. I haven't run these with SEDF 
> though, above are using BVT.

Ah, I expect I know what's going on here.

Linux sizes the default socket buffer size based on how much 'system'
memory it has.

With a 128MB domU it probably defaults to just 64K. For 256MB it
probably steps up to 128KB. You can prove this by setting
/proc/sys/net/core/{r,w}mem_{max,default}.

For a gigabit network you need at least 128KB.
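
One way to sanity-check the 128KB figure is the bandwidth-delay
product: a single TCP connection needs roughly bandwidth x round-trip
time of buffering to keep the link full. The sketch below assumes a
1 ms round trip purely for illustration; that number is not a
measurement from this thread, and the function name is hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes for a given round-trip time
    in microseconds (integer arithmetic, rounded down)."""
    return bandwidth_bits_per_s * rtt_us // (8 * 1_000_000)


GIGABIT = 1_000_000_000

# Assumed 1 ms RTT: about 122 KiB, in line with "at least 128KB".
print(bdp_bytes(GIGABIT, 1_000))   # 125000

# Assumed 100 us RTT, closer to a same-switch LAN: far less is needed.
print(bdp_bytes(GIGABIT, 100))     # 12500
```

The estimate only covers what the network itself requires. It says
nothing about extra latency added by scheduling between domains on a
shared CPU, which also increases the effective round-trip time.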

Ian

* Re: n/w performance degradation
  2005-12-06  1:42 Ian Pratt
@ 2005-12-06  2:07 ` Diwaker Gupta
  2005-12-06  2:26 ` Nivedita Singhvi
  1 sibling, 0 replies; 14+ messages in thread
From: Diwaker Gupta @ 2005-12-06  2:07 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel

> Ah, I expect I know what's going on here.
>
> Linux sizes the default socket buffer size based on how much 'system'
> memory it has.

Yes it does. But I've managed to saturate gigabit links from my VMs
with the _exact_ same configuration earlier, hence my original email.

> For a gigabit network you need at least 128KB.

These are extremely low latency networks, so the requirements might
actually be lower than this.

Diwaker
--
Web/Blog/Gallery: http://floatingsun.net

* Re: n/w performance degradation
  2005-12-06  1:42 Ian Pratt
  2005-12-06  2:07 ` Diwaker Gupta
@ 2005-12-06  2:26 ` Nivedita Singhvi
  2005-12-06  8:02   ` Diwaker Gupta
  1 sibling, 1 reply; 14+ messages in thread
From: Nivedita Singhvi @ 2005-12-06  2:26 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Diwaker Gupta, xen-devel

Ian Pratt wrote:

>>>What are your memory allocations? How much of a bump do you get
>>>if you increase memory?
>>
>>Currently, both dom0 and the vm have 128MB. I rebooted with 
>>dom0 having 512MB and VM with 256 MB. Here are the numbers:
>>
>>dom0:
>>[  5]  0.0-10.0 sec    987 MBytes    828 Mbits/sec
>>
>>VM:
>>[  5]  0.0-10.0 sec    938 MBytes    787 Mbits/sec
>>
>>So getting slightly better. I haven't run these with SEDF 
>>though, above are using BVT.
> 
> 
> Ah, I expect I know what's going on here.
> 
> Linux sizes the default socket buffer size based on how much 'system'
> memory it has.
> 
> With a 128MB domU it probably defaults to just 64K. For 256MB it
> probably steps up to 128KB. You can prove this by setting
> /proc/sys/net/core/{r,w}mem_{max,default}.

For TCP sockets, you'll also have to bump up
net/ipv4/tcp_rmem[1,2] and net/ipv4/tcp_wmem[1,2],
don't forget.

For low mem systems, the default size of the tcp read buffer
(tcp_rmem[1]) is 43689, and max size (tcp_rmem[2]) is
2*43689, which is really too low to do network heavy
lifting.
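
A small sketch for checking these values on a running Linux system.
The tcp_rmem and tcp_wmem files each hold three numbers (minimum,
default, maximum), so index [1] is the default and [2] the maximum.
The helper names are hypothetical, and the minimum of 4096 in the
sample string is an assumed value used only to exercise the parser.

```python
from pathlib import Path
from typing import NamedTuple


class TcpMem(NamedTuple):
    minimum: int   # index 0
    default: int   # index 1, e.g. tcp_rmem[1]
    maximum: int   # index 2, e.g. tcp_rmem[2]


def parse_tcp_mem(text: str) -> TcpMem:
    """Parse the whitespace-separated triple found in tcp_rmem/tcp_wmem."""
    minimum, default, maximum = (int(field) for field in text.split())
    return TcpMem(minimum, default, maximum)


def read_tcp_mem(name: str) -> TcpMem:
    """Read e.g. 'tcp_rmem' or 'tcp_wmem' from /proc (Linux only)."""
    return parse_tcp_mem(Path("/proc/sys/net/ipv4", name).read_text())


# Default and maximum as quoted above for a low-memory system.
sample = parse_tcp_mem("4096\t43689\t87378\n")
print(sample.default, sample.maximum)
```

On a live system, read_tcp_mem("tcp_rmem") and read_tcp_mem("tcp_wmem")
return the current settings, which can then be compared before and
after changing the domain's memory allocation.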

> For a gigabit network you need at least 128KB.
> 
> Ian

At least.

thanks,
Nivedita

* Re: n/w performance degradation
  2005-12-06  2:26 ` Nivedita Singhvi
@ 2005-12-06  8:02   ` Diwaker Gupta
  0 siblings, 0 replies; 14+ messages in thread
From: Diwaker Gupta @ 2005-12-06  8:02 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: Ian Pratt, xen-devel

> > Ah, I expect I know what's going on here.
> >
> > Linux sizes the default socket buffer size based on how much 'system'
> > memory it has.
> >
> > With a 128MB domU it probably defaults to just 64K. For 256MB it
> > probably steps up to 128KB. You can prove this by setting
> > /proc/sys/net/core/{r,w}mem_{max,default}.
>
> For TCP sockets, you'll also have to bump up
> net/ipv4/tcp_rmem[1,2] and net/ipv4/tcp_wmem[1,2],
> don't forget.
>
> For low mem systems, the default size of the tcp read buffer
> (tcp_rmem[1]) is 43689, and max size (tcp_rmem[2]) is
> 2*43689, which is really too low to do network heavy
> lifting.

Just as an aside, I wanted to point out that my dom0's were running at
the exact same configuration (memory, socket sizes) as the VM. And I
can mostly saturate a gig link from dom0. So while socket sizes might
certainly have an impact, there are still additional bottlenecks that
need to be fine tuned.

--
Web/Blog/Gallery: http://floatingsun.net

* RE: n/w performance degradation
@ 2005-12-06 10:42 Ian Pratt
  0 siblings, 0 replies; 14+ messages in thread
From: Ian Pratt @ 2005-12-06 10:42 UTC (permalink / raw)
  To: Diwaker Gupta, Nivedita Singhvi; +Cc: xen-devel

> > For low mem systems, the default size of the tcp read buffer
> > (tcp_rmem[1]) is 43689, and max size (tcp_rmem[2]) is 2*43689,
> > which is really too low to do network heavy lifting.
> 
> Just as an aside, I wanted to point out that my dom0's were 
> running at the exact same configuration (memory, socket 
> sizes) as the VM. And I can mostly saturate a gig link from 
> dom0. So while socket sizes might certainly have an impact, 
> there are still additional bottlenecks that need to be fine tuned.

Xen is certainly going to be more sensitive to small socket buffer sizes
when you're trying to run dom0 and the guest on the same CPU thread. If
you're running a single TCP connection the socket buffer size basically
determines how frequently you're forced to switch between domains.
Switching every 43KB at 1Gb/s amounts to thousands of domain switches a
second which burns CPU. Doubling the socket buffer size halves the rate
of domain switches. Under Xen this would be a more sensible default.
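
The arithmetic behind that estimate can be sketched as follows. It
assumes the simplified model above, one domain switch each time a
socket buffer's worth of data has been moved, and ignores protocol
overhead; the function name is hypothetical.

```python
def domain_switches_per_second(link_bits_per_s: int,
                               socket_buffer_bytes: int) -> float:
    """Switch rate if the domains swap once per full socket buffer."""
    bytes_per_second = link_bits_per_s / 8
    return bytes_per_second / socket_buffer_bytes


GIGABIT = 1_000_000_000

small = domain_switches_per_second(GIGABIT, 43689)       # default buffer
doubled = domain_switches_per_second(GIGABIT, 2 * 43689)

# Roughly 2860 switches a second with the small buffer, and half
# that once the buffer size is doubled.
print(round(small), round(doubled))
```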

Ian
