* Re: [lxc-devel] Poor bridging performance on 10 GbE
From: Daniel Lezcano @ 2009-03-18 10:10 UTC (permalink / raw)
  To: Ryousei Takano; +Cc: lxc-devel, Linux Containers, Linux Netdev List

Ryousei Takano wrote:
> Hi all,
> 
> I am evaluating the networking performance of lxc on 10 Gigabit Ethernet
> using the netperf benchmark.

Thanks for doing this benchmarking.
I ran similar tests two years ago, and there is an analysis of the 
performance at:
http://lxc.sourceforge.net/network/benchs.php

It is not up to date, but it should give you some clues about what is 
causing this overhead.

> Using a macvlan device, the throughput was 9.6 Gbps. But using a veth device,
> the throughput was only 2.7 Gbps.

Yes, the macvlan interface is definitely the best in terms of 
performance, but with the restriction that containers on the same host 
cannot communicate with each other.

There are some discussions around that:

http://marc.info/?l=linux-netdev&m=123643508124711&w=2

veth is a virtual device, hence it has no offloading. When packets are 
sent out, the network stack looks at the NIC's offloading capabilities, 
which are not present, so the kernel computes the checksums itself 
instead of letting the NIC do it, even when the packet ends up being 
transmitted through the physical NIC. This is a well-known issue with 
network virtualization, and Xen developed a specific network driver to 
deal with it:
http://www.cse.psu.edu/~bhuvan/teaching/spring06/papers/xen-net-opt.pdf
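
You can see this for yourself with ethtool; something like the following 
(interface names taken from your brctl output below) should show the 
physical NIC advertising checksum and TSO offloads while the veth 
advertises almost none:

	# offloads advertised by the physical 10G NIC
	$ ethtool -k eth1
	# offloads on the host side of the veth pair attached to the bridge
	$ ethtool -k veth0_7573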

> I think this is because the overhead of bridging devices is high.

Yes, bridging adds some overhead, and AFAIR bridging + netfilter does 
some skb copying.

> I also checked the host OS's performance when I used a veth device.
> I observed a strange phenomenon.
> 
> Before issuing lxc-start command, the throughput was 9.6 Gbps.
> Here is the output of brctl show:
> 	$ brctl show
> 	bridge name	bridge id		STP enabled	interfaces
> 	br0		8000.0060dd470d49	no		eth1
> 
> After issuing lxc-start command, the throughput decreased to 3.2 Gbps.
> Here is the output of brctl show:
> 	$ sudo brctl show
> 	bridge name	bridge id		STP enabled	interfaces
> 	br0		8000.0060dd470d49	no		eth1
> 								veth0_7573
> 
> I wonder why the performance is greatly influenced by adding a veth device
> to a bridge device.

Hmm, good question :)

> Here is my experimental setting:
> 	OS: Ubuntu server 8.10 amd64
> 	Kernel: 2.6.27-rc8 (checkout from the lxc git repository)

I would recommend using the vanilla 2.6.29-rc8 kernel, because it no 
longer needs patches; a lot of fixes went into the network namespace 
code, and maybe the bridge has been improved in the meantime :)

> 	Userland tool: 0.6.0
> 	NIC: Myricom Myri-10G
> 
> Any comments and suggestions will be appreciated.
> If this list is not the proper place to talk about this problem, can anyone
> tell me the proper one?

The performance question is more related to the network virtualization 
implementation and should be sent to netdev@ and containers@ (added to 
the Cc of this email). Of course, people on lxc-devel@ will be 
interested in these aspects, so lxc-devel@ is the right mailing list too.

Thanks for your testing
   -- Daniel


* Re: [lxc-devel] Poor bridging performance on 10 GbE
From: Ryousei Takano @ 2009-03-18 15:56 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: lxc-devel, Linux Containers, Linux Netdev List

Hi Daniel,

On Wed, Mar 18, 2009 at 7:10 PM, Daniel Lezcano <dlezcano@fr.ibm.com> wrote:
> Ryousei Takano wrote:
>>
>> Hi all,
>>
>> I am evaluating the networking performance of lxc on 10 Gigabit Ethernet
>> using the netperf benchmark.
>
> Thanks for doing this benchmarking.
> I ran similar tests two years ago, and there is an analysis of the
> performance at:
> http://lxc.sourceforge.net/network/benchs.php
>
> It is not up to date, but it should give you some clues about what is
> causing this overhead.
>
I am using VServer because other virtualization mechanisms, including OpenVZ,
Xen, and KVM, cannot fully utilize the network bandwidth of 10 GbE.

Here are the results of the netperf benchmark (throughput in Mbps):
	vanilla (2.6.27-9)		9525.94
	Vserver (2.6.27.10)	9521.79
	OpenVZ (2.6.27.10)	2049.89
	Xen (2.6.26.1)		1011.47
	KVM (2.6.27-9)		1022.42
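
For reference, a bulk-throughput comparison like this is typically run 
with something along these lines (the receiver hostname is just a 
placeholder):

	# start netserver on the receiving host, then from the sender:
	$ netperf -H <receiver> -t TCP_STREAM -l 60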

Now I am interested in using LXC instead of VServer.

>> Using a macvlan device, the throughput was 9.6 Gbps. But using a veth
>> device, the throughput was only 2.7 Gbps.
>
> Yes, the macvlan interface is definitely the best in terms of
> performance, but with the restriction that containers on the same host
> cannot communicate with each other.
>
This restriction is not a big issue for my purposes.

> There are some discussions around that:
>
> http://marc.info/?l=linux-netdev&m=123643508124711&w=2
>
> veth is a virtual device, hence it has no offloading. When packets are sent
> out, the network stack looks at the NIC's offloading capabilities, which are
> not present, so the kernel computes the checksums itself instead of letting
> the NIC do it, even when the packet ends up being transmitted through the
> physical NIC. This is a well-known issue with network virtualization, and Xen
> developed a specific network driver to deal with it:
> http://www.cse.psu.edu/~bhuvan/teaching/spring06/papers/xen-net-opt.pdf
>
>> I think this is because the overhead of bridging devices is high.
>
> Yes, bridging adds some overhead, and AFAIR bridging + netfilter does some
> skb copying.
>
Thank you for the pointers.

>> I also checked the host OS's performance when I used a veth device.
>> I observed a strange phenomenon.
>>
>> Before issuing lxc-start command, the throughput was 9.6 Gbps.
>> Here is the output of brctl show:
>>        $ brctl show
>>        bridge name     bridge id               STP enabled     interfaces
>>        br0             8000.0060dd470d49       no              eth1
>>
>> After issuing lxc-start command, the throughput decreased to 3.2 Gbps.
>> Here is the output of brctl show:
>>        $ sudo brctl show
>>        bridge name     bridge id               STP enabled     interfaces
>>        br0             8000.0060dd470d49       no              eth1
>>                                                                veth0_7573
>>
>> I wonder why the performance is greatly influenced by adding a veth device
>> to a bridge device.
>
> Hmm, good question :)
>
>> Here is my experimental setting:
>>        OS: Ubuntu server 8.10 amd64
>>        Kernel: 2.6.27-rc8 (checkout from the lxc git repository)
>
> I would recommend using the vanilla 2.6.29-rc8 kernel, because it no longer
> needs patches; a lot of fixes went into the network namespace code, and
> maybe the bridge has been improved in the meantime :)
>
I checked out the 2.6.29-rc8 vanilla kernel.
The performance after issuing lxc-start improved to 8.7 Gbps!
It's a big improvement, but some performance loss remains.
Can't we avoid this loss?

>>        Userland tool: 0.6.0
>>        NIC: Myricom Myri-10G
>>
>> Any comments and suggestions will be appreciated.
>> If this list is not the proper place to talk about this problem, can anyone
>> tell me the proper one?
>
> The performance question is more related to the network virtualization
> implementation and should be sent to netdev@ and containers@ (added to the
> Cc of this email). Of course, people on lxc-devel@ will be interested in
> these aspects, so lxc-devel@ is the right mailing list too.
>
> Thanks for your testing
>  -- Daniel
>

Best regards,
Ryousei Takano


* Re: [lxc-devel] Poor bridging performance on 10 GbE
From: Eric W. Biederman @ 2009-03-19  0:50 UTC (permalink / raw)
  To: Ryousei Takano
  Cc: Daniel Lezcano, Linux Containers, Linux Netdev List, lxc-devel

Ryousei Takano <ryousei@gmail.com> writes:

> I am using VServer because other virtualization mechanisms, including OpenVZ,
> Xen, and KVM, cannot fully utilize the network bandwidth of 10 GbE.
>
> Here are the results of the netperf benchmark (throughput in Mbps):
> 	vanilla (2.6.27-9)		9525.94
> 	Vserver (2.6.27.10)	9521.79
> 	OpenVZ (2.6.27.10)	2049.89
> 	Xen (2.6.26.1)		1011.47
> 	KVM (2.6.27-9)		1022.42
>
> Now I am interested in using LXC instead of VServer.

A good argument.

>>> Using a macvlan device, the throughput was 9.6 Gbps. But using a veth
>>> device, the throughput was only 2.7 Gbps.
>>
>> Yes, the macvlan interface is definitely the best in terms of
>> performance, but with the restriction that containers on the same host
>> cannot communicate with each other.
>>
> This restriction is not a big issue for my purposes.

Right.  I have been trying to figure out what the best way to cope
with that restriction is.

>>> I also checked the host OS's performance when I used a veth device.
>>> I observed a strange phenomenon.
>>>
>>> Before issuing lxc-start command, the throughput was 9.6 Gbps.
>>> Here is the output of brctl show:
>>>        $ brctl show
>>>        bridge name     bridge id               STP enabled     interfaces
>>>        br0             8000.0060dd470d49       no              eth1
>>>
>>> After issuing lxc-start command, the throughput decreased to 3.2 Gbps.
>>> Here is the output of brctl show:
>>>        $ sudo brctl show
>>>        bridge name     bridge id               STP enabled     interfaces
>>>        br0             8000.0060dd470d49       no              eth1
>>>                                                                veth0_7573
>>>
>>> I wonder why the performance is greatly influenced by adding a veth device
>>> to a bridge device.
>>
>> Hmm, good question :)

Bridging last I looked uses the least common denominator of hardware
offloads.  Which likely explains why adding a veth decreased your
bridging performance.
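
This is easy to check from userspace; if the bridge exposes its feature 
flags through ethtool, something like the following (device names as in 
your output above) should show the bridge's effective offloads dropping 
once the veth is enslaved:

	# before lxc-start: the bridge features follow the physical NIC
	$ ethtool -k br0
	# after lxc-start enslaves veth0_7573: TSO/checksumming should drop
	$ ethtool -k br0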

>>> Here is my experimental setting:
>>>        OS: Ubuntu server 8.10 amd64
>>>        Kernel: 2.6.27-rc8 (checkout from the lxc git repository)
>>
>> I would recommend using the vanilla 2.6.29-rc8 kernel, because it no longer
>> needs patches; a lot of fixes went into the network namespace code, and
>> maybe the bridge has been improved in the meantime :)
>>
> I checked out the 2.6.29-rc8 vanilla kernel.
> The performance after issuing lxc-start improved to 8.7 Gbps!
> It's a big improvement, but some performance loss remains.
> Can't we avoid this loss?

Good question.  Any chance you can profile this and see where the
performance loss seems to be coming from?

Eric


* Re: [lxc-devel] Poor bridging performance on 10 GbE
From: Ryousei Takano @ 2009-03-19  5:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Daniel Lezcano, Linux Containers, Linux Netdev List, lxc-devel

Hi Eric,

On Thu, Mar 19, 2009 at 9:50 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:

[snip]

> Bridging last I looked uses the least common denominator of hardware
> offloads.  Which likely explains why adding a veth decreased your
> bridging performance.
>
At least for now, LRO cannot coexist with bridging,
so I disabled the LRO feature of the myri10ge driver.
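
With a recent enough driver and ethtool this can be done at run time, 
roughly like this:

	# bridging/forwarding does not work correctly with LRO, so turn it off
	$ ethtool -K eth1 lro off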

>>>> Here is my experimental setting:
>>>>        OS: Ubuntu server 8.10 amd64
>>>>        Kernel: 2.6.27-rc8 (checkout from the lxc git repository)
>>>
>>> I would recommend using the vanilla 2.6.29-rc8 kernel, because it no longer
>>> needs patches; a lot of fixes went into the network namespace code, and
>>> maybe the bridge has been improved in the meantime :)
>>>
>> I checked out the 2.6.29-rc8 vanilla kernel.
>> The performance after issuing lxc-start improved to 8.7 Gbps!
>> It's a big improvement, but some performance loss remains.
>> Can't we avoid this loss?
>
> Good question.  Any chance you can profile this and see where the
> performance loss seems to be coming from?
>
I found out that this issue is caused by a decrease in the MTU size.
The Myri-10G's MTU is 9000 bytes, while the veth's MTU is 1500 bytes.
After the veth is added to the bridge, the bridge's MTU drops from 9000
to 1500 bytes. I changed the veth's MTU to 9000 bytes, and then I
confirmed that the throughput improved to 9.6 Gbps.

The throughput between LXC containers also improved to 4.9 Gbps
by changing the MTU sizes.
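
Concretely, the workaround is just raising the MTUs by hand; the 
interface names below are from my setup, and eth0 is the veth's name 
inside the container:

	# host side: the veth end that is enslaved to the bridge
	$ ip link set dev veth0_7573 mtu 9000
	# the bridge itself may need its MTU raised explicitly as well
	$ ip link set dev br0 mtu 9000
	# container side: the other end of the veth pair
	$ ip link set dev eth0 mtu 9000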

So I propose to add lxc.network.mtu into the LXC configuration.
How does that sound?
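
With such a key, the network section of a container config could look 
like this (the other keys already exist in the current configuration 
format; lxc.network.mtu is the proposed addition):

	lxc.network.type = veth
	lxc.network.link = br0
	lxc.network.name = eth0
	# proposed: force the veth MTU instead of the 1500-byte default
	lxc.network.mtu = 9000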

> Eric
>

Best regards,
Ryousei Takano


* Re: [lxc-devel] Poor bridging performance on 10 GbE
From: Daniel Lezcano @ 2009-03-19  9:08 UTC (permalink / raw)
  To: Ryousei Takano
  Cc: Eric W. Biederman, Linux Containers, Linux Netdev List, lxc-devel

Ryousei Takano wrote:
> Hi Eric,
> 
> On Thu, Mar 19, 2009 at 9:50 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
> 
> [snip]
> 
>> Bridging last I looked uses the least common denominator of hardware
>> offloads.  Which likely explains why adding a veth decreased your
>> bridging performance.
>>
> At least for now, LRO cannot coexist with bridging,
> so I disabled the LRO feature of the myri10ge driver.
> 
>>>>> Here is my experimental setting:
>>>>>        OS: Ubuntu server 8.10 amd64
>>>>>        Kernel: 2.6.27-rc8 (checkout from the lxc git repository)
>>>> I would recommend using the vanilla 2.6.29-rc8 kernel, because it no longer
>>>> needs patches; a lot of fixes went into the network namespace code, and
>>>> maybe the bridge has been improved in the meantime :)
>>>>
>>> I checked out the 2.6.29-rc8 vanilla kernel.
>>> The performance after issuing lxc-start improved to 8.7 Gbps!
>>> It's a big improvement, but some performance loss remains.
>>> Can't we avoid this loss?
>> Good question.  Any chance you can profile this and see where the
>> performance loss seems to be coming from?
>>
> I found out that this issue is caused by a decrease in the MTU size.
> The Myri-10G's MTU is 9000 bytes, while the veth's MTU is 1500 bytes.
> After the veth is added to the bridge, the bridge's MTU drops from 9000
> to 1500 bytes. I changed the veth's MTU to 9000 bytes, and then I
> confirmed that the throughput improved to 9.6 Gbps.
> 
> The throughput between LXC containers also improved to 4.9 Gbps
> by changing the MTU sizes.
> 
> So I propose to add lxc.network.mtu into the LXC configuration.
> How does that sound?

Sounds good :)
Do you plan to send a patch?


* Re: [lxc-devel] Poor bridging performance on 10 GbE
From: Ryousei Takano @ 2009-03-19 10:50 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Eric W. Biederman, Linux Containers, Linux Netdev List, lxc-devel

Hi Daniel,

On Thu, Mar 19, 2009 at 6:08 PM, Daniel Lezcano <dlezcano@fr.ibm.com> wrote:

[snip]

>> So I propose to add lxc.network.mtu into the LXC configuration.
>> How does that sound?
>
> Sounds good :)
> Do you plan to send a patch?
>

Yes, I will post a patch as soon as possible.

Best regards,
Ryousei Takano

