From: Or Gerlitz
Subject: Re: vxlan/veth performance issues on net.git + latest kernels
Date: Sun, 8 Dec 2013 15:07:54 +0200
Message-ID: <52A46F2A.40101@mellanox.com>
References: <52A197DF.5010806@mellanox.com> <20131208124352.GA7935@zed.ravello.local>
In-Reply-To: <20131208124352.GA7935@zed.ravello.local>
To: Mike Rapoport, Or Gerlitz
Cc: Joseph Gasparakis, Pravin B Shelar, Eric Dumazet, Jerry Chu, Eric Dumazet, Alexei Starovoitov, David Miller, netdev, "Kirsher, Jeffrey T", John Fastabend

On 08/12/2013 14:43, Mike Rapoport wrote:
> On Fri, Dec 06, 2013 at 11:30:37AM +0200, Or Gerlitz wrote:
>>> On 04/12/2013 11:41, Or Gerlitz wrote:
>> BTW guys, I saw the issues with both the bridge and openvswitch
>> configurations - it seems that we might have fairly large breakage of
>> the system w.r.t vxlan traffic at rates that go over a few Gbs -- so I
>> would love to get feedback of any kind from the people who were
>> involved with vxlan over the last months/year.
> I've seen similar problems with vxlan traffic. In our scenario I had
> two VMs running on the same host, and both VMs had the
> { veth --> bridge --> vxlan --> IP stack --> NIC } chain.

How were the VMs connected to the veth NICs? What kernel were you using?

> Running iperf on veth showed a rate ~6 times slower than direct
> NIC <-> NIC. With a hack that forces a large gso_size in vxlan's
> handle_offloads, I've got veth performing only slightly slower than
> the NICs ... The explanation I thought of is that performing the split
> of the packet as late as possible reduces processing overhead and
> allows more data to be processed.

Thanks for the tip! A few quick clarifications -- so you artificially
enlarged the gso_size of the skb? Can you provide the line you added
here?

static int handle_offloads(struct sk_buff *skb)
{
        if (skb_is_gso(skb)) {
                int err = skb_unclone(skb, GFP_ATOMIC);

                if (unlikely(err))
                        return err;

                skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
        } else if (skb->ip_summed != CHECKSUM_PARTIAL)
                skb->ip_summed = CHECKSUM_NONE;

        return 0;
}

Also, why does enlarging the gso_size of the skbs cause the actual
segmentation to take place lower in the stack?

Or.
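
For context, a minimal sketch of the kind of change under discussion. It
is an assumption only -- the actual line Mike added is not shown in this
thread -- but it illustrates forcing a large gso_size on GSO skbs inside
handle_offloads; the 64000-byte value is arbitrary and chosen purely for
illustration.

/* Hypothetical sketch only -- not the actual patch referenced above.
 * Forcing a larger gso_size means skb_gso_segment() later produces
 * fewer, larger segments, so the split into wire-sized packets is
 * deferred to a point lower in the stack.
 */
static int handle_offloads(struct sk_buff *skb)
{
        if (skb_is_gso(skb)) {
                int err = skb_unclone(skb, GFP_ATOMIC);

                if (unlikely(err))
                        return err;

                /* ASSUMPTION: artificially enlarge gso_size; 64000 is
                 * an arbitrary large value used only for illustration.
                 */
                skb_shinfo(skb)->gso_size = 64000;

                skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
        } else if (skb->ip_summed != CHECKSUM_PARTIAL)
                skb->ip_summed = CHECKSUM_NONE;

        return 0;
}

If the hack looks anything like this, the effect would match Mike's
explanation: the encapsulated skb stays large while it crosses the
veth/bridge/vxlan path, and the per-packet processing cost is paid on
far fewer packets.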