* Re: [PATCH net-next] veth: extend features to support tunneling
@ 2013-11-16 21:11 Or Gerlitz
  2013-11-16 21:40 ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2013-11-16 21:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, David Miller, Eric Dumazet, Stephen Hemminger,
	netdev@vger.kernel.org, Michael S. Tsirkin, John Fastabend

On Sat, Oct 26, 2013 at 6:13 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2013-10-25 at 19:22 -0700, Alexei Starovoitov wrote:
>> On Fri, Oct 25, 2013 at 6:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > From: Eric Dumazet <edumazet@google.com>
>> >
>> > While investigating a recent vxlan regression, I found that veth
>> > was using a zero features set for vxlan tunnels.
>>
>> oneliner can be better :)
>
> Yes, I'll post the gso fix as well, of course ;)
>
>>
>> > We have to segment GSO frames, copy the payload, and do the checksum.
>> >
>> > This patch brings a ~200% performance increase.
>> >
>> > We probably have to add hw_enc_features support
>> > on other virtual devices.
>> >
>> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>> > Cc: Alexei Starovoitov <ast@plumgrid.com>
>> > ---
>>
>> iperf over veth with gre/vxlan tunneling is now ~4Gbps. 20x gain as advertised.
>> Thanks!
>
> Wow, such a difference might show another bug somewhere else...
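(As a quick sanity check -- a sketch, assuming a veth device named
veth0; the exact feature names vary by kernel version -- you can see
which offloads a virtual device advertises for encapsulated traffic
with:

$ ethtool -k veth0 | grep -i -e tnl -e gso
)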

Guys (thanks Eric for the clarification on the other vxlan thread),
with the latest networking code (e.g. 3.12 or net-next), do you expect
a notable performance (throughput) difference between these two configs?

1. bridge --> vxlan --> NIC
2. veth --> bridge --> vxlan --> NIC

BTW, #2 doesn't work once packets get large, unless I manually
decrease the MTU of the veth device pair. E.g. if the NIC MTU is 1500,
vxlan advertises an MTU of 1450 (= 1500 - (14 + 20 + 8 + 8)) and the
bridge inherits that, but the veth device does not. Should something
here generate an ICMP packet that causes the stack to decrease the
path MTU for the neighbour created on the veth device? And what about
para-virtualized guests plugged into this (or any host-based
tunneling) scheme, e.g.:

3. guest virtio NIC --> vhost  --> tap/macvtap --> bridge --> vxlan --> NIC

How (and where) do we want the guest NIC MTU / path MTU to take the
tunneling overhead into account?
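
For reference, the manual workaround mentioned above is just to shrink
the veth pair MTU by the same overhead (a sketch -- the device names
here are only examples):

$ ip link set dev veth0 mtu 1450
$ ip link set dev veth1 mtu 1450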

Or.

* Re: vxlan gso is broken by stackable gso_segment()
@ 2013-10-25  1:59 Alexei Starovoitov
  2013-10-25  4:06 ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2013-10-25  1:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Stephen Hemminger, David S. Miller, netdev

GRE seems to be fine.
Packets seem to be segmented with the wrong length and are being dropped.
A few seconds after the client iperf finishes, I see this warning:

[  329.669685] WARNING: CPU: 3 PID: 3817 at net/core/skbuff.c:3474
skb_try_coalesce+0x3a0/0x3f0()
[  329.669688] Modules linked in: vxlan ip_tunnel veth ip6table_filter
ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp
iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap
macvlan vhost kvm_intel kvm iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi dm_crypt hid_generic eeepc_wmi asus_wmi
sparse_keymap mxm_wmi dm_multipath psmouse serio_raw usbhid hid
parport_pc ppdev firewire_ohci e1000e firewire_core lpc_ich crc_itu_t
binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config
i2o_block video
[  329.669746] CPU: 3 PID: 3817 Comm: iperf Not tainted 3.12.0-rc6+ #81
[  329.669748] Hardware name: System manufacturer System Product
Name/P8Z77 WS, BIOS 3007 07/26/2012
[  329.669750]  0000000000000009 ffff88082fb839d8 ffffffff8175427a
0000000000000002
[  329.669756]  0000000000000000 ffff88082fb83a18 ffffffff8105206c
ffff880808f926f8
[  329.669760]  ffff8807ef122b00 ffff8807ef122a00 0000000000000576
ffff88082fb83a94
[  329.669765] Call Trace:
[  329.669767]  <IRQ>  [<ffffffff8175427a>] dump_stack+0x55/0x76
[  329.669779]  [<ffffffff8105206c>] warn_slowpath_common+0x8c/0xc0
[  329.669783]  [<ffffffff810520ba>] warn_slowpath_null+0x1a/0x20
[  329.669787]  [<ffffffff816150f0>] skb_try_coalesce+0x3a0/0x3f0
[  329.669793]  [<ffffffff8167bce4>] tcp_try_coalesce.part.44+0x34/0xa0
[  329.669797]  [<ffffffff8167d168>] tcp_queue_rcv+0x108/0x150
[  329.669801]  [<ffffffff8167f129>] tcp_data_queue+0x299/0xd00
[  329.669806]  [<ffffffff816822f4>] tcp_rcv_established+0x2d4/0x8f0
[  329.669809]  [<ffffffff8168d8b5>] tcp_v4_do_rcv+0x295/0x520
[  329.669813]  [<ffffffff8168fb08>] tcp_v4_rcv+0x888/0xc30
[  329.669818]  [<ffffffff816651d3>] ? ip_local_deliver_finish+0x43/0x480
[  329.669823]  [<ffffffff810cae04>] ? __lock_is_held+0x54/0x80
[  329.669827]  [<ffffffff816652fb>] ip_local_deliver_finish+0x16b/0x480
[  329.669831]  [<ffffffff816651d3>] ? ip_local_deliver_finish+0x43/0x480
[  329.669836]  [<ffffffff81666018>] ip_local_deliver+0x48/0x80
[  329.669840]  [<ffffffff81665770>] ip_rcv_finish+0x160/0x770
[  329.669845]  [<ffffffff816662f8>] ip_rcv+0x2a8/0x3e0
[  329.669849]  [<ffffffff81623d13>] __netif_receive_skb_core+0xa63/0xdb0
[  329.669853]  [<ffffffff816233b8>] ? __netif_receive_skb_core+0x108/0xdb0
[  329.669858]  [<ffffffff8175d37f>] ? _raw_spin_unlock_irqrestore+0x3f/0x70
[  329.669862]  [<ffffffff8162417b>] ? process_backlog+0xab/0x180
[  329.669866]  [<ffffffff81624081>] __netif_receive_skb+0x21/0x70
[  329.669869]  [<ffffffff81624184>] process_backlog+0xb4/0x180
[  329.669873]  [<ffffffff81626d08>] ? net_rx_action+0x98/0x350
[  329.669876]  [<ffffffff81626dca>] net_rx_action+0x15a/0x350
[  329.669882]  [<ffffffff81057f97>] __do_softirq+0xf7/0x3f0
[  329.669886]  [<ffffffff8176820c>] call_softirq+0x1c/0x30
[  329.669887]  <EOI>  [<ffffffff81004bed>] do_softirq+0x8d/0xc0
[  329.669896]  [<ffffffff8160de03>] ? release_sock+0x193/0x1f0
[  329.669901]  [<ffffffff81057a5b>] local_bh_enable_ip+0xdb/0xf0
[  329.669906]  [<ffffffff8175d2e4>] _raw_spin_unlock_bh+0x44/0x50
[  329.669910]  [<ffffffff8160de03>] release_sock+0x193/0x1f0
[  329.669914]  [<ffffffff81679237>] tcp_recvmsg+0x467/0x1030
[  329.669919]  [<ffffffff816ab424>] inet_recvmsg+0x134/0x230
[  329.669923]  [<ffffffff8160a17d>] sock_recvmsg+0xad/0xe0

To reproduce:
$ sudo brctl addbr br0
$ sudo ifconfig br0 up
$ cat foo1.conf
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.ipv4 = 10.2.3.5/24
$sudo lxc-start -n foo1 -f ./foo1.conf bash
#ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
#ip addr add 192.168.99.1/24 dev vxlan0
#ip link set up dev vxlan0
#iperf -s

Similarly for another lxc with a different IP:
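(foo2.conf is the same as foo1.conf apart from the address -- e.g.,
assuming 10.2.3.6/24:)
$ cat foo2.conf
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.ipv4 = 10.2.3.6/24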
$sudo lxc-start -n foo2 -f ./foo2.conf bash
#ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
#ip addr add 192.168.99.2/24 dev vxlan0
#ip link set up dev vxlan0
# iperf -c 192.168.99.1

I hit this every time.


On Thu, Oct 24, 2013 at 5:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-10-24 at 16:37 -0700, Alexei Starovoitov wrote:
>> Hi Eric, Stephen,
>>
>> it seems commit 3347c960 "ipv4: gso: make inet_gso_segment() stackable"
>> broke vxlan gso
>>
>> the way to reproduce:
>> start two lxc with veth and bridge between them
>> create vxlan dev in both containers
>> do iperf
>>
>> this setup on net-next does ~80 Mbps and a lot of tcp retransmits.
>> reverting 3347c960 and d3e5e006 gets performance back to ~230 Mbps
>>
>> I guess the vxlan driver is supposed to set encap_level? Some other way?
>
> Hi Alexei
>
> Are the GRE tunnels broken as well for you ?
>
> In my testing, GRE was working, and it looks like GRE and vxlan have
> quite similar GSO implementations.
>
> Maybe you can capture some of the broken frames with tcpdump ?
>
>
>

