netdev.vger.kernel.org archive mirror
* vxlan gso is broken by stackable gso_segment()
@ 2013-10-24 23:37 Alexei Starovoitov
  2013-10-25  0:41 ` Eric Dumazet
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2013-10-24 23:37 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger; +Cc: David S. Miller, netdev

Hi Eric, Stephen,

It seems commit 3347c960 "ipv4: gso: make inet_gso_segment() stackable"
broke vxlan gso.

The way to reproduce:
start two lxc containers with veth and a bridge between them,
create a vxlan dev in both containers,
run iperf.

This setup on net-next does ~80 Mbps with a lot of tcp retransmits;
reverting 3347c960 and d3e5e006 brings performance back to ~230 Mbps.

I guess the vxlan driver is supposed to set encap_level? Or some other way?

Thanks
Alexei


* Re: vxlan gso is broken by stackable gso_segment()
  2013-10-24 23:37 vxlan gso is broken by stackable gso_segment() Alexei Starovoitov
@ 2013-10-25  0:41 ` Eric Dumazet
  2013-10-25  1:59   ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2013-10-25  0:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Stephen Hemminger, David S. Miller, netdev

On Thu, 2013-10-24 at 16:37 -0700, Alexei Starovoitov wrote:
> Hi Eric, Stephen,
> 
> it seems commit 3347c960 "ipv4: gso: make inet_gso_segment() stackable"
> broke vxlan gso
> 
> the way to reproduce:
> start two lxc with veth and bridge between them
> create vxlan dev in both containers
> do iperf
> 
> this setup on net-next does ~80 Mbps and a lot of tcp retransmits.
> reverting 3347c960 and d3e5e006 gets performance back to ~230 Mbps
> 
> I guess vxlan driver suppose to set encap_level ? Some other way?

Hi Alexei

Are the GRE tunnels broken for you as well?

In my testing, GRE was working, and GRE and vxlan seem to have quite
similar gso implementations.

Maybe you can capture some of the broken frames with tcpdump?


* Re: vxlan gso is broken by stackable gso_segment()
  2013-10-25  0:41 ` Eric Dumazet
@ 2013-10-25  1:59   ` Alexei Starovoitov
  2013-10-25  4:06     ` Eric Dumazet
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2013-10-25  1:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Stephen Hemminger, David S. Miller, netdev

gre seems to be fine.
Packets seem to be segmented with the wrong length and dropped.
A few seconds after the client iperf finishes, I see this warning:

[  329.669685] WARNING: CPU: 3 PID: 3817 at net/core/skbuff.c:3474 skb_try_coalesce+0x3a0/0x3f0()
[  329.669688] Modules linked in: vxlan ip_tunnel veth ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_crypt hid_generic eeepc_wmi asus_wmi sparse_keymap mxm_wmi dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci e1000e firewire_core lpc_ich crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[  329.669746] CPU: 3 PID: 3817 Comm: iperf Not tainted 3.12.0-rc6+ #81
[  329.669748] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[  329.669750]  0000000000000009 ffff88082fb839d8 ffffffff8175427a 0000000000000002
[  329.669756]  0000000000000000 ffff88082fb83a18 ffffffff8105206c ffff880808f926f8
[  329.669760]  ffff8807ef122b00 ffff8807ef122a00 0000000000000576 ffff88082fb83a94
[  329.669765] Call Trace:
[  329.669767]  <IRQ>  [<ffffffff8175427a>] dump_stack+0x55/0x76
[  329.669779]  [<ffffffff8105206c>] warn_slowpath_common+0x8c/0xc0
[  329.669783]  [<ffffffff810520ba>] warn_slowpath_null+0x1a/0x20
[  329.669787]  [<ffffffff816150f0>] skb_try_coalesce+0x3a0/0x3f0
[  329.669793]  [<ffffffff8167bce4>] tcp_try_coalesce.part.44+0x34/0xa0
[  329.669797]  [<ffffffff8167d168>] tcp_queue_rcv+0x108/0x150
[  329.669801]  [<ffffffff8167f129>] tcp_data_queue+0x299/0xd00
[  329.669806]  [<ffffffff816822f4>] tcp_rcv_established+0x2d4/0x8f0
[  329.669809]  [<ffffffff8168d8b5>] tcp_v4_do_rcv+0x295/0x520
[  329.669813]  [<ffffffff8168fb08>] tcp_v4_rcv+0x888/0xc30
[  329.669818]  [<ffffffff816651d3>] ? ip_local_deliver_finish+0x43/0x480
[  329.669823]  [<ffffffff810cae04>] ? __lock_is_held+0x54/0x80
[  329.669827]  [<ffffffff816652fb>] ip_local_deliver_finish+0x16b/0x480
[  329.669831]  [<ffffffff816651d3>] ? ip_local_deliver_finish+0x43/0x480
[  329.669836]  [<ffffffff81666018>] ip_local_deliver+0x48/0x80
[  329.669840]  [<ffffffff81665770>] ip_rcv_finish+0x160/0x770
[  329.669845]  [<ffffffff816662f8>] ip_rcv+0x2a8/0x3e0
[  329.669849]  [<ffffffff81623d13>] __netif_receive_skb_core+0xa63/0xdb0
[  329.669853]  [<ffffffff816233b8>] ? __netif_receive_skb_core+0x108/0xdb0
[  329.669858]  [<ffffffff8175d37f>] ? _raw_spin_unlock_irqrestore+0x3f/0x70
[  329.669862]  [<ffffffff8162417b>] ? process_backlog+0xab/0x180
[  329.669866]  [<ffffffff81624081>] __netif_receive_skb+0x21/0x70
[  329.669869]  [<ffffffff81624184>] process_backlog+0xb4/0x180
[  329.669873]  [<ffffffff81626d08>] ? net_rx_action+0x98/0x350
[  329.669876]  [<ffffffff81626dca>] net_rx_action+0x15a/0x350
[  329.669882]  [<ffffffff81057f97>] __do_softirq+0xf7/0x3f0
[  329.669886]  [<ffffffff8176820c>] call_softirq+0x1c/0x30
[  329.669887]  <EOI>  [<ffffffff81004bed>] do_softirq+0x8d/0xc0
[  329.669896]  [<ffffffff8160de03>] ? release_sock+0x193/0x1f0
[  329.669901]  [<ffffffff81057a5b>] local_bh_enable_ip+0xdb/0xf0
[  329.669906]  [<ffffffff8175d2e4>] _raw_spin_unlock_bh+0x44/0x50
[  329.669910]  [<ffffffff8160de03>] release_sock+0x193/0x1f0
[  329.669914]  [<ffffffff81679237>] tcp_recvmsg+0x467/0x1030
[  329.669919]  [<ffffffff816ab424>] inet_recvmsg+0x134/0x230
[  329.669923]  [<ffffffff8160a17d>] sock_recvmsg+0xad/0xe0

To reproduce:
$ sudo brctl addbr br0
$ sudo ifconfig br0 up
$ cat foo1.conf
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.ipv4 = 10.2.3.5/24
$ sudo lxc-start -n foo1 -f ./foo1.conf bash
# ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
# ip addr add 192.168.99.1/24 dev vxlan0
# ip link set up dev vxlan0
# iperf -s

Similarly for a second container with a different IP:
$ sudo lxc-start -n foo2 -f ./foo2.conf bash
# ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
# ip addr add 192.168.99.2/24 dev vxlan0
# ip link set up dev vxlan0
# iperf -c 192.168.99.1

I keep hitting it all the time.




* Re: vxlan gso is broken by stackable gso_segment()
  2013-10-25  1:59   ` Alexei Starovoitov
@ 2013-10-25  4:06     ` Eric Dumazet
  2013-10-25  9:09       ` Eric Dumazet
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2013-10-25  4:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Stephen Hemminger, David S. Miller, netdev

On Thu, 2013-10-24 at 18:59 -0700, Alexei Starovoitov wrote:
> gre seems to be fine.
> packets seem to be segmented with wrong length and being dropped.
> After client iperf is finished, in few seconds I see the warning:
> 
> [... stack trace and reproduction steps snipped ...]

Thanks for all these details.

I am in Edinburgh for the Kernel Summit, I'll take a look at this as
soon as possible.



* Re: vxlan gso is broken by stackable gso_segment()
  2013-10-25  4:06     ` Eric Dumazet
@ 2013-10-25  9:09       ` Eric Dumazet
  2013-10-25 22:18         ` David Miller
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2013-10-25  9:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Stephen Hemminger, David S. Miller, netdev

On Thu, 2013-10-24 at 21:06 -0700, Eric Dumazet wrote:
> Thanks for all these details.
> 
> I am in Edinburgh for the Kernel Summit, I'll take a look at this as
> soon as possible.

Could you try the following fix?

Thanks!

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f4a159e..17dd8320 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1252,6 +1252,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	const struct net_offload *ops;
 	unsigned int offset = 0;
 	struct iphdr *iph;
+	bool udpfrag;
 	bool tunnel;
 	int proto;
 	int nhoff;
@@ -1306,10 +1307,11 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	if (IS_ERR_OR_NULL(segs))
 		goto out;
 
+	udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
 	skb = segs;
 	do {
 		iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
-		if (!tunnel && proto == IPPROTO_UDP) {
+		if (udpfrag) {
 			iph->id = htons(id);
 			iph->frag_off = htons(offset >> 3);
 			if (skb->next != NULL)

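For readers following the patch: the udpfrag flag only selects which branch of the per-segment IPv4 header fixup runs. The C sketch below is a simplified userspace model of that loop, not kernel code. struct seg_hdr and fixup_segments() are hypothetical names, and since the hunk above is cut off right after "if (skb->next != NULL)", the MF handling, the offset advance and the incrementing ID in the other branch are assumptions about what follows. In the fragment branch every segment shares one IP ID and carries an 8-byte-unit fragment offset, like fragments of a single UDP datagram; otherwise each segment is an independent datagram with its own ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_MF     0x2000 /* "more fragments" bit of the IPv4 frag_off field */
#define IP_OFFSET 0x1fff /* 13-bit fragment offset, in 8-byte units */

/* Hypothetical stand-in for the two IPv4 header fields the loop rewrites.
 * Values are kept in host byte order; the kernel stores them with htons(). */
struct seg_hdr {
	uint16_t id;
	uint16_t frag_off;
};

/*
 * Rewrite the headers of nsegs segments whose IP payload lengths are
 * given in payload_len[].
 *   udpfrag == true:  fragments of one UDP datagram (UFO): shared id,
 *                     offset in 8-byte units, MF set on all but the last.
 *   udpfrag == false: independent datagrams: one id per segment.
 */
static void fixup_segments(struct seg_hdr *h, const unsigned int *payload_len,
			   int nsegs, uint16_t id, bool udpfrag)
{
	unsigned int offset = 0;

	for (int i = 0; i < nsegs; i++) {
		if (udpfrag) {
			/* The offset must fit in the 13-bit field. */
			assert((offset >> 3) <= IP_OFFSET);
			h[i].id = id;
			h[i].frag_off = (uint16_t)(offset >> 3);
			if (i != nsegs - 1)
				h[i].frag_off |= IP_MF;
			offset += payload_len[i];
		} else {
			h[i].id = id++;
			h[i].frag_off = 0; /* simplified: DF/MF not modelled */
		}
	}
}
```

With payloads of 1480, 1480 and 600 bytes and udpfrag set, the three segments get frag_off values 0|MF, 185|MF and 370, all with the same ID; with udpfrag clear they get IDs 100, 101, 102. Treating independent tunnel segments as fragments (or the reverse) would plausibly make the receiver drop or misassemble them, which matches the symptoms reported above, but the sketch does not prove that diagnosis.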

* Re: vxlan gso is broken by stackable gso_segment()
  2013-10-25  9:09       ` Eric Dumazet
@ 2013-10-25 22:18         ` David Miller
  2013-10-25 22:41           ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2013-10-25 22:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ast, edumazet, stephen, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 25 Oct 2013 02:09:00 -0700

> @@ -1252,6 +1252,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>  	const struct net_offload *ops;
>  	unsigned int offset = 0;
>  	struct iphdr *iph;
> +	bool udpfrag;
>  	bool tunnel;
>  	int proto;
>  	int nhoff;
> @@ -1306,10 +1307,11 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>  	if (IS_ERR_OR_NULL(segs))
>  		goto out;
>  
> +	udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
>  	skb = segs;
>  	do {
>  		iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
> -		if (!tunnel && proto == IPPROTO_UDP) {
> +		if (udpfrag) {
>  			iph->id = htons(id);
>  			iph->frag_off = htons(offset >> 3);
>  			if (skb->next != NULL)
> 

The "tunnel" variable becomes unused once you do this, please remove it.


* Re: vxlan gso is broken by stackable gso_segment()
  2013-10-25 22:18         ` David Miller
@ 2013-10-25 22:41           ` Alexei Starovoitov
  2013-10-25 23:10             ` David Miller
                               ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2013-10-25 22:41 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, Eric Dumazet, stephen, netdev

On Fri, Oct 25, 2013 at 3:18 PM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 25 Oct 2013 02:09:00 -0700
>
>> [... patch snipped ...]
>
> The "tunnel" variable becomes unused once you do this, please remove it.

'bool tunnel' is actually still used to indicate encap_level > 0.

Eric's fix brings back performance for vxlan, and gre keeps working. Thx!

The net/core/skbuff.c:3474 skb_try_coalesce() warning I mentioned before
is unrelated: I still see it with this patch, running either gre or vxlan tunnels.


* Re: vxlan gso is broken by stackable gso_segment()
  2013-10-25 22:41           ` Alexei Starovoitov
@ 2013-10-25 23:10             ` David Miller
  2013-10-25 23:25             ` Eric Dumazet
  2013-10-28  1:18             ` [PATCH net-next] inet: restore gso for vxlan Eric Dumazet
  2 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2013-10-25 23:10 UTC (permalink / raw)
  To: ast; +Cc: eric.dumazet, edumazet, stephen, netdev

From: Alexei Starovoitov <ast@plumgrid.com>
Date: Fri, 25 Oct 2013 15:41:47 -0700

> 'bool tunnel' actually still used to indicate encap_level > 0

Good catch, I misread the code.


* Re: vxlan gso is broken by stackable gso_segment()
  2013-10-25 22:41           ` Alexei Starovoitov
  2013-10-25 23:10             ` David Miller
@ 2013-10-25 23:25             ` Eric Dumazet
  2013-10-26  0:26               ` [PATCH net-next] tcp: gso: fix truesize tracking Eric Dumazet
  2013-10-26  0:52               ` vxlan gso is broken by stackable gso_segment() Eric Dumazet
  2013-10-28  1:18             ` [PATCH net-next] inet: restore gso for vxlan Eric Dumazet
  2 siblings, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-10-25 23:25 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, Eric Dumazet, stephen, netdev

On Fri, 2013-10-25 at 15:41 -0700, Alexei Starovoitov wrote:

> 'bool tunnel' actually still used to indicate encap_level > 0
> 

Yes, I am studying whether setting skb->encapsulation = 1 was really
needed in:

if (tunnel) {
     skb_reset_inner_headers(skb);
     skb->encapsulation = 1;
}

I was also planning to rename 'bool tunnel' to 'bool stacked' or
something...
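A minimal model of the encap_level idea discussed here ('bool tunnel' is true when encap_level > 0): each stacked IPv4 segmentation layer checks whether an outer layer already ran, then adds its own header length for the layers below it. This is a hypothetical userspace sketch; struct gso_cb and ipv4_layer() only loosely stand in for the per-skb GSO control block, and the assumption that the counter grows by the IPv4 header length (ihl) is not stated in the thread. Layers between the outer and inner IPv4 headers (UDP, VXLAN, inner Ethernet) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-skb GSO control block; only the
 * encap_level counter discussed in the thread is modelled. */
struct gso_cb {
	int encap_level; /* header bytes already claimed by outer layers */
};

/*
 * One stackable IPv4 segmentation layer. It is "stacked" (the thread's
 * 'bool tunnel') if an outer layer already ran, i.e. encap_level > 0.
 * It then adds its own header length so inner layers see a non-zero level.
 */
static bool ipv4_layer(struct gso_cb *cb, int ihl)
{
	bool tunnel = cb->encap_level > 0;

	assert(ihl >= 20); /* a valid IPv4 header is at least 20 bytes */
	cb->encap_level += ihl;
	return tunnel;
}
```

Calling ipv4_layer() twice on a fresh control block, as for the outer and then the inner IPv4 header of a tunneled packet, returns false and then true, leaving encap_level at 40 for two 20-byte headers.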

> Eric's fix brings back performance for vxlan and gre keeps working. Thx!

Please note the original performance is not that good: you mentioned 230
Mbps on lxc, while I get more than 5 Gb/s on a 10G link.

This should be investigated...

> 
> net/core/skbuff.c:3474 skb_try_coalesce() warning, I mentioned before,
> is unrelated.
> I still see it with this patch. Running either gre or vxlan tunnels.

I think this might be related to commit 6ff50cd55545 ("tcp: gso: do not
generate out of order packets")

I'll investigate this as well, thanks.


* [PATCH net-next] tcp: gso: fix truesize tracking
  2013-10-25 23:25             ` Eric Dumazet
@ 2013-10-26  0:26               ` Eric Dumazet
  2013-10-26  1:46                 ` Alexei Starovoitov
                                   ` (2 more replies)
  2013-10-26  0:52               ` vxlan gso is broken by stackable gso_segment() Eric Dumazet
  1 sibling, 3 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-10-26  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, Eric Dumazet, stephen, netdev

From: Eric Dumazet <edumazet@google.com>

commit 6ff50cd55545 ("tcp: gso: do not generate out of order packets")
had a heuristic that can trigger a warning in skb_try_coalesce(),
because the skb->truesize of the gso segments was set to exactly mss.

This breaks the requirement that

skb->truesize >= skb->len + truesizeof(struct sk_buff);

It can be trivially reproduced with:

ifconfig lo mtu 1500
ethtool -K lo tso off
netperf

As the skbs are looped back into the TCP networking stack, skb_try_coalesce()
warns us about these skbs underestimating their truesize.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <ast@plumgrid.com>
---
Based on net-next, but also applies to the net tree with some fuzz.

David, we might backport this one to 3.12 and stable later; I'll let you
decide.

 net/ipv4/tcp_offload.c |   13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index a7a5583e..a2b68a1 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -18,6 +18,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 				netdev_features_t features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	unsigned int sum_truesize = 0;
 	struct tcphdr *th;
 	unsigned int thlen;
 	unsigned int seq;
@@ -104,13 +105,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 		if (copy_destructor) {
 			skb->destructor = gso_skb->destructor;
 			skb->sk = gso_skb->sk;
-			/* {tcp|sock}_wfree() use exact truesize accounting :
-			 * sum(skb->truesize) MUST be exactly be gso_skb->truesize
-			 * So we account mss bytes of 'true size' for each segment.
-			 * The last segment will contain the remaining.
-			 */
-			skb->truesize = mss;
-			gso_skb->truesize -= mss;
+			sum_truesize += skb->truesize;
 		}
 		skb = skb->next;
 		th = tcp_hdr(skb);
@@ -127,7 +122,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	if (copy_destructor) {
 		swap(gso_skb->sk, skb->sk);
 		swap(gso_skb->destructor, skb->destructor);
-		swap(gso_skb->truesize, skb->truesize);
+		sum_truesize += skb->truesize;
+		atomic_add(sum_truesize - gso_skb->truesize,
+			   &skb->sk->sk_wmem_alloc);
 	}
 
 	delta = htonl(oldlen + (skb_tail_pointer(skb) -

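The accounting change can be illustrated with a small userspace model. SKB_OVERHEAD (256) is a made-up stand-in for the "truesizeof(struct sk_buff)" term, struct seg, old_accounting() and new_accounting() are hypothetical names, and the sizes in the usage example are arbitrary. The old heuristic charges each segment exactly mss bytes, which is below the segment's own length plus overhead, so the invariant from the commit message fails. The new scheme keeps each segment's real truesize and corrects the socket's write-memory counter once, by sum_truesize minus the GSO skb's truesize (modelling the atomic_add() on sk_wmem_alloc), so freeing every segment returns the counter to its value before the GSO skb was charged.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up per-skb metadata cost, standing in for the commit message's
 * "truesizeof(struct sk_buff)" term. */
#define SKB_OVERHEAD 256u

/* Hypothetical model of one segment. */
struct seg {
	unsigned int len;      /* bytes of packet data */
	unsigned int truesize; /* bytes charged to the sending socket */
};

/* The invariant that skb_try_coalesce() warned about. */
static bool truesize_ok(const struct seg *s)
{
	return s->truesize >= s->len + SKB_OVERHEAD;
}

/* Old heuristic: charge exactly mss per segment and give whatever is
 * left of the GSO skb's truesize to the last segment. */
static void old_accounting(struct seg *s, int n, unsigned int mss,
			   unsigned int gso_truesize)
{
	for (int i = 0; i < n - 1; i++) {
		s[i].truesize = mss;
		gso_truesize -= mss;
	}
	s[n - 1].truesize = gso_truesize;
}

/* New scheme: segments keep their real truesize; the socket counter,
 * which was charged gso_truesize, is corrected once by the difference. */
static void new_accounting(const struct seg *s, int n,
			   unsigned int gso_truesize, int *wmem_alloc)
{
	unsigned int sum_truesize = 0;

	for (int i = 0; i < n; i++)
		sum_truesize += s[i].truesize;
	*wmem_alloc += (int)sum_truesize - (int)gso_truesize;
}
```

With three 1514-byte segments of truesize 2304 and a GSO skb charged 5000 bytes, new_accounting() raises the counter to 6912 and freeing the three segments brings it back to 0. old_accounting() with mss 1448 leaves the first segment at a truesize of 1448, below 1514 + 256, which is the condition the warning catches.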

* Re: vxlan gso is broken by stackable gso_segment()
  2013-10-25 23:25             ` Eric Dumazet
  2013-10-26  0:26               ` [PATCH net-next] tcp: gso: fix truesize tracking Eric Dumazet
@ 2013-10-26  0:52               ` Eric Dumazet
  2013-10-26  1:25                 ` [PATCH net-next] veth: extend features to support tunneling Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2013-10-26  0:52 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, Eric Dumazet, stephen, netdev

On Fri, 2013-10-25 at 16:25 -0700, Eric Dumazet wrote:

> Please note the original performance is not that good, you mentioned 230
> Mbps on lxc, while I get more than 5Gb/s on a 10G link.
> 
> This should be investigated ...

It is probably trivial to increase performance:

veth currently does not support any kind of tunneling TSO:

tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]

I'll submit a patch for net-next
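A rough model of why these missing bits matter: for an encapsulated skb, the stack can only offload what both the device's regular features and its hw_enc_features allow, and has to segment in software otherwise. The sketch below is hypothetical; the F_* bit values and struct dev_model are invented for illustration (the real NETIF_F_* values differ), and the kernel's actual decision involves more checks than this. With hw_enc_features left at zero, as veth had before the patch that follows in this thread, every vxlan GSO packet ends up segmented in software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Made-up feature bits; the real NETIF_F_* values are different. */
#define F_TSO            (1u << 0)
#define F_GSO_GRE        (1u << 1)
#define F_GSO_UDP_TUNNEL (1u << 2)

/* Hypothetical device model with the two feature sets involved. */
struct dev_model {
	uint32_t features;     /* offloads for ordinary packets */
	uint32_t enc_features; /* offloads for encapsulated packets
				  (models dev->hw_enc_features) */
};

/* True if a GSO packet needing the 'required' offloads must be segmented
 * in software before it reaches the device. */
static bool needs_sw_gso(const struct dev_model *dev, bool encapsulated,
			 uint32_t required)
{
	uint32_t features = dev->features;

	if (encapsulated)
		features &= dev->enc_features;
	return (features & required) != required;
}
```

A device described as { F_TSO, 0 } passes plain TSO packets through but forces software segmentation for a packet needing F_TSO | F_GSO_UDP_TUNNEL; setting both feature sets to include the tunnel bits, as the veth patch does with VETH_FEATURES, lets that packet through unsegmented.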


* [PATCH net-next] veth: extend features to support tunneling
  2013-10-26  0:52               ` vxlan gso is broken by stackable gso_segment() Eric Dumazet
@ 2013-10-26  1:25                 ` Eric Dumazet
  2013-10-26  2:22                   ` Alexei Starovoitov
  2013-10-28  4:58                   ` David Miller
  0 siblings, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-10-26  1:25 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, Eric Dumazet, stephen, netdev

From: Eric Dumazet <edumazet@google.com>

While investigating a recent vxlan regression, I found that veth
was using a zero features set for vxlan tunnels.

As a result, we have to segment GSO frames, copy the payload, and compute
the checksum in software.

This patch brings a ~200% performance increase.

We probably have to add hw_enc_features support
to other virtual devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
 drivers/net/veth.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index b2d0347..b24db7a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -261,6 +261,8 @@ static const struct net_device_ops veth_netdev_ops = {
 
 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_ALL_TSO |    \
 		       NETIF_F_HW_CSUM | NETIF_F_RXCSUM | NETIF_F_HIGHDMA | \
+		       NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL |	    \
+		       NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT | NETIF_F_UFO	|   \
 		       NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | \
 		       NETIF_F_HW_VLAN_STAG_TX | NETIF_F_HW_VLAN_STAG_RX )
 
@@ -279,6 +281,7 @@ static void veth_setup(struct net_device *dev)
 	dev->destructor = veth_dev_free;
 
 	dev->hw_features = VETH_FEATURES;
+	dev->hw_enc_features = VETH_FEATURES;
 }
 
 /*


* Re: [PATCH net-next] tcp: gso: fix truesize tracking
  2013-10-26  0:26               ` [PATCH net-next] tcp: gso: fix truesize tracking Eric Dumazet
@ 2013-10-26  1:46                 ` Alexei Starovoitov
  2013-10-28  4:58                 ` David Miller
  2013-10-29  4:05                 ` David Miller
  2 siblings, 0 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2013-10-26  1:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Eric Dumazet, stephen, netdev

On Fri, Oct 25, 2013 at 5:26 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> commit 6ff50cd55545 ("tcp: gso: do not generate out of order packets")
> had an heuristic that can trigger a warning in skb_try_coalesce(),
> because skb->truesize of the gso segments were exactly set to mss.
>
> This breaks the requirement that
>
> skb->truesize >= skb->len + truesizeof(struct sk_buff);
>
> It can trivially be reproduced by :
>
> ifconfig lo mtu 1500
> ethtool -K lo tso off
> netperf
>
> As the skbs are looped into the TCP networking stack, skb_try_coalesce()
> warns us of these skb under-estimating their truesize.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <ast@plumgrid.com>
> ---

it fixed it in my setup. Thanks!
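
The truesize requirement quoted above can be illustrated with a small check. This is a sketch only: struct fake_skb and SKB_OVERHEAD are hypothetical stand-ins for struct sk_buff and its metadata cost (in the kernel this comes from SKB_TRUESIZE() and depends on the build), and 256 is an arbitrary illustrative value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-skb metadata overhead; the real figure depends on
 * sizeof(struct sk_buff) and struct skb_shared_info. */
#define SKB_OVERHEAD 256u

struct fake_skb {
    size_t len;      /* bytes of packet data, headers included */
    size_t truesize; /* bytes charged to the socket for this skb */
};

/* The invariant skb_try_coalesce() expects: truesize covers the data
 * plus the skb's own metadata. */
static bool truesize_ok(const struct fake_skb *skb)
{
    return skb->truesize >= skb->len + SKB_OVERHEAD;
}
```

A segment carrying mss = 1448 bytes of payload plus 52 bytes of headers (len 1500) whose truesize was set to exactly 1448 fails the check, while charging len + SKB_OVERHEAD passes it.
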

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] veth: extend features to support tunneling
  2013-10-26  1:25                 ` [PATCH net-next] veth: extend features to support tunneling Eric Dumazet
@ 2013-10-26  2:22                   ` Alexei Starovoitov
  2013-10-26  4:13                     ` Eric Dumazet
  2013-10-28  4:58                   ` David Miller
  1 sibling, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2013-10-26  2:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Eric Dumazet, stephen, netdev

On Fri, Oct 25, 2013 at 6:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While investigating on a recent vxlan regression, I found veth
> was using a zero features set for vxlan tunnels.

oneliner can be better :)

> We have to segment GSO frames, copy the payload, and do the checksum.
>
> This patch brings a ~200% performance increase
>
> We probably have to add hw_enc_features support
> on other virtual devices.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> ---

iperf over veth with gre/vxlan tunneling is now ~4Gbps. 20x gain as advertised.
Thanks!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] veth: extend features to support tunneling
  2013-10-26  2:22                   ` Alexei Starovoitov
@ 2013-10-26  4:13                     ` Eric Dumazet
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-10-26  4:13 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, Eric Dumazet, stephen, netdev

On Fri, 2013-10-25 at 19:22 -0700, Alexei Starovoitov wrote:
> On Fri, Oct 25, 2013 at 6:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > While investigating on a recent vxlan regression, I found veth
> > was using a zero features set for vxlan tunnels.
> 
> oneliner can be better :)

Yes, I'll post the gso fix as well, of course ;)

> 
> > We have to segment GSO frames, copy the payload, and do the checksum.
> >
> > This patch brings a ~200% performance increase
> >
> > We probably have to add hw_enc_features support
> > on other virtual devices.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Alexei Starovoitov <ast@plumgrid.com>
> > ---
> 
> iperf over veth with gre/vxlan tunneling is now ~4Gbps. 20x gain as advertised.
> Thanks!

Wow, such a difference might show another bug somewhere else...

Thanks !

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH net-next] inet: restore gso for vxlan
  2013-10-25 22:41           ` Alexei Starovoitov
  2013-10-25 23:10             ` David Miller
  2013-10-25 23:25             ` Eric Dumazet
@ 2013-10-28  1:18             ` Eric Dumazet
  2013-10-28  4:25               ` David Miller
  2013-11-07 22:33               ` Eric Dumazet
  2 siblings, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-10-28  1:18 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, netdev

From: Eric Dumazet <edumazet@google.com>

Alexei reported a performance regression on vxlan, caused
by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable"

GSO vxlan packets were not properly segmented, adding IP fragments
while they were not expected.

Rename 'bool tunnel' to 'bool encap', and add a new boolean
to express the fact that UDP should be fragmented.
This fragmentation is triggered by skb->encapsulation being set.

Remove a "skb->encapsulation = 1" added in above commit,
as it's not needed, as frags inherit skb->frag from original
GSO skb.
 
Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
---
 net/ipv4/af_inet.c |   15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f4a159e705c0..09d78d4a3cff 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1251,8 +1251,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	const struct net_offload *ops;
 	unsigned int offset = 0;
+	bool udpfrag, encap;
 	struct iphdr *iph;
-	bool tunnel;
 	int proto;
 	int nhoff;
 	int ihl;
@@ -1290,8 +1290,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 		goto out;
 	__skb_pull(skb, ihl);
 
-	tunnel = SKB_GSO_CB(skb)->encap_level > 0;
-	if (tunnel)
+	encap = SKB_GSO_CB(skb)->encap_level > 0;
+	if (encap)
 		features = skb->dev->hw_enc_features & netif_skb_features(skb);
 	SKB_GSO_CB(skb)->encap_level += ihl;
 
@@ -1306,24 +1306,23 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	if (IS_ERR_OR_NULL(segs))
 		goto out;
 
+	udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
 	skb = segs;
 	do {
 		iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
-		if (!tunnel && proto == IPPROTO_UDP) {
+		if (udpfrag) {
 			iph->id = htons(id);
 			iph->frag_off = htons(offset >> 3);
 			if (skb->next != NULL)
 				iph->frag_off |= htons(IP_MF);
 			offset += skb->len - nhoff - ihl;
-		} else  {
+		} else {
 			iph->id = htons(id++);
 		}
 		iph->tot_len = htons(skb->len - nhoff);
 		ip_send_check(iph);
-		if (tunnel) {
+		if (encap)
 			skb_reset_inner_headers(skb);
-			skb->encapsulation = 1;
-		}
 		skb->network_header = (u8 *)iph - skb->head;
 	} while ((skb = skb->next));
 

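The per-segment loop in this patch can be modeled in userspace to show what udpfrag changes. A sketch under assumptions: struct seg_hdr and fixup_ids() are hypothetical, segments are described by their IP payload lengths (the kernel's skb->len - nhoff - ihl), and byte-order conversion (htons) and ip_send_check() are omitted. With udpfrag set, every segment keeps the same IP id and gets a fragment offset in 8-byte units, with MF set on all but the last. Otherwise each segment is an independent packet with its own incrementing id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP_MF 0x2000u /* "more fragments" flag in frag_off */

/* Hypothetical, host-byte-order view of the IPv4 fields the loop touches. */
struct seg_hdr {
    uint16_t id;
    uint16_t frag_off;
};

/* payload[i] is the IP payload length of segment i. */
static void fixup_ids(struct seg_hdr *hdr, const unsigned *payload,
                      size_t nseg, uint16_t id, bool udpfrag)
{
    unsigned offset = 0;

    for (size_t i = 0; i < nseg; i++) {
        if (udpfrag) {
            /* UFO: all pieces are fragments of one IP datagram. */
            hdr[i].id = id;
            hdr[i].frag_off = (uint16_t)(offset >> 3);
            if (i + 1 < nseg)
                hdr[i].frag_off |= IP_MF;
            offset += payload[i];
        } else {
            /* TSO or tunnel GSO: independent packets, fresh ids.
             * frag_off is left untouched, as in the kernel loop. */
            hdr[i].id = id++;
        }
    }
}
```

For three segments with payloads of 1480, 1480 and 100 bytes, the UFO path yields frag_off values of MF|0, MF|185 and 370, all with the same id. The non-fragment path instead yields ids id, id+1, id+2.
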
^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] inet: restore gso for vxlan
  2013-10-28  1:18             ` [PATCH net-next] inet: restore gso for vxlan Eric Dumazet
@ 2013-10-28  4:25               ` David Miller
  2013-11-07 22:33               ` Eric Dumazet
  1 sibling, 0 replies; 35+ messages in thread
From: David Miller @ 2013-10-28  4:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ast, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 27 Oct 2013 18:18:16 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Alexei reported a performance regression on vxlan, caused
> by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable"
> 
> GSO vxlan packets were not properly segmented, adding IP fragments
> while they were not expected.
> 
> Rename 'bool tunnel' to 'bool encap', and add a new boolean
> to express the fact that UDP should be fragmented.
> This fragmentation is triggered by skb->encapsulation being set.
> 
> Remove a "skb->encapsulation = 1" added in above commit,
> as it's not needed, as frags inherit skb->frag from original
> GSO skb.
>  
> Reported-by: Alexei Starovoitov <ast@plumgrid.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Alexei Starovoitov <ast@plumgrid.com>

Applied, thanks!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] veth: extend features to support tunneling
  2013-10-26  1:25                 ` [PATCH net-next] veth: extend features to support tunneling Eric Dumazet
  2013-10-26  2:22                   ` Alexei Starovoitov
@ 2013-10-28  4:58                   ` David Miller
  1 sibling, 0 replies; 35+ messages in thread
From: David Miller @ 2013-10-28  4:58 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ast, edumazet, stephen, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 25 Oct 2013 18:25:03 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> While investigating on a recent vxlan regression, I found veth
> was using a zero features set for vxlan tunnels.
> 
> We have to segment GSO frames, copy the payload, and do the checksum.
> 
> This patch brings a ~200% performance increase
> 
> We probably have to add hw_enc_features support
> on other virtual devices.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] tcp: gso: fix truesize tracking
  2013-10-26  0:26               ` [PATCH net-next] tcp: gso: fix truesize tracking Eric Dumazet
  2013-10-26  1:46                 ` Alexei Starovoitov
@ 2013-10-28  4:58                 ` David Miller
  2013-10-29  4:05                 ` David Miller
  2 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2013-10-28  4:58 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ast, edumazet, stephen, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 25 Oct 2013 17:26:17 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> commit 6ff50cd55545 ("tcp: gso: do not generate out of order packets")
> had a heuristic that can trigger a warning in skb_try_coalesce(),
> because skb->truesize of the gso segments were exactly set to mss.
> 
> This breaks the requirement that
> 
> skb->truesize >= skb->len + truesizeof(struct sk_buff);
> 
> It can trivially be reproduced by :
> 
> ifconfig lo mtu 1500
> ethtool -K lo tso off
> netperf
> 
> As the skbs are looped into the TCP networking stack, skb_try_coalesce()
> warns us of these skb under-estimating their truesize.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
> Based on net-next but applies as well on net tree with some fuzz.
> 
> David, we might backport this one to 3.12 and stable later, I let you
> decide.

I'm still thinking about what to do with this one.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] tcp: gso: fix truesize tracking
  2013-10-26  0:26               ` [PATCH net-next] tcp: gso: fix truesize tracking Eric Dumazet
  2013-10-26  1:46                 ` Alexei Starovoitov
  2013-10-28  4:58                 ` David Miller
@ 2013-10-29  4:05                 ` David Miller
  2 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2013-10-29  4:05 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ast, edumazet, stephen, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 25 Oct 2013 17:26:17 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> commit 6ff50cd55545 ("tcp: gso: do not generate out of order packets")
> had a heuristic that can trigger a warning in skb_try_coalesce(),
> because skb->truesize of the gso segments were exactly set to mss.
> 
> This breaks the requirement that
> 
> skb->truesize >= skb->len + truesizeof(struct sk_buff);
> 
> It can trivially be reproduced by :
> 
> ifconfig lo mtu 1500
> ethtool -K lo tso off
> netperf
> 
> As the skbs are looped into the TCP networking stack, skb_try_coalesce()
> warns us of these skb under-estimating their truesize.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <ast@plumgrid.com>

I decided to apply this to 'net' and queue it up for -stable,
thanks Eric!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] inet: restore gso for vxlan
  2013-10-28  1:18             ` [PATCH net-next] inet: restore gso for vxlan Eric Dumazet
  2013-10-28  4:25               ` David Miller
@ 2013-11-07 22:33               ` Eric Dumazet
  2013-11-07 23:27                 ` Alexei Starovoitov
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2013-11-07 22:33 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, netdev, Hannes Frederic Sowa

On Sun, 2013-10-27 at 18:18 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Alexei reported a performance regression on vxlan, caused
> by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable"
> 
> GSO vxlan packets were not properly segmented, adding IP fragments
> while they were not expected.
> 
> Rename 'bool tunnel' to 'bool encap', and add a new boolean
> to express the fact that UDP should be fragmented.
> This fragmentation is triggered by skb->encapsulation being set.
> 
> Remove a "skb->encapsulation = 1" added in above commit,
> as it's not needed, as frags inherit skb->frag from original
> GSO skb.
>  
> Reported-by: Alexei Starovoitov <ast@plumgrid.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
>  net/ipv4/af_inet.c |   15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index f4a159e705c0..09d78d4a3cff 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1251,8 +1251,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>  	struct sk_buff *segs = ERR_PTR(-EINVAL);
>  	const struct net_offload *ops;
>  	unsigned int offset = 0;
> +	bool udpfrag, encap;
>  	struct iphdr *iph;
> -	bool tunnel;
>  	int proto;
>  	int nhoff;
>  	int ihl;
> @@ -1290,8 +1290,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>  		goto out;
>  	__skb_pull(skb, ihl);
>  
> -	tunnel = SKB_GSO_CB(skb)->encap_level > 0;
> -	if (tunnel)
> +	encap = SKB_GSO_CB(skb)->encap_level > 0;
> +	if (encap)
>  		features = skb->dev->hw_enc_features & netif_skb_features(skb);
>  	SKB_GSO_CB(skb)->encap_level += ihl;
>  
> @@ -1306,24 +1306,23 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>  	if (IS_ERR_OR_NULL(segs))
>  		goto out;
>  
> +	udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;

It seems I did an error here, shouldn't it be

     udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;

instead ?

I am not sure UFO and vxlan are compatible...


>  	skb = segs;
>  	do {
>  		iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
> -		if (!tunnel && proto == IPPROTO_UDP) {
> +		if (udpfrag) {
>  			iph->id = htons(id);
>  			iph->frag_off = htons(offset >> 3);
>  			if (skb->next != NULL)
>  				iph->frag_off |= htons(IP_MF);
>  			offset += skb->len - nhoff - ihl;
> -		} else  {
> +		} else {
>  			iph->id = htons(id++);
>  		}
>  		iph->tot_len = htons(skb->len - nhoff);
>  		ip_send_check(iph);
> -		if (tunnel) {
> +		if (encap)
>  			skb_reset_inner_headers(skb);
> -			skb->encapsulation = 1;
> -		}
>  		skb->network_header = (u8 *)iph - skb->head;
>  	} while ((skb = skb->next));
>  

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] inet: restore gso for vxlan
  2013-11-07 22:33               ` Eric Dumazet
@ 2013-11-07 23:27                 ` Alexei Starovoitov
  2013-11-08  0:44                   ` [PATCH net-next] inet: fix a UFO regression Eric Dumazet
  2013-11-16 20:23                   ` [PATCH net-next] inet: restore gso for vxlan Or Gerlitz
  0 siblings, 2 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2013-11-07 23:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Hannes Frederic Sowa

On Thu, Nov 7, 2013 at 2:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2013-10-27 at 18:18 -0700, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> Alexei reported a performance regression on vxlan, caused
>> by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable"
>>
>> GSO vxlan packets were not properly segmented, adding IP fragments
>> while they were not expected.
>>
>> Rename 'bool tunnel' to 'bool encap', and add a new boolean
>> to express the fact that UDP should be fragmented.
>> This fragmentation is triggered by skb->encapsulation being set.
>>
>> Remove a "skb->encapsulation = 1" added in above commit,
>> as it's not needed, as frags inherit skb->frag from original
>> GSO skb.
>>
>> Reported-by: Alexei Starovoitov <ast@plumgrid.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Tested-by: Alexei Starovoitov <ast@plumgrid.com>
>> ---
>>  net/ipv4/af_inet.c |   15 +++++++--------
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index f4a159e705c0..09d78d4a3cff 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -1251,8 +1251,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>>       struct sk_buff *segs = ERR_PTR(-EINVAL);
>>       const struct net_offload *ops;
>>       unsigned int offset = 0;
>> +     bool udpfrag, encap;
>>       struct iphdr *iph;
>> -     bool tunnel;
>>       int proto;
>>       int nhoff;
>>       int ihl;
>> @@ -1290,8 +1290,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>>               goto out;
>>       __skb_pull(skb, ihl);
>>
>> -     tunnel = SKB_GSO_CB(skb)->encap_level > 0;
>> -     if (tunnel)
>> +     encap = SKB_GSO_CB(skb)->encap_level > 0;
>> +     if (encap)
>>               features = skb->dev->hw_enc_features & netif_skb_features(skb);
>>       SKB_GSO_CB(skb)->encap_level += ihl;
>>
>> @@ -1306,24 +1306,23 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>>       if (IS_ERR_OR_NULL(segs))
>>               goto out;
>>
>> +     udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
>
> It seems I did an error here, shouldn't it be
>
>      udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
>
> instead ?

oops. yes.
the segs coming out of udp4_ufo_fragment() in case of udp_tunnel should not
be marked as ip frags.
ip frags are only for UFO.

Thank you for spotting it!

Alexei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH net-next] inet: fix a UFO regression
  2013-11-07 23:27                 ` Alexei Starovoitov
@ 2013-11-08  0:44                   ` Eric Dumazet
  2013-11-08  1:39                     ` Alexei Starovoitov
  2013-11-08  2:32                     ` [PATCH v2 " Eric Dumazet
  2013-11-16 20:23                   ` [PATCH net-next] inet: restore gso for vxlan Or Gerlitz
  1 sibling, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-11-08  0:44 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, netdev, Hannes Frederic Sowa

From: Eric Dumazet <edumazet@google.com>

While testing virtio_net and skb_segment() changes, Hannes reported
that UFO was sending wrong frames.

It appears this was introduced by a recent commit :
8c3a897bfab1 ("inet: restore gso for vxlan")

The old condition to perform IP frag was :

tunnel = !!skb->encapsulation;
...
	if (!tunnel && proto == IPPROTO_UDP) {

So the new one should be :

udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
...
	if (udpfrag) {

With help from Alexei Starovoitov

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
 net/ipv4/af_inet.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 09d78d4a3cff..f17941486ab9 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1306,7 +1306,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	if (IS_ERR_OR_NULL(segs))
 		goto out;
 
-	udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
+	udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
 	skb = segs;
 	do {
 		iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);

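The one-character change is easiest to see as a truth table. A small sketch with hypothetical helper names: `!!x` normalizes x to 0 or 1 but keeps its truth value, while `!x` inverts it. For a UFO skb (encapsulation == 0, proto UDP), the expression from 8c3a897bfab1 evaluates to false, so UFO segments no longer got their IP fragment fields, whereas the proposed expression matches the old `!tunnel && proto == IPPROTO_UDP` test.

```c
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h> /* IPPROTO_UDP, IPPROTO_TCP */

/* Condition as written in 8c3a897bfab1 ("inet: restore gso for vxlan"). */
static bool udpfrag_applied(int encapsulation, int proto)
{
    return !!encapsulation && proto == IPPROTO_UDP;
}

/* Condition proposed in this patch. */
static bool udpfrag_proposed(int encapsulation, int proto)
{
    return !encapsulation && proto == IPPROTO_UDP;
}
```

This only compares the two expressions for a given flag value. Where the flag is read also matters, because a gso_segment() callback may change skb->encapsulation before the expression is evaluated.
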
^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] inet: fix a UFO regression
  2013-11-08  0:44                   ` [PATCH net-next] inet: fix a UFO regression Eric Dumazet
@ 2013-11-08  1:39                     ` Alexei Starovoitov
  2013-11-08  2:08                       ` Eric Dumazet
  2013-11-08  2:32                     ` [PATCH v2 " Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2013-11-08  1:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Hannes Frederic Sowa

On Thu, Nov 7, 2013 at 4:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While testing virtio_net and skb_segment() changes, Hannes reported
> that UFO was sending wrong frames.
>
> It appears this was introduced by a recent commit :
> 8c3a897bfab1 ("inet: restore gso for vxlan")
>
> The old condition to perform IP frag was :
>
> tunnel = !!skb->encapsulation;
> ...
>         if (!tunnel && proto == IPPROTO_UDP) {
>
> So the new one should be :
>
> udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
> ...
>         if (udpfrag) {
>
> With help from Alexei Starovoitov
>
> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> ---
>  net/ipv4/af_inet.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 09d78d4a3cff..f17941486ab9 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1306,7 +1306,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>         if (IS_ERR_OR_NULL(segs))
>                 goto out;
>
> -       udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
> +       udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;

as we discussed this will break vxlan :)
Please move udpfrag assignment before line:
                segs = ops->callbacks.gso_segment(skb, features);
since skb_udp_tunnel_segment() does skb->encapsulation = 0
then both ufo and vxlan should be fine

Thanks
Alexei

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next] inet: fix a UFO regression
  2013-11-08  1:39                     ` Alexei Starovoitov
@ 2013-11-08  2:08                       ` Eric Dumazet
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-11-08  2:08 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, netdev, Hannes Frederic Sowa

On Thu, 2013-11-07 at 17:39 -0800, Alexei Starovoitov wrote:

> as we discussed this will break vxlan :)
> Please move udpfrag assignment before line:
>                 segs = ops->callbacks.gso_segment(skb, features);
> since skb_udp_tunnel_segment() does skb->encapsulation = 0
> then both ufo and vxlan should be fine

Oh that's right, thanks again, I'll send a V2

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 net-next] inet: fix a UFO regression
  2013-11-08  0:44                   ` [PATCH net-next] inet: fix a UFO regression Eric Dumazet
  2013-11-08  1:39                     ` Alexei Starovoitov
@ 2013-11-08  2:32                     ` Eric Dumazet
  2013-11-08  7:08                       ` David Miller
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2013-11-08  2:32 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, netdev, Hannes Frederic Sowa

From: Eric Dumazet <edumazet@google.com>

While testing virtio_net and skb_segment() changes, Hannes reported
that UFO was sending wrong frames.

It appears this was introduced by a recent commit :
8c3a897bfab1 ("inet: restore gso for vxlan")

The old condition to perform IP frag was :

tunnel = !!skb->encapsulation;
...
        if (!tunnel && proto == IPPROTO_UDP) {

So the new one should be :

udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
...
        if (udpfrag) {

Initialization of udpfrag must be done before the call
to ops->callbacks.gso_segment(skb, features), as
skb_udp_tunnel_segment() clears skb->encapsulation

(We want udpfrag to be true for UFO, false for VXLAN)

With help from Alexei Starovoitov

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
v2: move the udpfrag init, as spotted by Alexei.

 net/ipv4/af_inet.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 09d78d4a3cff..68af9aac91d0 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1299,6 +1299,9 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 
 	segs = ERR_PTR(-EPROTONOSUPPORT);
 
+	/* Note : following gso_segment() might change skb->encapsulation */
+	udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
+
 	ops = rcu_dereference(inet_offloads[proto]);
 	if (likely(ops && ops->callbacks.gso_segment))
 		segs = ops->callbacks.gso_segment(skb, features);
@@ -1306,7 +1309,6 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	if (IS_ERR_OR_NULL(segs))
 		goto out;
 
-	udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
 	skb = segs;
 	do {
 		iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);

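Why the v2 placement matters can be shown with a toy model. Assumptions: struct toy_skb, toy_tunnel_segment() and the two classify_*() helpers are hypothetical. The only behavior taken from the thread is that the UDP tunnel path (skb_udp_tunnel_segment()) clears skb->encapsulation. A flag read after the callback therefore no longer tells a vxlan packet apart from a plain UFO one.

```c
#include <assert.h>
#include <stdbool.h>

#define PROTO_UDP 17

struct toy_skb {
    int encapsulation; /* set for vxlan-style tunnel packets */
    int proto;         /* outer L4 protocol */
};

/* Stand-in for the gso_segment() callback: only the tunnel path runs
 * skb_udp_tunnel_segment(), which clears the encapsulation flag. */
static void toy_tunnel_segment(struct toy_skb *skb)
{
    if (skb->encapsulation)
        skb->encapsulation = 0;
}

/* v1: the flag is read after the callback has run. */
static bool classify_after(struct toy_skb skb)
{
    toy_tunnel_segment(&skb);
    return !skb.encapsulation && skb.proto == PROTO_UDP;
}

/* v2: the flag is read first, then the callback runs. */
static bool classify_before(struct toy_skb skb)
{
    bool udpfrag = !skb.encapsulation && skb.proto == PROTO_UDP;

    toy_tunnel_segment(&skb);
    return udpfrag;
}
```

Both versions return true for a UFO skb. Only reading the flag before the callback keeps udpfrag false for vxlan, which is the "true for UFO, false for VXLAN" rule stated in the changelog.
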
^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 net-next] inet: fix a UFO regression
@ 2013-11-08  5:44 Alexei Starovoitov
  2013-11-08  5:50 ` Eric Dumazet
  2013-11-08  6:41 ` Sridhar Samudrala
  0 siblings, 2 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2013-11-08  5:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Hannes Frederic Sowa

On Thu, Nov 7, 2013 at 6:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While testing virtio_net and skb_segment() changes, Hannes reported
> that UFO was sending wrong frames.
>
> It appears this was introduced by a recent commit :
> 8c3a897bfab1 ("inet: restore gso for vxlan")
>
> The old condition to perform IP frag was :
>
> tunnel = !!skb->encapsulation;
> ...
>         if (!tunnel && proto == IPPROTO_UDP) {
>
> So the new one should be :
>
> udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
> ...
>         if (udpfrag) {
>
> Initialization of udpfrag must be done before the call
> to ops->callbacks.gso_segment(skb, features), as
> skb_udp_tunnel_segment() clears skb->encapsulation
>
> (We want udpfrag to be true for UFO, false for VXLAN)
>
> With help from Alexei Starovoitov
>
> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> ---

vxlan looks good through namespaces with and without gso
and between physical machines via 10G nics

Tested-by: Alexei Starovoitov <ast@plumgrid.com>

Thanks Eric!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 net-next] inet: fix a UFO regression
  2013-11-08  5:44 [PATCH v2 net-next] inet: fix a UFO regression Alexei Starovoitov
@ 2013-11-08  5:50 ` Eric Dumazet
  2013-11-08  6:41 ` Sridhar Samudrala
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-11-08  5:50 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, netdev, Hannes Frederic Sowa

On Thu, 2013-11-07 at 21:44 -0800, Alexei Starovoitov wrote:

> vxlan looks good through namespaces with and without gso
> and between physical machines via 10G nics
> 
> Tested-by: Alexei Starovoitov <ast@plumgrid.com>
> 
> Thanks Eric!

And I tested the UFO part as well.

Thanks Alexei !

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 net-next] inet: fix a UFO regression
  2013-11-08  5:44 [PATCH v2 net-next] inet: fix a UFO regression Alexei Starovoitov
  2013-11-08  5:50 ` Eric Dumazet
@ 2013-11-08  6:41 ` Sridhar Samudrala
  2013-11-08 12:18   ` Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Sridhar Samudrala @ 2013-11-08  6:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Eric Dumazet
  Cc: David Miller, netdev, Hannes Frederic Sowa

On 11/7/2013 9:44 PM, Alexei Starovoitov wrote:
> On Thu, Nov 7, 2013 at 6:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> While testing virtio_net and skb_segment() changes, Hannes reported
>> that UFO was sending wrong frames.
>>
>> It appears this was introduced by a recent commit :
>> 8c3a897bfab1 ("inet: restore gso for vxlan")
>>
>> The old condition to perform IP frag was :
>>
>> tunnel = !!skb->encapsulation;
>> ...
>>          if (!tunnel && proto == IPPROTO_UDP) {
>>
>> So the new one should be :
>>
>> udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
>> ...
>>          if (udpfrag) {
>>
>> Initialization of udpfrag must be done before the call
>> to ops->callbacks.gso_segment(skb, features), as
>> skb_udp_tunnel_segment() clears skb->encapsulation
>>
>> (We want udpfrag to be true for UFO, false for VXLAN)
>>
>> With help from Alexei Starovoitov
>>
>> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>> ---
> vxlan looks good through namespaces with and without gso
> and between physical machines via 10G nics

Does this fix also help vxlan performance between 2 VMs on 2 different 
physical m/cs across 10G NIC?
What is the throughput you are seeing via 10G nics?
With linux 3.12, i am only seeing around 2Gbps (iperf TCP_STREAM with 
16K messages)  between 2 VMs when using vxlan across a 10G nic.
Is this due to the overhead of software GSO and not doing GRO at the 
receiver?

Thanks
Sridhar
>
> Tested-by: Alexei Starovoitov <ast@plumgrid.com>
>
> Thanks Eric!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 net-next] inet: fix a UFO regression
  2013-11-08  2:32                     ` [PATCH v2 " Eric Dumazet
@ 2013-11-08  7:08                       ` David Miller
  0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2013-11-08  7:08 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ast, netdev, hannes

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 07 Nov 2013 18:32:06 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> While testing virtio_net and skb_segment() changes, Hannes reported
> that UFO was sending wrong frames.
> 
> It appears this was introduced by a recent commit :
> 8c3a897bfab1 ("inet: restore gso for vxlan")
> 
> The old condition to perform IP frag was :
> 
> tunnel = !!skb->encapsulation;
> ...
>         if (!tunnel && proto == IPPROTO_UDP) {
> 
> So the new one should be :
> 
> udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
> ...
>         if (udpfrag) {
> 
> Initialization of udpfrag must be done before the call
> to ops->callbacks.gso_segment(skb, features), as
> skb_udp_tunnel_segment() clears skb->encapsulation
> 
> (We want udpfrag to be true for UFO, false for VXLAN)
> 
> With help from Alexei Starovoitov
> 
> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks a lot Eric.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 net-next] inet: fix a UFO regression
  2013-11-08  6:41 ` Sridhar Samudrala
@ 2013-11-08 12:18   ` Eric Dumazet
  2013-11-08 20:20     ` Sridhar Samudrala
  0 siblings, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2013-11-08 12:18 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Alexei Starovoitov, David Miller, netdev, Hannes Frederic Sowa

On Thu, 2013-11-07 at 22:41 -0800, Sridhar Samudrala wrote:

> Does this fix also help vxlan performance between 2 VMs on 2 different 
> physical m/cs across 10G NIC?
> What is the throughput you are seeing via 10G nics?
> With linux 3.12, i am only seeing around 2Gbps (iperf TCP_STREAM with 
> 16K messages)  between 2 VMs when using vxlan across a 10G nic.
> Is this due to the overhead of software GSO and not doing GRO at the 
> receiver?

What NIC is used at sender, what NIC is used at receiver ?

If you use GRE instead of VXLAN, do you get any difference in speed ?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 net-next] inet: fix a UFO regression
  2013-11-08 12:18   ` Eric Dumazet
@ 2013-11-08 20:20     ` Sridhar Samudrala
  2013-11-08 20:57       ` Eric Dumazet
  0 siblings, 1 reply; 35+ messages in thread
From: Sridhar Samudrala @ 2013-11-08 20:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, David Miller, netdev, Hannes Frederic Sowa

On 11/8/2013 4:18 AM, Eric Dumazet wrote:
> On Thu, 2013-11-07 at 22:41 -0800, Sridhar Samudrala wrote:
>
>> Does this fix also help vxlan performance between 2 VMs on 2 different
>> physical m/cs across 10G NIC?
>> What is the throughput you are seeing via 10G nics?
>> With linux 3.12, i am only seeing around 2Gbps (iperf TCP_STREAM with
>> 16K messages)  between 2 VMs when using vxlan across a 10G nic.
>> Is this due to the overhead of software GSO and not doing GRO at the
>> receiver?
> What NIC is used at sender, what NIC is used at receiver ?
I am seeing similar results with both Emulex and Intel 10Gb NICs
at both sender and receiver.
>
> If you use GRE instead of VXLAN, do you get any difference in speed ?
With GRE, I am seeing noticeably better results: 3.2Gb/s compared to
1.8Gb/s with vxlan.


Thanks
Sridhar

* Re: [PATCH v2 net-next] inet: fix a UFO regression
  2013-11-08 20:20     ` Sridhar Samudrala
@ 2013-11-08 20:57       ` Eric Dumazet
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-11-08 20:57 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Alexei Starovoitov, David Miller, netdev, Hannes Frederic Sowa

On Fri, 2013-11-08 at 12:20 -0800, Sridhar Samudrala wrote:

> I am seeing almost similar results with both Emulex and Intel 10Gb NICs 
> at both sender and receiver.
> >
> > If you use GRE instead of VXLAN, do you get any difference in speed ?
> With GRE, i am seeing slightly better results. 3.2Gb/s compared to 
> 1.8Gb/s with vxlan.

Thanks, it seems to point to a bottleneck at the receiver, as we have
GRO support for GRE, but not yet for VXLAN.

* Re: [PATCH net-next] inet: restore gso for vxlan
  2013-11-07 23:27                 ` Alexei Starovoitov
  2013-11-08  0:44                   ` [PATCH net-next] inet: fix a UFO regression Eric Dumazet
@ 2013-11-16 20:23                   ` Or Gerlitz
  2013-11-16 20:50                     ` Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Or Gerlitz @ 2013-11-16 20:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, David Miller, netdev@vger.kernel.org,
	Hannes Frederic Sowa

On Fri, Nov 8, 2013,  Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Thu, Nov 7, 2013 , Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Sun, 2013-10-27 , Eric Dumazet wrote:
>>> From: Eric Dumazet <edumazet@google.com>


>>> Alexei reported a performance regression on vxlan, caused
>>> by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable"

>>> GSO vxlan packets were not properly segmented, adding IP fragments
>>> while they were not expected.

>>> Rename 'bool tunnel' to 'bool encap', and add a new boolean
>>> to express the fact that UDP should be fragmented.
>>> This fragmentation is triggered by skb->encapsulation being set.
>>>
>>> Remove a "skb->encapsulation = 1" added in above commit,
>>> as it's not needed, as frags inherit skb->frag from original
>>> GSO skb.
>>>
>>> Reported-by: Alexei Starovoitov <ast@plumgrid.com>
>>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>>> Tested-by: Alexei Starovoitov <ast@plumgrid.com>
>>> ---
>>>  net/ipv4/af_inet.c |   15 +++++++--------
>>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>>> index f4a159e705c0..09d78d4a3cff 100644
>>> --- a/net/ipv4/af_inet.c
>>> +++ b/net/ipv4/af_inet.c
>>> @@ -1251,8 +1251,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>>>       struct sk_buff *segs = ERR_PTR(-EINVAL);
>>>       const struct net_offload *ops;
>>>       unsigned int offset = 0;
>>> +     bool udpfrag, encap;
>>>       struct iphdr *iph;
>>> -     bool tunnel;
>>>       int proto;
>>>       int nhoff;
>>>       int ihl;
>>> @@ -1290,8 +1290,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>>>               goto out;
>>>       __skb_pull(skb, ihl);
>>>
>>> -     tunnel = SKB_GSO_CB(skb)->encap_level > 0;
>>> -     if (tunnel)
>>> +     encap = SKB_GSO_CB(skb)->encap_level > 0;
>>> +     if (encap)
>>>               features = skb->dev->hw_enc_features & netif_skb_features(skb);
>>>       SKB_GSO_CB(skb)->encap_level += ihl;
>>>
>>> @@ -1306,24 +1306,23 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>>>       if (IS_ERR_OR_NULL(segs))
>>>               goto out;
>>>
>>> +     udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
>>
>> It seems I did an error here, shouldn't it be
>>
>>      udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
>>
>> instead ?
>
> oops. yes.
> the segs coming out of udp4_ufo_fragment() in case of udp_tunnel should not
> be marked as ip frags. ip frags are only for UFO.
>
> Thank you for spotting it!

Guys, just to clarify, is this fixed by now? Where, net-next?
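The one-character difference Eric spotted (!! versus !) and the encap_level check from the diff can be exercised with a small user-space sketch. udpfrag_v1() and udpfrag_v2() are hypothetical names for the expression in the first patch and in the follow-up fix; is_inner_level() is a simplified, hypothetical model of the SKB_GSO_CB(skb)->encap_level bookkeeping (the real counter lives in the skb GSO control block, not in a plain int).

```c
#include <assert.h>  /* for the usage checks */
#include <stdbool.h>

#define IPPROTO_UDP_SKETCH 17 /* numeric value of IPPROTO_UDP, kept local */

/* Expression from the first patch: true only when encapsulation IS set,
 * so it IP-fragments VXLAN segments and not plain UFO (inverted). */
static bool udpfrag_v1(unsigned int encapsulation, int proto)
{
	return !!encapsulation && proto == IPPROTO_UDP_SKETCH;
}

/* Corrected expression: IP fragmentation only for non-tunnel UDP (UFO). */
static bool udpfrag_v2(unsigned int encapsulation, int proto)
{
	return !encapsulation && proto == IPPROTO_UDP_SKETCH;
}

/* Simplified model of the encap_level logic in inet_gso_segment():
 * the outer IP header sees level 0, then the level is bumped by the
 * IP header length, so a stacked (inner) call sees a level above 0. */
static bool is_inner_level(int *encap_level, int ihl)
{
	bool encap = *encap_level > 0;

	*encap_level += ihl;
	return encap;
}
```

With encapsulation == 0 (plain UFO) udpfrag_v1() returns false and udpfrag_v2() returns true; with encapsulation == 1 (VXLAN) the results flip, which is the inversion Alexei confirms above. Calling is_inner_level() twice on a counter starting at 0 returns false for the outer IP header and true for the inner one.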

* Re: [PATCH net-next] inet: restore gso for vxlan
  2013-11-16 20:23                   ` [PATCH net-next] inet: restore gso for vxlan Or Gerlitz
@ 2013-11-16 20:50                     ` Eric Dumazet
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2013-11-16 20:50 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Alexei Starovoitov, David Miller, netdev@vger.kernel.org,
	Hannes Frederic Sowa

On Sat, 2013-11-16 at 22:23 +0200, Or Gerlitz wrote:

> Guys, just to clarify, is this fixed by now? where, net-next?

Should be fixed on all trees :

commit dcd607718385d02ce3741de225927a57f528f93b
("inet: fix a UFO regression")

end of thread [2013-11-16 20:50 UTC]

Thread overview: 35+ messages
2013-10-24 23:37 vxlan gso is broken by stackable gso_segment() Alexei Starovoitov
2013-10-25  0:41 ` Eric Dumazet
2013-10-25  1:59   ` Alexei Starovoitov
2013-10-25  4:06     ` Eric Dumazet
2013-10-25  9:09       ` Eric Dumazet
2013-10-25 22:18         ` David Miller
2013-10-25 22:41           ` Alexei Starovoitov
2013-10-25 23:10             ` David Miller
2013-10-25 23:25             ` Eric Dumazet
2013-10-26  0:26               ` [PATCH net-next] tcp: gso: fix truesize tracking Eric Dumazet
2013-10-26  1:46                 ` Alexei Starovoitov
2013-10-28  4:58                 ` David Miller
2013-10-29  4:05                 ` David Miller
2013-10-26  0:52               ` vxlan gso is broken by stackable gso_segment() Eric Dumazet
2013-10-26  1:25                 ` [PATCH net-next] veth: extend features to support tunneling Eric Dumazet
2013-10-26  2:22                   ` Alexei Starovoitov
2013-10-26  4:13                     ` Eric Dumazet
2013-10-28  4:58                   ` David Miller
2013-10-28  1:18             ` [PATCH net-next] inet: restore gso for vxlan Eric Dumazet
2013-10-28  4:25               ` David Miller
2013-11-07 22:33               ` Eric Dumazet
2013-11-07 23:27                 ` Alexei Starovoitov
2013-11-08  0:44                   ` [PATCH net-next] inet: fix a UFO regression Eric Dumazet
2013-11-08  1:39                     ` Alexei Starovoitov
2013-11-08  2:08                       ` Eric Dumazet
2013-11-08  2:32                     ` [PATCH v2 " Eric Dumazet
2013-11-08  7:08                       ` David Miller
2013-11-16 20:23                   ` [PATCH net-next] inet: restore gso for vxlan Or Gerlitz
2013-11-16 20:50                     ` Eric Dumazet
  -- strict thread matches above, loose matches on Subject: below --
2013-11-08  5:44 [PATCH v2 net-next] inet: fix a UFO regression Alexei Starovoitov
2013-11-08  5:50 ` Eric Dumazet
2013-11-08  6:41 ` Sridhar Samudrala
2013-11-08 12:18   ` Eric Dumazet
2013-11-08 20:20     ` Sridhar Samudrala
2013-11-08 20:57       ` Eric Dumazet
