* Re: [PATCH v2 net-next] inet: fix a UFO regression
@ 2013-11-08 5:44 Alexei Starovoitov
2013-11-08 5:50 ` Eric Dumazet
2013-11-08 6:41 ` Sridhar Samudrala
0 siblings, 2 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2013-11-08 5:44 UTC
To: Eric Dumazet; +Cc: David Miller, netdev, Hannes Frederic Sowa
On Thu, Nov 7, 2013 at 6:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While testing virtio_net and skb_segment() changes, Hannes reported
> that UFO was sending wrong frames.
>
> It appears this was introduced by a recent commit :
> 8c3a897bfab1 ("inet: restore gso for vxlan")
>
> The old condition to perform IP frag was :
>
> tunnel = !!skb->encapsulation;
> ...
> if (!tunnel && proto == IPPROTO_UDP) {
>
> So the new one should be :
>
> udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
> ...
> if (udpfrag) {
>
> Initialization of udpfrag must be done before call
> to ops->callbacks.gso_segment(skb, features), as
> skb_udp_tunnel_segment() clears skb->encapsulation
>
> (We want udpfrag to be true for UFO, false for VXLAN)
>
> With help from Alexei Starovoitov
>
> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> ---
vxlan looks good through namespaces with and without gso
and between physical machines via 10G nics
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
Thanks Eric!
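
The ordering constraint in the quoted changelog can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: struct mock_skb, the two mock_*_segment() callbacks and the PROTO_UDP constant are hypothetical stand-ins, and the only behavior assumed is what the changelog states, namely that the tunnel segmentation callback clears skb->encapsulation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only the field and constant used below are modeled. */
enum { PROTO_UDP = 17 };                 /* IANA protocol number for UDP */

struct mock_skb {
    bool encapsulation;                  /* models skb->encapsulation */
};

typedef void (*mock_segment_fn)(struct mock_skb *skb);

/* Models a tunnel callback that clears the flag, as the changelog says
 * skb_udp_tunnel_segment() does. */
static void mock_tunnel_segment(struct mock_skb *skb)
{
    skb->encapsulation = false;
}

/* Models a callback that leaves the flag untouched (plain UFO case). */
static void mock_plain_segment(struct mock_skb *skb)
{
    (void)skb;
}

/* Flag sampled BEFORE the callback runs: the order the patch asks for. */
static bool udpfrag_sampled_before(struct mock_skb *skb, int proto,
                                   mock_segment_fn segment)
{
    bool udpfrag = !skb->encapsulation && proto == PROTO_UDP;

    segment(skb);
    return udpfrag;
}

/* Flag sampled AFTER the callback runs: sees the already cleared field. */
static bool udpfrag_sampled_after(struct mock_skb *skb, int proto,
                                  mock_segment_fn segment)
{
    segment(skb);
    return !skb->encapsulation && proto == PROTO_UDP;
}

/* Walks through the VXLAN and UFO cases named in the changelog. */
static void udpfrag_ordering_selfcheck(void)
{
    struct mock_skb vxlan = { .encapsulation = true };
    struct mock_skb ufo = { .encapsulation = false };

    /* VXLAN: early sampling gives false, as wanted. */
    assert(!udpfrag_sampled_before(&vxlan, PROTO_UDP, mock_tunnel_segment));

    /* VXLAN: late sampling sees the cleared flag and wrongly gives true. */
    vxlan.encapsulation = true;
    assert(udpfrag_sampled_after(&vxlan, PROTO_UDP, mock_tunnel_segment));

    /* UFO: true either way, since nothing touches the flag. */
    assert(udpfrag_sampled_before(&ufo, PROTO_UDP, mock_plain_segment));
    assert(udpfrag_sampled_after(&ufo, PROTO_UDP, mock_plain_segment));
}
```

Calling udpfrag_ordering_selfcheck() from a test harness exercises all four cases. Only the encapsulated case is sensitive to where the flag is sampled, which matches the "true for UFO, false for VXLAN" requirement in the changelog.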
* Re: [PATCH v2 net-next] inet: fix a UFO regression
2013-11-08 5:44 [PATCH v2 net-next] inet: fix a UFO regression Alexei Starovoitov
@ 2013-11-08 5:50 ` Eric Dumazet
2013-11-08 6:41 ` Sridhar Samudrala
1 sibling, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2013-11-08 5:50 UTC
To: Alexei Starovoitov; +Cc: David Miller, netdev, Hannes Frederic Sowa
On Thu, 2013-11-07 at 21:44 -0800, Alexei Starovoitov wrote:
> vxlan looks good through namespaces with and without gso
> and between physical machines via 10G nics
>
> Tested-by: Alexei Starovoitov <ast@plumgrid.com>
>
> Thanks Eric!
And I tested the UFO part as well.
Thanks Alexei !
* Re: [PATCH v2 net-next] inet: fix a UFO regression
2013-11-08 5:44 [PATCH v2 net-next] inet: fix a UFO regression Alexei Starovoitov
2013-11-08 5:50 ` Eric Dumazet
@ 2013-11-08 6:41 ` Sridhar Samudrala
2013-11-08 12:18 ` Eric Dumazet
1 sibling, 1 reply; 8+ messages in thread
From: Sridhar Samudrala @ 2013-11-08 6:41 UTC
To: Alexei Starovoitov, Eric Dumazet
Cc: David Miller, netdev, Hannes Frederic Sowa
On 11/7/2013 9:44 PM, Alexei Starovoitov wrote:
> On Thu, Nov 7, 2013 at 6:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> While testing virtio_net and skb_segment() changes, Hannes reported
>> that UFO was sending wrong frames.
>>
>> It appears this was introduced by a recent commit :
>> 8c3a897bfab1 ("inet: restore gso for vxlan")
>>
>> The old condition to perform IP frag was :
>>
>> tunnel = !!skb->encapsulation;
>> ...
>> if (!tunnel && proto == IPPROTO_UDP) {
>>
>> So the new one should be :
>>
>> udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
>> ...
>> if (udpfrag) {
>>
>> Initialization of udpfrag must be done before call
>> to ops->callbacks.gso_segment(skb, features), as
>> skb_udp_tunnel_segment() clears skb->encapsulation
>>
>> (We want udpfrag to be true for UFO, false for VXLAN)
>>
>> With help from Alexei Starovoitov
>>
>> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Alexei Starovoitov <ast@plumgrid.com>
>> ---
> vxlan looks good through namespaces with and without gso
> and between physical machines via 10G nics
Does this fix also help vxlan performance between 2 VMs on 2 different
physical m/cs across 10G NIC?
What is the throughput you are seeing via 10G nics?
With linux 3.12, i am only seeing around 2Gbps (iperf TCP_STREAM with
16K messages) between 2 VMs when using vxlan across a 10G nic.
Is this due to the overhead of software GSO and not doing GRO at the
receiver?
Thanks
Sridhar
>
> Tested-by: Alexei Starovoitov <ast@plumgrid.com>
>
> Thanks Eric!
* Re: [PATCH v2 net-next] inet: fix a UFO regression
2013-11-08 6:41 ` Sridhar Samudrala
@ 2013-11-08 12:18 ` Eric Dumazet
2013-11-08 20:20 ` Sridhar Samudrala
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-11-08 12:18 UTC
To: Sridhar Samudrala
Cc: Alexei Starovoitov, David Miller, netdev, Hannes Frederic Sowa
On Thu, 2013-11-07 at 22:41 -0800, Sridhar Samudrala wrote:
> Does this fix also help vxlan performance between 2 VMs on 2 different
> physical m/cs across 10G NIC?
> What is the throughput you are seeing via 10G nics?
> With linux 3.12, i am only seeing around 2Gbps (iperf TCP_STREAM with
> 16K messages) between 2 VMs when using vxlan across a 10G nic.
> Is this due to the overhead of software GSO and not doing GRO at the
> receiver?
What NIC is used at sender, what NIC is used at receiver ?
If you use GRE instead of VXLAN, do you get any difference in speed ?
* Re: [PATCH v2 net-next] inet: fix a UFO regression
2013-11-08 12:18 ` Eric Dumazet
@ 2013-11-08 20:20 ` Sridhar Samudrala
2013-11-08 20:57 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Sridhar Samudrala @ 2013-11-08 20:20 UTC
To: Eric Dumazet
Cc: Alexei Starovoitov, David Miller, netdev, Hannes Frederic Sowa
On 11/8/2013 4:18 AM, Eric Dumazet wrote:
> On Thu, 2013-11-07 at 22:41 -0800, Sridhar Samudrala wrote:
>
>> Does this fix also help vxlan performance between 2 VMs on 2 different
>> physical m/cs across 10G NIC?
>> What is the throughput you are seeing via 10G nics?
>> With linux 3.12, i am only seeing around 2Gbps (iperf TCP_STREAM with
>> 16K messages) between 2 VMs when using vxlan across a 10G nic.
>> Is this due to the overhead of software GSO and not doing GRO at the
>> receiver?
> What NIC is used at sender, what NIC is used at receiver ?
I am seeing almost similar results with both Emulex and Intel 10Gb NICs
at both sender and receiver.
>
> If you use GRE instead of VXLAN, do you get any difference in speed ?
With GRE, i am seeing slightly better results. 3.2Gb/s compared to
1.8Gb/s with vxlan.
Thanks
Sridhar
* Re: [PATCH v2 net-next] inet: fix a UFO regression
2013-11-08 20:20 ` Sridhar Samudrala
@ 2013-11-08 20:57 ` Eric Dumazet
0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2013-11-08 20:57 UTC
To: Sridhar Samudrala
Cc: Alexei Starovoitov, David Miller, netdev, Hannes Frederic Sowa
On Fri, 2013-11-08 at 12:20 -0800, Sridhar Samudrala wrote:
> I am seeing almost similar results with both Emulex and Intel 10Gb NICs
> at both sender and receiver.
> >
> > If you use GRE instead of VXLAN, do you get any difference in speed ?
> With GRE, i am seeing slightly better results. 3.2Gb/s compared to
> 1.8Gb/s with vxlan.
Thanks, this seems to point to a bottleneck at the receiver, as we have GRO
support for GRE, but not yet for VXLAN.
* Re: vxlan gso is broken by stackable gso_segment()
@ 2013-10-25 1:59 Alexei Starovoitov
2013-10-25 4:06 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2013-10-25 1:59 UTC
To: Eric Dumazet; +Cc: Eric Dumazet, Stephen Hemminger, David S. Miller, netdev
gre seems to be fine.
Packets seem to be segmented with the wrong length and are being dropped.
A few seconds after the client iperf finishes, I see this warning:
[ 329.669685] WARNING: CPU: 3 PID: 3817 at net/core/skbuff.c:3474
skb_try_coalesce+0x3a0/0x3f0()
[ 329.669688] Modules linked in: vxlan ip_tunnel veth ip6table_filter
ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp
iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap
macvlan vhost kvm_intel kvm iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi dm_crypt hid_generic eeepc_wmi asus_wmi
sparse_keymap mxm_wmi dm_multipath psmouse serio_raw usbhid hid
parport_pc ppdev firewire_ohci e1000e firewire_core lpc_ich crc_itu_t
binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config
i2o_block video
[ 329.669746] CPU: 3 PID: 3817 Comm: iperf Not tainted 3.12.0-rc6+ #81
[ 329.669748] Hardware name: System manufacturer System Product
Name/P8Z77 WS, BIOS 3007 07/26/2012
[ 329.669750] 0000000000000009 ffff88082fb839d8 ffffffff8175427a
0000000000000002
[ 329.669756] 0000000000000000 ffff88082fb83a18 ffffffff8105206c
ffff880808f926f8
[ 329.669760] ffff8807ef122b00 ffff8807ef122a00 0000000000000576
ffff88082fb83a94
[ 329.669765] Call Trace:
[ 329.669767] <IRQ> [<ffffffff8175427a>] dump_stack+0x55/0x76
[ 329.669779] [<ffffffff8105206c>] warn_slowpath_common+0x8c/0xc0
[ 329.669783] [<ffffffff810520ba>] warn_slowpath_null+0x1a/0x20
[ 329.669787] [<ffffffff816150f0>] skb_try_coalesce+0x3a0/0x3f0
[ 329.669793] [<ffffffff8167bce4>] tcp_try_coalesce.part.44+0x34/0xa0
[ 329.669797] [<ffffffff8167d168>] tcp_queue_rcv+0x108/0x150
[ 329.669801] [<ffffffff8167f129>] tcp_data_queue+0x299/0xd00
[ 329.669806] [<ffffffff816822f4>] tcp_rcv_established+0x2d4/0x8f0
[ 329.669809] [<ffffffff8168d8b5>] tcp_v4_do_rcv+0x295/0x520
[ 329.669813] [<ffffffff8168fb08>] tcp_v4_rcv+0x888/0xc30
[ 329.669818] [<ffffffff816651d3>] ? ip_local_deliver_finish+0x43/0x480
[ 329.669823] [<ffffffff810cae04>] ? __lock_is_held+0x54/0x80
[ 329.669827] [<ffffffff816652fb>] ip_local_deliver_finish+0x16b/0x480
[ 329.669831] [<ffffffff816651d3>] ? ip_local_deliver_finish+0x43/0x480
[ 329.669836] [<ffffffff81666018>] ip_local_deliver+0x48/0x80
[ 329.669840] [<ffffffff81665770>] ip_rcv_finish+0x160/0x770
[ 329.669845] [<ffffffff816662f8>] ip_rcv+0x2a8/0x3e0
[ 329.669849] [<ffffffff81623d13>] __netif_receive_skb_core+0xa63/0xdb0
[ 329.669853] [<ffffffff816233b8>] ? __netif_receive_skb_core+0x108/0xdb0
[ 329.669858] [<ffffffff8175d37f>] ? _raw_spin_unlock_irqrestore+0x3f/0x70
[ 329.669862] [<ffffffff8162417b>] ? process_backlog+0xab/0x180
[ 329.669866] [<ffffffff81624081>] __netif_receive_skb+0x21/0x70
[ 329.669869] [<ffffffff81624184>] process_backlog+0xb4/0x180
[ 329.669873] [<ffffffff81626d08>] ? net_rx_action+0x98/0x350
[ 329.669876] [<ffffffff81626dca>] net_rx_action+0x15a/0x350
[ 329.669882] [<ffffffff81057f97>] __do_softirq+0xf7/0x3f0
[ 329.669886] [<ffffffff8176820c>] call_softirq+0x1c/0x30
[ 329.669887] <EOI> [<ffffffff81004bed>] do_softirq+0x8d/0xc0
[ 329.669896] [<ffffffff8160de03>] ? release_sock+0x193/0x1f0
[ 329.669901] [<ffffffff81057a5b>] local_bh_enable_ip+0xdb/0xf0
[ 329.669906] [<ffffffff8175d2e4>] _raw_spin_unlock_bh+0x44/0x50
[ 329.669910] [<ffffffff8160de03>] release_sock+0x193/0x1f0
[ 329.669914] [<ffffffff81679237>] tcp_recvmsg+0x467/0x1030
[ 329.669919] [<ffffffff816ab424>] inet_recvmsg+0x134/0x230
[ 329.669923] [<ffffffff8160a17d>] sock_recvmsg+0xad/0xe0
to reproduce do:
$ sudo brctl addbr br0
$ sudo ifconfig br0 up
$ cat foo1.conf
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.ipv4 = 10.2.3.5/24
$sudo lxc-start -n foo1 -f ./foo1.conf bash
#ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
#ip addr add 192.168.99.1/24 dev vxlan0
#ip link set up dev vxlan0
#iperf -s
similar for another lxc with different IP
$sudo lxc-start -n foo2 -f ./foo2.conf bash
#ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
#ip addr add 192.168.99.2/24 dev vxlan0
#ip link set up dev vxlan0
# iperf -c 192.168.99.1
I keep hitting it all the time.
On Thu, Oct 24, 2013 at 5:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-10-24 at 16:37 -0700, Alexei Starovoitov wrote:
>> Hi Eric, Stephen,
>>
>> it seems commit 3347c960 "ipv4: gso: make inet_gso_segment() stackable"
>> broke vxlan gso
>>
>> the way to reproduce:
>> start two lxc with veth and bridge between them
>> create vxlan dev in both containers
>> do iperf
>>
>> this setup on net-next does ~80 Mbps and a lot of tcp retransmits.
>> reverting 3347c960 and d3e5e006 gets performance back to ~230 Mbps
>>
>> I guess vxlan driver suppose to set encap_level ? Some other way?
>
> Hi Alexei
>
> Are the GRE tunnels broken as well for you ?
>
> In my testings, GRE was working, and it looks GRE and vxlan has quite
> similar gso implementation.
>
> Maybe you can capture some of the broken frames with tcpdump ?
>
>
>
* Re: vxlan gso is broken by stackable gso_segment()
2013-10-25 1:59 vxlan gso is broken by stackable gso_segment() Alexei Starovoitov
@ 2013-10-25 4:06 ` Eric Dumazet
2013-10-25 9:09 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-10-25 4:06 UTC
To: Alexei Starovoitov
Cc: Eric Dumazet, Stephen Hemminger, David S. Miller, netdev
On Thu, 2013-10-24 at 18:59 -0700, Alexei Starovoitov wrote:
> gre seems to be fine.
> Packets seem to be segmented with the wrong length and are being dropped.
> A few seconds after the client iperf finishes, I see this warning:
>
> [ 329.669685] WARNING: CPU: 3 PID: 3817 at net/core/skbuff.c:3474
> skb_try_coalesce+0x3a0/0x3f0()
> [...]
>
> to reproduce do:
> $ sudo brctl addbr br0
> $ sudo ifconfig br0 up
> $ cat foo1.conf
> lxc.network.type = veth
> lxc.network.flags = up
> lxc.network.link = br0
> lxc.network.ipv4 = 10.2.3.5/24
> $sudo lxc-start -n foo1 -f ./foo1.conf bash
> #ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
> #ip addr add 192.168.99.1/24 dev vxlan0
> #ip link set up dev vxlan0
> #iperf -s
>
> similar for another lxc with different IP
> $sudo lxc-start -n foo2 -f ./foo2.conf bash
> #ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
> #ip addr add 192.168.99.2/24 dev vxlan0
> #ip link set up dev vxlan0
> # iperf -c 192.168.99.1
>
> I keep hitting it all the time.
>
>
Thanks for all these details.
I am in Edinburgh for the Kernel Summit, I'll take a look at this as
soon as possible.
> On Thu, Oct 24, 2013 at 5:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2013-10-24 at 16:37 -0700, Alexei Starovoitov wrote:
> >> Hi Eric, Stephen,
> >>
> >> it seems commit 3347c960 "ipv4: gso: make inet_gso_segment() stackable"
> >> broke vxlan gso
> >>
> >> the way to reproduce:
> >> start two lxc with veth and bridge between them
> >> create vxlan dev in both containers
> >> do iperf
> >>
> >> this setup on net-next does ~80 Mbps and a lot of tcp retransmits.
> >> reverting 3347c960 and d3e5e006 gets performance back to ~230 Mbps
> >>
> >> I guess vxlan driver suppose to set encap_level ? Some other way?
> >
> > Hi Alexei
> >
> > Are the GRE tunnels broken as well for you ?
> >
> > In my testings, GRE was working, and it looks GRE and vxlan has quite
> > similar gso implementation.
> >
> > Maybe you can capture some of the broken frames with tcpdump ?
> >
> >
> >
* Re: vxlan gso is broken by stackable gso_segment()
2013-10-25 4:06 ` Eric Dumazet
@ 2013-10-25 9:09 ` Eric Dumazet
2013-10-25 22:18 ` David Miller
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-10-25 9:09 UTC
To: Alexei Starovoitov
Cc: Eric Dumazet, Stephen Hemminger, David S. Miller, netdev
On Thu, 2013-10-24 at 21:06 -0700, Eric Dumazet wrote:
> On Thu, 2013-10-24 at 18:59 -0700, Alexei Starovoitov wrote:
> > gre seems to be fine.
> > Packets seem to be segmented with the wrong length and are being dropped.
> > A few seconds after the client iperf finishes, I see this warning:
> >
> > [ 329.669685] WARNING: CPU: 3 PID: 3817 at net/core/skbuff.c:3474
> > skb_try_coalesce+0x3a0/0x3f0()
> > [...]
> >
> > to reproduce do:
> > $ sudo brctl addbr br0
> > $ sudo ifconfig br0 up
> > $ cat foo1.conf
> > lxc.network.type = veth
> > lxc.network.flags = up
> > lxc.network.link = br0
> > lxc.network.ipv4 = 10.2.3.5/24
> > $sudo lxc-start -n foo1 -f ./foo1.conf bash
> > #ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
> > #ip addr add 192.168.99.1/24 dev vxlan0
> > #ip link set up dev vxlan0
> > #iperf -s
> >
> > similar for another lxc with different IP
> > $sudo lxc-start -n foo2 -f ./foo2.conf bash
> > #ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0
> > #ip addr add 192.168.99.2/24 dev vxlan0
> > #ip link set up dev vxlan0
> > # iperf -c 192.168.99.1
> >
> > I keep hitting it all the time.
> >
> >
>
> Thanks for all these details.
>
> I am in Edinburgh for the Kernel Summit, I'll take a look at this as
> soon as possible.
Could you try the following fix ?
Thanks !
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f4a159e..17dd8320 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1252,6 +1252,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
const struct net_offload *ops;
unsigned int offset = 0;
struct iphdr *iph;
+ bool udpfrag;
bool tunnel;
int proto;
int nhoff;
@@ -1306,10 +1307,11 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
if (IS_ERR_OR_NULL(segs))
goto out;
+ udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
skb = segs;
do {
iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
- if (!tunnel && proto == IPPROTO_UDP) {
+ if (udpfrag) {
iph->id = htons(id);
iph->frag_off = htons(offset >> 3);
if (skb->next != NULL)
* Re: vxlan gso is broken by stackable gso_segment()
2013-10-25 9:09 ` Eric Dumazet
@ 2013-10-25 22:18 ` David Miller
2013-10-25 22:41 ` Alexei Starovoitov
0 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2013-10-25 22:18 UTC
To: eric.dumazet; +Cc: ast, edumazet, stephen, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 25 Oct 2013 02:09:00 -0700
> @@ -1252,6 +1252,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> const struct net_offload *ops;
> unsigned int offset = 0;
> struct iphdr *iph;
> + bool udpfrag;
> bool tunnel;
> int proto;
> int nhoff;
> @@ -1306,10 +1307,11 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> if (IS_ERR_OR_NULL(segs))
> goto out;
>
> + udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
> skb = segs;
> do {
> iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
> - if (!tunnel && proto == IPPROTO_UDP) {
> + if (udpfrag) {
> iph->id = htons(id);
> iph->frag_off = htons(offset >> 3);
> if (skb->next != NULL)
>
The "tunnel" variable becomes unused once you do this, please remove it.
* Re: vxlan gso is broken by stackable gso_segment()
2013-10-25 22:18 ` David Miller
@ 2013-10-25 22:41 ` Alexei Starovoitov
2013-10-28 1:18 ` [PATCH net-next] inet: restore gso for vxlan Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2013-10-25 22:41 UTC
To: David Miller; +Cc: eric.dumazet, Eric Dumazet, stephen, netdev
On Fri, Oct 25, 2013 at 3:18 PM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 25 Oct 2013 02:09:00 -0700
>
>> @@ -1252,6 +1252,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>> const struct net_offload *ops;
>> unsigned int offset = 0;
>> struct iphdr *iph;
>> + bool udpfrag;
>> bool tunnel;
>> int proto;
>> int nhoff;
>> @@ -1306,10 +1307,11 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>> if (IS_ERR_OR_NULL(segs))
>> goto out;
>>
>> + udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
>> skb = segs;
>> do {
>> iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
>> - if (!tunnel && proto == IPPROTO_UDP) {
>> + if (udpfrag) {
>> iph->id = htons(id);
>> iph->frag_off = htons(offset >> 3);
>> if (skb->next != NULL)
>>
>
> The "tunnel" variable becomes unused once you do this, please remove it.
'bool tunnel' is actually still used to indicate encap_level > 0.
Eric's fix brings back performance for vxlan, and gre keeps working. Thx!
The net/core/skbuff.c:3474 skb_try_coalesce() warning I mentioned before
is unrelated.
I still see it with this patch, running either gre or vxlan tunnels.
* [PATCH net-next] inet: restore gso for vxlan
2013-10-25 22:41 ` Alexei Starovoitov
@ 2013-10-28 1:18 ` Eric Dumazet
2013-11-07 22:33 ` Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-10-28 1:18 UTC
To: Alexei Starovoitov; +Cc: David Miller, netdev
From: Eric Dumazet <edumazet@google.com>
Alexei reported a performance regression on vxlan, caused
by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable"
GSO vxlan packets were not properly segmented, adding IP fragments
while they were not expected.
Rename 'bool tunnel' to 'bool encap', and add a new boolean
to express the fact that UDP should be fragmented.
This fragmentation is triggered by skb->encapsulation being set.
Remove a "skb->encapsulation = 1" added in above commit,
as it's not needed, as frags inherit skb->frag from original
GSO skb.
Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
---
net/ipv4/af_inet.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f4a159e705c0..09d78d4a3cff 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1251,8 +1251,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
struct sk_buff *segs = ERR_PTR(-EINVAL);
const struct net_offload *ops;
unsigned int offset = 0;
+ bool udpfrag, encap;
struct iphdr *iph;
- bool tunnel;
int proto;
int nhoff;
int ihl;
@@ -1290,8 +1290,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
goto out;
__skb_pull(skb, ihl);
- tunnel = SKB_GSO_CB(skb)->encap_level > 0;
- if (tunnel)
+ encap = SKB_GSO_CB(skb)->encap_level > 0;
+ if (encap)
features = skb->dev->hw_enc_features & netif_skb_features(skb);
SKB_GSO_CB(skb)->encap_level += ihl;
@@ -1306,24 +1306,23 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
if (IS_ERR_OR_NULL(segs))
goto out;
+ udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
skb = segs;
do {
iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
- if (!tunnel && proto == IPPROTO_UDP) {
+ if (udpfrag) {
iph->id = htons(id);
iph->frag_off = htons(offset >> 3);
if (skb->next != NULL)
iph->frag_off |= htons(IP_MF);
offset += skb->len - nhoff - ihl;
- } else {
+ } else {
iph->id = htons(id++);
}
iph->tot_len = htons(skb->len - nhoff);
ip_send_check(iph);
- if (tunnel) {
+ if (encap)
skb_reset_inner_headers(skb);
- skb->encapsulation = 1;
- }
skb->network_header = (u8 *)iph - skb->head;
} while ((skb = skb->next));
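
The fragment bookkeeping in the udpfrag branch of the loop above (offset >> 3, the IP_MF bit on every segment but the last, and the running offset) can be worked through in isolation. This is a rough user-space sketch under stated assumptions: compute_frag_off() and FRAG_MORE are hypothetical names, FRAG_MORE carries the same value as the kernel's IP_MF (0x2000), values are kept in host byte order (the kernel stores them with htons()), and each payload_len[i] plays the role of skb->len - nhoff - ihl for one segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as the kernel's IP_MF ("more fragments") flag. */
enum { FRAG_MORE = 0x2000 };

/*
 * For n segments with the given IP payload lengths, fill frag_off[] with
 * the host-order value each segment's iph->frag_off would carry.
 * The offset field counts 8-byte units, hence the shift by 3.
 */
static void compute_frag_off(const unsigned int *payload_len, size_t n,
                             uint16_t *frag_off)
{
    unsigned int offset = 0;

    for (size_t i = 0; i < n; i++) {
        frag_off[i] = (uint16_t)(offset >> 3);
        if (i + 1 < n)                   /* mirrors "skb->next != NULL" */
            frag_off[i] |= FRAG_MORE;
        offset += payload_len[i];
    }
}
```

For payload lengths of 1480, 1480 and 500 bytes this gives 0x2000, 0x20b9 and 0x0172: offsets 0, 185 and 370 in 8-byte units, with the flag set on the first two. The shift only round-trips when every non-final payload length is a multiple of 8, which the sketch assumes rather than checks.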
* Re: [PATCH net-next] inet: restore gso for vxlan
2013-10-28 1:18 ` [PATCH net-next] inet: restore gso for vxlan Eric Dumazet
@ 2013-11-07 22:33 ` Eric Dumazet
2013-11-07 23:27 ` Alexei Starovoitov
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-11-07 22:33 UTC
To: Alexei Starovoitov; +Cc: David Miller, netdev, Hannes Frederic Sowa
On Sun, 2013-10-27 at 18:18 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Alexei reported a performance regression on vxlan, caused
> by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable"
>
> GSO vxlan packets were not properly segmented, adding IP fragments
> while they were not expected.
>
> Rename 'bool tunnel' to 'bool encap', and add a new boolean
> to express the fact that UDP should be fragmented.
> This fragmentation is triggered by skb->encapsulation being set.
>
> Remove a "skb->encapsulation = 1" added in above commit,
> as it's not needed, as frags inherit skb->frag from original
> GSO skb.
>
> Reported-by: Alexei Starovoitov <ast@plumgrid.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
> net/ipv4/af_inet.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index f4a159e705c0..09d78d4a3cff 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1251,8 +1251,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> struct sk_buff *segs = ERR_PTR(-EINVAL);
> const struct net_offload *ops;
> unsigned int offset = 0;
> + bool udpfrag, encap;
> struct iphdr *iph;
> - bool tunnel;
> int proto;
> int nhoff;
> int ihl;
> @@ -1290,8 +1290,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> goto out;
> __skb_pull(skb, ihl);
>
> - tunnel = SKB_GSO_CB(skb)->encap_level > 0;
> - if (tunnel)
> + encap = SKB_GSO_CB(skb)->encap_level > 0;
> + if (encap)
> features = skb->dev->hw_enc_features & netif_skb_features(skb);
> SKB_GSO_CB(skb)->encap_level += ihl;
>
> @@ -1306,24 +1306,23 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> if (IS_ERR_OR_NULL(segs))
> goto out;
>
> + udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
It seems I made an error here; shouldn't it be

udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;

instead?

I am not sure UFO and vxlan are compatible...
> skb = segs;
> do {
> iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
> - if (!tunnel && proto == IPPROTO_UDP) {
> + if (udpfrag) {
> iph->id = htons(id);
> iph->frag_off = htons(offset >> 3);
> if (skb->next != NULL)
> iph->frag_off |= htons(IP_MF);
> offset += skb->len - nhoff - ihl;
> - } else {
> + } else {
> iph->id = htons(id++);
> }
> iph->tot_len = htons(skb->len - nhoff);
> ip_send_check(iph);
> - if (tunnel) {
> + if (encap)
> skb_reset_inner_headers(skb);
> - skb->encapsulation = 1;
> - }
> skb->network_header = (u8 *)iph - skb->head;
> } while ((skb = skb->next));
>
* Re: [PATCH net-next] inet: restore gso for vxlan
2013-11-07 22:33 ` Eric Dumazet
@ 2013-11-07 23:27 ` Alexei Starovoitov
2013-11-08 0:44 ` [PATCH net-next] inet: fix a UFO regression Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2013-11-07 23:27 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Hannes Frederic Sowa
On Thu, Nov 7, 2013 at 2:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2013-10-27 at 18:18 -0700, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> Alexei reported a performance regression on vxlan, caused
>> by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable"
>>
>> GSO vxlan packets were not properly segmented, adding IP fragments
>> while they were not expected.
>>
>> Rename 'bool tunnel' to 'bool encap', and add a new boolean
>> to express the fact that UDP should be fragmented.
>> This fragmentation is triggered by skb->encapsulation being set.
>>
>> Remove the "skb->encapsulation = 1" added in the above commit:
>> it is not needed, as frags inherit skb->frag from the original
>> GSO skb.
>>
>> Reported-by: Alexei Starovoitov <ast@plumgrid.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Tested-by: Alexei Starovoitov <ast@plumgrid.com>
>> ---
>> net/ipv4/af_inet.c | 15 +++++++--------
>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index f4a159e705c0..09d78d4a3cff 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -1251,8 +1251,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>> struct sk_buff *segs = ERR_PTR(-EINVAL);
>> const struct net_offload *ops;
>> unsigned int offset = 0;
>> + bool udpfrag, encap;
>> struct iphdr *iph;
>> - bool tunnel;
>> int proto;
>> int nhoff;
>> int ihl;
>> @@ -1290,8 +1290,8 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>> goto out;
>> __skb_pull(skb, ihl);
>>
>> - tunnel = SKB_GSO_CB(skb)->encap_level > 0;
>> - if (tunnel)
>> + encap = SKB_GSO_CB(skb)->encap_level > 0;
>> + if (encap)
>> features = skb->dev->hw_enc_features & netif_skb_features(skb);
>> SKB_GSO_CB(skb)->encap_level += ihl;
>>
>> @@ -1306,24 +1306,23 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>> if (IS_ERR_OR_NULL(segs))
>> goto out;
>>
>> + udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
>
> It seems I made an error here; shouldn't it be
>
> udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
>
> instead?
Oops, yes.
The segs coming out of udp4_ufo_fragment() in the udp_tunnel case should not
be marked as IP frags.
IP frags are only for UFO.
Thank you for spotting it!
Alexei
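
To illustrate the distinction above, here is a minimal user-space sketch (not kernel code) of how the per-segment loop fills in the IP id and fragment offset in the two cases. `struct seg_hdr`, `fill_ip_headers()` and the host-byte-order fields are assumptions made for this example; the kernel operates on `struct iphdr` with htons(), and its non-fragment branch leaves frag_off untouched rather than zeroing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IP_MF 0x2000 /* IPv4 "more fragments" flag */

/* Hypothetical stand-in for the two IPv4 header fields the loop touches.
 * Values are kept in host byte order to keep the sketch short. */
struct seg_hdr {
	uint16_t id;
	uint16_t frag_off;
};

/*
 * Fill in id/frag_off for n segments whose payload lengths are in len[].
 * udpfrag == true : UFO, all segments are fragments of one datagram.
 * udpfrag == false: TCP or VXLAN, every segment is its own datagram.
 */
static void fill_ip_headers(struct seg_hdr *hdr, const unsigned int *len,
			    size_t n, uint16_t id, bool udpfrag)
{
	unsigned int offset = 0;
	size_t i;

	assert(hdr != NULL && len != NULL);

	for (i = 0; i < n; i++) {
		if (udpfrag) {
			hdr[i].id = id;                /* same id for every fragment */
			hdr[i].frag_off = offset >> 3; /* offset in 8-byte units */
			if (i + 1 < n)
				hdr[i].frag_off |= IP_MF; /* not the last fragment */
			offset += len[i];
		} else {
			hdr[i].id = id++;              /* fresh id per packet */
			hdr[i].frag_off = 0;           /* simplification, see above */
		}
	}
}
```

With payload lengths 1480, 1480 and 40 and udpfrag set, the three headers share one id and carry offsets 0, 185 and 370 (in 8-byte units), with IP_MF set on the first two. With udpfrag clear, the ids simply count up and no segment is marked as a fragment, which is what a VXLAN receiver expects.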
* [PATCH net-next] inet: fix a UFO regression
2013-11-07 23:27 ` Alexei Starovoitov
@ 2013-11-08 0:44 ` Eric Dumazet
2013-11-08 2:32 ` [PATCH v2 " Eric Dumazet
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-11-08 0:44 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: David Miller, netdev, Hannes Frederic Sowa
From: Eric Dumazet <edumazet@google.com>
While testing virtio_net and skb_segment() changes, Hannes reported
that UFO was sending wrong frames.

It appears this was introduced by a recent commit:
8c3a897bfab1 ("inet: restore gso for vxlan")

The old condition to perform IP frag was:

  tunnel = !!skb->encapsulation;
  ...
  if (!tunnel && proto == IPPROTO_UDP) {

So the new one should be:

  udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
  ...
  if (udpfrag) {

With help from Alexei Starovoitov

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
net/ipv4/af_inet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 09d78d4a3cff..f17941486ab9 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1306,7 +1306,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	if (IS_ERR_OR_NULL(segs))
 		goto out;
 
-	udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
+	udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
 	skb = segs;
 	do {
 		iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
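
For readers comparing the three forms of the condition, the following stand-alone sketch enumerates them side by side. It is only an illustration: the helper names (`frag_old`, `frag_regressed`, `frag_fixed`, `count_regressed_mismatches`) and the `PROTO_*` constants are invented for the example, and the predicates take a plain flag instead of reading skb->encapsulation.

```c
#include <assert.h>
#include <stdbool.h>

enum { PROTO_TCP = 6, PROTO_UDP = 17 };

/* Old condition, as quoted in the commit message. */
static bool frag_old(bool encapsulation, int proto)
{
	bool tunnel = !!encapsulation;

	return !tunnel && proto == PROTO_UDP;
}

/* Condition introduced by 8c3a897bfab1 (the regression). */
static bool frag_regressed(bool encapsulation, int proto)
{
	return !!encapsulation && proto == PROTO_UDP;
}

/* Condition after this patch. */
static bool frag_fixed(bool encapsulation, int proto)
{
	return !encapsulation && proto == PROTO_UDP;
}

/* Walk all (encapsulation, proto) combinations and count the cases in
 * which the regressed predicate disagrees with the old one. */
int count_regressed_mismatches(void)
{
	static const int protos[] = { PROTO_TCP, PROTO_UDP };
	int mismatches = 0;
	int enc, p;

	for (enc = 0; enc <= 1; enc++) {
		for (p = 0; p < 2; p++) {
			bool e = enc != 0;

			/* The fix must restore the old behaviour exactly. */
			assert(frag_fixed(e, protos[p]) == frag_old(e, protos[p]));
			if (frag_regressed(e, protos[p]) != frag_old(e, protos[p]))
				mismatches++;
		}
	}
	return mismatches;
}
```

The regressed form disagrees with the old one in both UDP cases: it skips fragmentation for plain UFO and requests it for encapsulated UDP. The sketch assumes the flag is read at the same point in all three variants; it says nothing about where in inet_gso_segment() that read should happen.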
* [PATCH v2 net-next] inet: fix a UFO regression
2013-11-08 0:44 ` [PATCH net-next] inet: fix a UFO regression Eric Dumazet
@ 2013-11-08 2:32 ` Eric Dumazet
2013-11-08 7:08 ` David Miller
0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2013-11-08 2:32 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: David Miller, netdev, Hannes Frederic Sowa
From: Eric Dumazet <edumazet@google.com>
While testing virtio_net and skb_segment() changes, Hannes reported
that UFO was sending wrong frames.

It appears this was introduced by a recent commit:
8c3a897bfab1 ("inet: restore gso for vxlan")

The old condition to perform IP frag was:

  tunnel = !!skb->encapsulation;
  ...
  if (!tunnel && proto == IPPROTO_UDP) {

So the new one should be:

  udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
  ...
  if (udpfrag) {

Initialization of udpfrag must be done before the call
to ops->callbacks.gso_segment(skb, features), as
skb_udp_tunnel_segment() clears skb->encapsulation.

(We want udpfrag to be true for UFO, false for VXLAN)

With help from Alexei Starovoitov

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
v2: move the udpfrag init, as spotted by Alexei.
net/ipv4/af_inet.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 09d78d4a3cff..68af9aac91d0 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1299,6 +1299,9 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 
 	segs = ERR_PTR(-EPROTONOSUPPORT);
 
+	/* Note : following gso_segment() might change skb->encapsulation */
+	udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
+
 	ops = rcu_dereference(inet_offloads[proto]);
 	if (likely(ops && ops->callbacks.gso_segment))
 		segs = ops->callbacks.gso_segment(skb, features);
@@ -1306,7 +1309,6 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	if (IS_ERR_OR_NULL(segs))
 		goto out;
 
-	udpfrag = !!skb->encapsulation && proto == IPPROTO_UDP;
 	skb = segs;
 	do {
 		iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
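
The ordering issue that v2 addresses can be reproduced in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel code path: `struct mock_skb`, the `mock_*_segment()` callbacks and the two `udpfrag_sampled_*()` helpers are hypothetical, and the only behaviour modelled is the one stated in the commit message, namely that the tunnel segmentation callback clears the encapsulation flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { PROTO_UDP = 17 };

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct mock_skb {
	bool encapsulation;
};

typedef void (*segment_fn)(struct mock_skb *skb);

/* Models the tunnel path: the callback clears the flag as a side effect. */
static void mock_tunnel_segment(struct mock_skb *skb)
{
	skb->encapsulation = false;
}

/* Models the plain UFO path: the flag is left alone. */
static void mock_ufo_segment(struct mock_skb *skb)
{
	(void)skb;
}

/* v2 ordering: sample the flag, then run the callback. */
static bool udpfrag_sampled_before(struct mock_skb *skb, int proto,
				   segment_fn seg)
{
	bool udpfrag;

	assert(skb != NULL && seg != NULL);
	udpfrag = !skb->encapsulation && proto == PROTO_UDP;
	seg(skb);
	return udpfrag;
}

/* v1 ordering: run the callback, then sample the (possibly cleared) flag. */
static bool udpfrag_sampled_after(struct mock_skb *skb, int proto,
				  segment_fn seg)
{
	assert(skb != NULL && seg != NULL);
	seg(skb);
	return !skb->encapsulation && proto == PROTO_UDP;
}
```

For an encapsulated UDP skb run through `mock_tunnel_segment()`, sampling before the callback yields false, while sampling afterwards yields true because the flag has already been cleared. For a plain UFO skb both orderings yield true, so only the tunnel case distinguishes v1 from v2.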
* Re: [PATCH v2 net-next] inet: fix a UFO regression
2013-11-08 2:32 ` [PATCH v2 " Eric Dumazet
@ 2013-11-08 7:08 ` David Miller
0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2013-11-08 7:08 UTC (permalink / raw)
To: eric.dumazet; +Cc: ast, netdev, hannes
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 07 Nov 2013 18:32:06 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> While testing virtio_net and skb_segment() changes, Hannes reported
> that UFO was sending wrong frames.
>
> It appears this was introduced by a recent commit :
> 8c3a897bfab1 ("inet: restore gso for vxlan")
>
> The old condition to perform IP frag was :
>
> tunnel = !!skb->encapsulation;
> ...
> if (!tunnel && proto == IPPROTO_UDP) {
>
> So the new one should be :
>
> udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
> ...
> if (udpfrag) {
>
> Initialization of udpfrag must be done before call
> to ops->callbacks.gso_segment(skb, features), as
> skb_udp_tunnel_segment() clears skb->encapsulation
>
> (We want udpfrag to be true for UFO, false for VXLAN)
>
> With help from Alexei Starovoitov
>
> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, thanks a lot Eric.
Thread overview: 8+ messages
2013-11-08 5:44 [PATCH v2 net-next] inet: fix a UFO regression Alexei Starovoitov
2013-11-08 5:50 ` Eric Dumazet
2013-11-08 6:41 ` Sridhar Samudrala
2013-11-08 12:18 ` Eric Dumazet
2013-11-08 20:20 ` Sridhar Samudrala
2013-11-08 20:57 ` Eric Dumazet
-- strict thread matches above, loose matches on Subject: below --
2013-10-25 1:59 vxlan gso is broken by stackable gso_segment() Alexei Starovoitov
2013-10-25 4:06 ` Eric Dumazet
2013-10-25 9:09 ` Eric Dumazet
2013-10-25 22:18 ` David Miller
2013-10-25 22:41 ` Alexei Starovoitov
2013-10-28 1:18 ` [PATCH net-next] inet: restore gso for vxlan Eric Dumazet
2013-11-07 22:33 ` Eric Dumazet
2013-11-07 23:27 ` Alexei Starovoitov
2013-11-08 0:44 ` [PATCH net-next] inet: fix a UFO regression Eric Dumazet
2013-11-08 2:32 ` [PATCH v2 " Eric Dumazet
2013-11-08 7:08 ` David Miller