netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* mlx4+vxlan offload breaks gre tunnels
@ 2014-11-05 15:04 Florian Westphal
  2014-11-05 16:17 ` Or Gerlitz
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2014-11-05 15:04 UTC (permalink / raw)
  To: netdev; +Cc: amirv, ogerlitz

tl,dr: all tcp packets sent via gre tunnel have broken tcp csum if vxlan offload
is enabled with mlx4 driver.

Given following config on tx-side:
dev=enp3s0
ip addr add dev $dev 192.168.23.1/24
ip link set $dev up
ip link add mygre type gretap remote 192.168.23.2 local 192.168.23.1
ip addr add dev mygre 192.168.42.1/24
ip link set gre0 up
ip link set mygre up

and

options mlx4_core log_num_mgm_entry_size=-1 debug_level=1
port_type_array=2,2

in
/etc/modprobe.d/mlx4.conf

all tcp packets sent to destinations over the gre tunnel have bogus tcp
checksums (and are tossed on rx side when stack validates tcp checksum).

net-next head is commit 30349bdbc4da5ecf0efa25556e3caff9c9b8c5f7 .

What makes things work for me:
either

options mlx4_core 1 debug_level=1 port_type_array=2,2

(ie. no MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)

or not setting NETIF_F_IP_CSUM in enc_features:

--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2579,10 +2579,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
                dev->priv_flags |= IFF_UNICAST_FLT;
 
        if (mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
-               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
+               dev->hw_enc_features |= NETIF_F_RXCSUM |
                                        NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;

I am not sure if its right fix, but to my eyes this basically looks like
mlx4 is telling stack that it can handle tcp checksum offload within
tunnels, and that doesn't seem to be the case for all types (e.g. gre).

Could someone who understand the enc_features specifics better confirm that
above patch is correct (or provide a better/proper fix)?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mlx4+vxlan offload breaks gre tunnels
  2014-11-05 15:04 mlx4+vxlan offload breaks gre tunnels Florian Westphal
@ 2014-11-05 16:17 ` Or Gerlitz
  2014-11-05 16:53   ` Florian Westphal
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Or Gerlitz @ 2014-11-05 16:17 UTC (permalink / raw)
  To: Florian Westphal, netdev, Tom Herbert, Jesse Gross; +Cc: amirv

On 11/5/2014 5:04 PM, Florian Westphal wrote:
> tl,dr: all tcp packets sent via gre tunnel have broken tcp csum if vxlan offload
> is enabled with mlx4 driver.
>
> Given following config on tx-side:
> dev=enp3s0
> ip addr add dev $dev 192.168.23.1/24
> ip link set $dev up
> ip link add mygre type gretap remote 192.168.23.2 local 192.168.23.1
> ip addr add dev mygre 192.168.42.1/24
> ip link set gre0 up
> ip link set mygre up
>
> and
>
> options mlx4_core log_num_mgm_entry_size=-1 debug_level=1
> port_type_array=2,2
>
> in
> /etc/modprobe.d/mlx4.conf
>
> all tcp packets sent to destinations over the gre tunnel have bogus tcp
> checksums (and are tossed on rx side when stack validates tcp checksum).
>
> net-next head is commit 30349bdbc4da5ecf0efa25556e3caff9c9b8c5f7 .
>
> What makes things work for me:
> either
>
> options mlx4_core 1 debug_level=1 port_type_array=2,2
>
> (ie. no MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>
> or not setting NETIF_F_IP_CSUM in enc_features:
>
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2579,10 +2579,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
>                  dev->priv_flags |= IFF_UNICAST_FLT;
>   
>          if (mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
> -               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> +               dev->hw_enc_features |= NETIF_F_RXCSUM |
>                                          NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
>
> I am not sure if its right fix, but to my eyes this basically looks like
> mlx4 is telling stack that it can handle tcp checksum offload within
> tunnels, and that doesn't seem to be the case for all types (e.g. gre).
>
> Could someone who understand the enc_features specifics better confirm that
> above patch is correct (or provide a better/proper fix)?

Yep, I can see now the problem. It comes into play with ConnectX3-pro 
NICs that support VXLAN offloads (but not with ConnectX3 NIC which 
don't) when you enable the offloads support on the CX3-pro.

The problem originates from the fact that we can't advertize something 
like "the HW can offload the inner checksum of UDP/VXLAN encapsulated 
(but not for GRE)", e.g in a similar manner that exists in the GSO 
space, where you have NETIF_F_GSO _YYY for each yyy in {UDP, SIT, GRE, 
etc} tunneling scheme.

I think the best effort we can do now is

1. come up with something such as the below patch for 3.18 which is 
back-ward portable for -stable kernels, it will only arm the hw offloads 
if the OS tells us there's VXLAN in action

2. come  up with proper kernel APIs to let NICs advertize which encap 
schemes they can actually offload the inner checksum, Tom... your work 
which now runs over netdev.

Tom/Jesse- thoughts? are you +1-ing the below approach?

Or.

tested to work with the  following which is a bit different, tell me if 
it works for you

# node A - with mlx4_en address192.168.31.18
ip tunnel add gre1 mode gre local 192.168.31.18 remote 192.168.31.17 ttl 255
ifconfig gre1 10.10.10.18/24 up
ifconfig gre1 mtu 1450

# node B - with mlx4_en address192.168.31.17
ip tunnel add gre1 mode gre local 192.168.31.17 remote 192.168.31.18 ttl 255
ifconfig gre1 10.10.10.17/24 up
ifconfig gre1 mtu 1450


diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0efbae9..7753833 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2292,6 +2292,12 @@ static void mlx4_en_add_vxlan_offloads(struct 
work_struct *work)
  out:
         if (ret)
                 en_err(priv, "failed setting L2 tunnel configuration 
ret %d\n", ret);
+
+       /* set offloads */
+       priv->dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
+                                     NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
+       priv->dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+       priv->dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
  }

  static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
@@ -2299,6 +2305,10 @@ static void mlx4_en_del_vxlan_offloads(struct 
work_struct *work)
         int ret;
         struct mlx4_en_priv *priv = container_of(work, struct mlx4_en_priv,
vxlan_del_task);
+       /* unset offloads */
+       priv->dev->hw_enc_features = 0;
+       priv->dev->hw_features &= ~NETIF_F_GSO_UDP_TUNNEL;
+       priv->dev->features    &= ~NETIF_F_GSO_UDP_TUNNEL;

         ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
                                   VXLAN_STEER_BY_OUTER_MAC, 0);
@@ -2578,13 +2588,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, 
int port,
         if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_A0)
                 dev->priv_flags |= IFF_UNICAST_FLT;

-       if (mdev->dev->caps.tunnel_offload_mode == 
MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
-               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
-                                       NETIF_F_TSO | 
NETIF_F_GSO_UDP_TUNNEL;
-               dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
-               dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
-       }
-
         mdev->pndev[port] = dev;

         netif_carrier_off(dev);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: mlx4+vxlan offload breaks gre tunnels
  2014-11-05 16:17 ` Or Gerlitz
@ 2014-11-05 16:53   ` Florian Westphal
  2014-11-06 16:30     ` Or Gerlitz
  2014-11-05 21:59   ` Tom Herbert
  2014-11-06  7:21   ` Sathya Perla
  2 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2014-11-05 16:53 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Florian Westphal, netdev, Tom Herbert, Jesse Gross, amirv

Or Gerlitz <ogerlitz@mellanox.com> wrote:
> On 11/5/2014 5:04 PM, Florian Westphal wrote:
> >tl,dr: all tcp packets sent via gre tunnel have broken tcp csum if vxlan offload
> >is enabled with mlx4 driver.
> >
> Yep, I can see now the problem. It comes into play with
> ConnectX3-pro NICs that support VXLAN offloads (but not with
> ConnectX3 NIC which don't) when you enable the offloads support on
> the CX3-pro.

[..]
> I think the best effort we can do now is
> 
> 1. come up with something such as the below patch for 3.18 which is
> back-ward portable for -stable kernels, it will only arm the hw
> offloads if the OS tells us there's VXLAN in action
> 
[..]
> 
> tested to work with the  following which is a bit different, tell me
> if it works for you

Right, the patch below works in my setup as well (until link-add-vxlan,
that is ;) )

Thanks,
Florian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mlx4+vxlan offload breaks gre tunnels
  2014-11-05 16:17 ` Or Gerlitz
  2014-11-05 16:53   ` Florian Westphal
@ 2014-11-05 21:59   ` Tom Herbert
  2014-11-06  7:03     ` Or Gerlitz
  2014-11-06  7:21   ` Sathya Perla
  2 siblings, 1 reply; 8+ messages in thread
From: Tom Herbert @ 2014-11-05 21:59 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Florian Westphal, Linux Netdev List, Jesse Gross, Amir Vadai

On Wed, Nov 5, 2014 at 8:17 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> On 11/5/2014 5:04 PM, Florian Westphal wrote:
>>
>> tl,dr: all tcp packets sent via gre tunnel have broken tcp csum if vxlan
>> offload
>> is enabled with mlx4 driver.
>>
>> Given following config on tx-side:
>> dev=enp3s0
>> ip addr add dev $dev 192.168.23.1/24
>> ip link set $dev up
>> ip link add mygre type gretap remote 192.168.23.2 local 192.168.23.1
>> ip addr add dev mygre 192.168.42.1/24
>> ip link set gre0 up
>> ip link set mygre up
>>
>> and
>>
>> options mlx4_core log_num_mgm_entry_size=-1 debug_level=1
>> port_type_array=2,2
>>
>> in
>> /etc/modprobe.d/mlx4.conf
>>
>> all tcp packets sent to destinations over the gre tunnel have bogus tcp
>> checksums (and are tossed on rx side when stack validates tcp checksum).
>>
>> net-next head is commit 30349bdbc4da5ecf0efa25556e3caff9c9b8c5f7 .
>>
>> What makes things work for me:
>> either
>>
>> options mlx4_core 1 debug_level=1 port_type_array=2,2
>>
>> (ie. no MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>>
>> or not setting NETIF_F_IP_CSUM in enc_features:
>>
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> @@ -2579,10 +2579,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev,
>> int port,
>>                  dev->priv_flags |= IFF_UNICAST_FLT;
>>            if (mdev->dev->caps.tunnel_offload_mode ==
>> MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
>> -               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
>> +               dev->hw_enc_features |= NETIF_F_RXCSUM |
>>                                          NETIF_F_TSO |
>> NETIF_F_GSO_UDP_TUNNEL;
>>
>> I am not sure if its right fix, but to my eyes this basically looks like
>> mlx4 is telling stack that it can handle tcp checksum offload within
>> tunnels, and that doesn't seem to be the case for all types (e.g. gre).
>>
>> Could someone who understand the enc_features specifics better confirm
>> that
>> above patch is correct (or provide a better/proper fix)?
>
>
> Yep, I can see now the problem. It comes into play with ConnectX3-pro NICs
> that support VXLAN offloads (but not with ConnectX3 NIC which don't) when
> you enable the offloads support on the CX3-pro.
>
> The problem originates from the fact that we can't advertize something like
> "the HW can offload the inner checksum of UDP/VXLAN encapsulated (but not
> for GRE)", e.g in a similar manner that exists in the GSO space, where you
> have NETIF_F_GSO _YYY for each yyy in {UDP, SIT, GRE, etc} tunneling scheme.
>
> I think the best effort we can do now is
>
> 1. come up with something such as the below patch for 3.18 which is
> back-ward portable for -stable kernels, it will only arm the hw offloads if
> the OS tells us there's VXLAN in action
>
> 2. come  up with proper kernel APIs to let NICs advertize which encap
> schemes they can actually offload the inner checksum, Tom... your work which
> now runs over netdev.
>
Possibly #3: add ndo_gso_check to detect nested tunneling. In this
case it would see that gso_type has both SKB_GSO_GRE and
SKB_GSO_UDP_TUNNEL set.

> Tom/Jesse- thoughts? are you +1-ing the below approach?
>
> Or.
>
> tested to work with the  following which is a bit different, tell me if it
> works for you
>
> # node A - with mlx4_en address192.168.31.18
> ip tunnel add gre1 mode gre local 192.168.31.18 remote 192.168.31.17 ttl 255
> ifconfig gre1 10.10.10.18/24 up
> ifconfig gre1 mtu 1450
>
> # node B - with mlx4_en address192.168.31.17
> ip tunnel add gre1 mode gre local 192.168.31.17 remote 192.168.31.18 ttl 255
> ifconfig gre1 10.10.10.17/24 up
> ifconfig gre1 mtu 1450
>
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 0efbae9..7753833 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2292,6 +2292,12 @@ static void mlx4_en_add_vxlan_offloads(struct
> work_struct *work)
>  out:
>         if (ret)
>                 en_err(priv, "failed setting L2 tunnel configuration ret
> %d\n", ret);
> +
> +       /* set offloads */
> +       priv->dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> +                                     NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
> +       priv->dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
> +       priv->dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
>  }
>
>  static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
> @@ -2299,6 +2305,10 @@ static void mlx4_en_del_vxlan_offloads(struct
> work_struct *work)
>         int ret;
>         struct mlx4_en_priv *priv = container_of(work, struct mlx4_en_priv,
> vxlan_del_task);
> +       /* unset offloads */
> +       priv->dev->hw_enc_features = 0;
> +       priv->dev->hw_features &= ~NETIF_F_GSO_UDP_TUNNEL;
> +       priv->dev->features    &= ~NETIF_F_GSO_UDP_TUNNEL;
>
>         ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
>                                   VXLAN_STEER_BY_OUTER_MAC, 0);
> @@ -2578,13 +2588,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int
> port,
>         if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_A0)
>                 dev->priv_flags |= IFF_UNICAST_FLT;
>
> -       if (mdev->dev->caps.tunnel_offload_mode ==
> MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
> -               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> -                                       NETIF_F_TSO |
> NETIF_F_GSO_UDP_TUNNEL;
> -               dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
> -               dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
> -       }
> -
>         mdev->pndev[port] = dev;
>
>         netif_carrier_off(dev);
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mlx4+vxlan offload breaks gre tunnels
  2014-11-05 21:59   ` Tom Herbert
@ 2014-11-06  7:03     ` Or Gerlitz
  0 siblings, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2014-11-06  7:03 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Florian Westphal, Linux Netdev List, Jesse Gross, Amir Vadai

On Wed, Nov 5, 2014 at 11:59 PM, Tom Herbert <therbert@google.com> wrote:
> On Wed, Nov 5, 2014 at 8:17 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> On 11/5/2014 5:04 PM, Florian Westphal wrote:
>>>
>>> tl,dr: all tcp packets sent via gre tunnel have broken tcp csum if vxlan
>>> offload
>>> is enabled with mlx4 driver.
>>>
>>> Given following config on tx-side:
>>> dev=enp3s0
>>> ip addr add dev $dev 192.168.23.1/24
>>> ip link set $dev up
>>> ip link add mygre type gretap remote 192.168.23.2 local 192.168.23.1
>>> ip addr add dev mygre 192.168.42.1/24
>>> ip link set gre0 up
>>> ip link set mygre up
>>>
>>> and
>>>
>>> options mlx4_core log_num_mgm_entry_size=-1 debug_level=1
>>> port_type_array=2,2
>>>
>>> in
>>> /etc/modprobe.d/mlx4.conf
>>>
>>> all tcp packets sent to destinations over the gre tunnel have bogus tcp
>>> checksums (and are tossed on rx side when stack validates tcp checksum).
>>>
>>> net-next head is commit 30349bdbc4da5ecf0efa25556e3caff9c9b8c5f7 .
>>>
>>> What makes things work for me:
>>> either
>>>
>>> options mlx4_core 1 debug_level=1 port_type_array=2,2
>>>
>>> (ie. no MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>>>
>>> or not setting NETIF_F_IP_CSUM in enc_features:
>>>
>>> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>>> @@ -2579,10 +2579,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev,
>>> int port,
>>>                  dev->priv_flags |= IFF_UNICAST_FLT;
>>>            if (mdev->dev->caps.tunnel_offload_mode ==
>>> MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
>>> -               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
>>> +               dev->hw_enc_features |= NETIF_F_RXCSUM |
>>>                                          NETIF_F_TSO |
>>> NETIF_F_GSO_UDP_TUNNEL;
>>>
>>> I am not sure if its right fix, but to my eyes this basically looks like
>>> mlx4 is telling stack that it can handle tcp checksum offload within
>>> tunnels, and that doesn't seem to be the case for all types (e.g. gre).
>>>
>>> Could someone who understand the enc_features specifics better confirm
>>> that
>>> above patch is correct (or provide a better/proper fix)?
>>
>>
>> Yep, I can see now the problem. It comes into play with ConnectX3-pro NICs
>> that support VXLAN offloads (but not with ConnectX3 NIC which don't) when
>> you enable the offloads support on the CX3-pro.
>>
>> The problem originates from the fact that we can't advertize something like
>> "the HW can offload the inner checksum of UDP/VXLAN encapsulated (but not
>> for GRE)", e.g in a similar manner that exists in the GSO space, where you
>> have NETIF_F_GSO _YYY for each yyy in {UDP, SIT, GRE, etc} tunneling scheme.
>>
>> I think the best effort we can do now is
>>
>> 1. come up with something such as the below patch for 3.18 which is
>> back-ward portable for -stable kernels, it will only arm the hw offloads if
>> the OS tells us there's VXLAN in action
>>
>> 2. come  up with proper kernel APIs to let NICs advertize which encap
>> schemes they can actually offload the inner checksum, Tom... your work which
>> now runs over netdev.

> Possibly #3: add ndo_gso_check to detect nested tunneling. In this
> case it would see that gso_type has both SKB_GSO_GRE and
> SKB_GSO_UDP_TUNNEL set.

I am not with you, where do we have nested tunneling here? the below
setting is simple GRE tunnel
to host (say) TCP traffic

>> tested to work with the  following which is a bit different, tell me if it
>> works for you
>>
>> # node A - with mlx4_en address192.168.31.18
>> ip tunnel add gre1 mode gre local 192.168.31.18 remote 192.168.31.17 ttl 255
>> ifconfig gre1 10.10.10.18/24 up
>> ifconfig gre1 mtu 1450
>>
>> # node B - with mlx4_en address192.168.31.17
>> ip tunnel add gre1 mode gre local 192.168.31.17 remote 192.168.31.18 ttl 255
>> ifconfig gre1 10.10.10.17/24 up
>> ifconfig gre1 mtu 1450

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: mlx4+vxlan offload breaks gre tunnels
  2014-11-05 16:17 ` Or Gerlitz
  2014-11-05 16:53   ` Florian Westphal
  2014-11-05 21:59   ` Tom Herbert
@ 2014-11-06  7:21   ` Sathya Perla
  2014-11-06  7:53     ` Or Gerlitz
  2 siblings, 1 reply; 8+ messages in thread
From: Sathya Perla @ 2014-11-06  7:21 UTC (permalink / raw)
  To: Or Gerlitz, Florian Westphal, netdev@vger.kernel.org, Tom Herbert,
	Jesse Gross
  Cc: amirv@mellanox.com, Sathya Perla

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> 
> On 11/5/2014 5:04 PM, Florian Westphal wrote:
> > tl,dr: all tcp packets sent via gre tunnel have broken tcp csum if vxlan
> offload
> > is enabled with mlx4 driver.
> >
> > Given following config on tx-side:
> > dev=enp3s0
> > ip addr add dev $dev 192.168.23.1/24
> > ip link set $dev up
> > ip link add mygre type gretap remote 192.168.23.2 local 192.168.23.1
> > ip addr add dev mygre 192.168.42.1/24
> > ip link set gre0 up
> > ip link set mygre up
> >
> > and
> >
> > options mlx4_core log_num_mgm_entry_size=-1 debug_level=1
> > port_type_array=2,2
> >
> > in
> > /etc/modprobe.d/mlx4.conf
> >
> > all tcp packets sent to destinations over the gre tunnel have bogus tcp
> > checksums (and are tossed on rx side when stack validates tcp checksum).
> >
> > net-next head is commit 30349bdbc4da5ecf0efa25556e3caff9c9b8c5f7 .
> >
> > What makes things work for me:
> > either
> >
> > options mlx4_core 1 debug_level=1 port_type_array=2,2
> >
> > (ie. no MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
> >
> > or not setting NETIF_F_IP_CSUM in enc_features:
> >
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > @@ -2579,10 +2579,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev
> *mdev, int port,
> >                  dev->priv_flags |= IFF_UNICAST_FLT;
> >
> >          if (mdev->dev->caps.tunnel_offload_mode ==
> MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
> > -               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM
> |
> > +               dev->hw_enc_features |= NETIF_F_RXCSUM |
> >                                          NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
> >
> > I am not sure if its right fix, but to my eyes this basically looks like
> > mlx4 is telling stack that it can handle tcp checksum offload within
> > tunnels, and that doesn't seem to be the case for all types (e.g. gre).
> >
> > Could someone who understand the enc_features specifics better confirm
> that
> > above patch is correct (or provide a better/proper fix)?
> 
> Yep, I can see now the problem. It comes into play with ConnectX3-pro
> NICs that support VXLAN offloads (but not with ConnectX3 NIC which
> don't) when you enable the offloads support on the CX3-pro.
> 
> The problem originates from the fact that we can't advertize something
> like "the HW can offload the inner checksum of UDP/VXLAN encapsulated
> (but not for GRE)", e.g in a similar manner that exists in the GSO
> space, where you have NETIF_F_GSO _YYY for each yyy in {UDP, SIT, GRE,
> etc} tunneling scheme.
> 
> I think the best effort we can do now is
> 
> 1. come up with something such as the below patch for 3.18 which is
> back-ward portable for -stable kernels, it will only arm the hw offloads
> if the OS tells us there's VXLAN in action

Or, wouldn't the patch below not work (i.e., the same issue would persist)
when there is both VXLAN and some other (say GRE) tunnel in the system
and the NIC HW is capable of supporting checksum offload only on VxLAN.

Do you expect a user who uses VxLAN to not use other kinds of tunnels?

> 
> 2. come  up with proper kernel APIs to let NICs advertize which encap
> schemes they can actually offload the inner checksum, Tom... your work
> which now runs over netdev.
> 
> Tom/Jesse- thoughts? are you +1-ing the below approach?
> 
> Or.
> 
> tested to work with the  following which is a bit different, tell me if
> it works for you
> 
> # node A - with mlx4_en address192.168.31.18
> ip tunnel add gre1 mode gre local 192.168.31.18 remote 192.168.31.17 ttl 255
> ifconfig gre1 10.10.10.18/24 up
> ifconfig gre1 mtu 1450
> 
> # node B - with mlx4_en address192.168.31.17
> ip tunnel add gre1 mode gre local 192.168.31.17 remote 192.168.31.18 ttl 255
> ifconfig gre1 10.10.10.17/24 up
> ifconfig gre1 mtu 1450
> 
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 0efbae9..7753833 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2292,6 +2292,12 @@ static void mlx4_en_add_vxlan_offloads(struct
> work_struct *work)
>   out:
>          if (ret)
>                  en_err(priv, "failed setting L2 tunnel configuration
> ret %d\n", ret);
> +
> +       /* set offloads */
> +       priv->dev->hw_enc_features |= NETIF_F_IP_CSUM |
> NETIF_F_RXCSUM |
> +                                     NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
> +       priv->dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
> +       priv->dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
>   }
> 
>   static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
> @@ -2299,6 +2305,10 @@ static void mlx4_en_del_vxlan_offloads(struct
> work_struct *work)
>          int ret;
>          struct mlx4_en_priv *priv = container_of(work, struct mlx4_en_priv,
> vxlan_del_task);
> +       /* unset offloads */
> +       priv->dev->hw_enc_features = 0;
> +       priv->dev->hw_features &= ~NETIF_F_GSO_UDP_TUNNEL;
> +       priv->dev->features    &= ~NETIF_F_GSO_UDP_TUNNEL;
> 
>          ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
>                                    VXLAN_STEER_BY_OUTER_MAC, 0);
> @@ -2578,13 +2588,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev
> *mdev,
> int port,
>          if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_A0)
>                  dev->priv_flags |= IFF_UNICAST_FLT;
> 
> -       if (mdev->dev->caps.tunnel_offload_mode ==
> MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
> -               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> -                                       NETIF_F_TSO |
> NETIF_F_GSO_UDP_TUNNEL;
> -               dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
> -               dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
> -       }
> -
>          mdev->pndev[port] = dev;
> 
>          netif_carrier_off(dev);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mlx4+vxlan offload breaks gre tunnels
  2014-11-06  7:21   ` Sathya Perla
@ 2014-11-06  7:53     ` Or Gerlitz
  0 siblings, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2014-11-06  7:53 UTC (permalink / raw)
  To: Sathya Perla
  Cc: Or Gerlitz, Florian Westphal, netdev@vger.kernel.org, Tom Herbert,
	Jesse Gross, amirv@mellanox.com

On Thu, Nov 6, 2014 at 9:21 AM, Sathya Perla <Sathya.Perla@emulex.com> wrote:
>> I think the best effort we can do now is
>>
>> 1. come up with something such as the below patch for 3.18 which is
>> back-ward portable for -stable kernels, it will only arm the hw offloads
>> if the OS tells us there's VXLAN in action

> Or, wouldn't the patch below not work (i.e., the same issue would persist)
> when there is both VXLAN and some other (say GRE) tunnel in the system
> and the NIC HW is capable of supporting checksum offload only on VxLAN.

Indeed, this would be the case. But the patch below will make things
to work when
only GRE is used (or when  only VXLAN is used). So we're making
progress vs. the current
situation where GRE breaks over a HW which is capable to do VXLAN offloads even
if there's no VXLAN tunnel around.


Or.


>> 2. come  up with proper kernel APIs to let NICs advertize which encap
>> schemes they can actually offload the inner checksum, Tom... your work
>> which now runs over netdev.

>> tested to work with the  following which is a bit different, tell me if
>> it works for you
>>
>> # node A - with mlx4_en address192.168.31.18
>> ip tunnel add gre1 mode gre local 192.168.31.18 remote 192.168.31.17 ttl 255
>> ifconfig gre1 10.10.10.18/24 up
>> ifconfig gre1 mtu 1450
>>
>> # node B - with mlx4_en address192.168.31.17
>> ip tunnel add gre1 mode gre local 192.168.31.17 remote 192.168.31.18 ttl 255
>> ifconfig gre1 10.10.10.17/24 up
>> ifconfig gre1 mtu 1450

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: mlx4+vxlan offload breaks gre tunnels
  2014-11-05 16:53   ` Florian Westphal
@ 2014-11-06 16:30     ` Or Gerlitz
  0 siblings, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2014-11-06 16:30 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Tom Herbert, Jesse Gross, amirv

On 11/5/2014 6:53 PM, Florian Westphal wrote:
> Right, the patch below works in my setup as well (until link-add-vxlan,
> that is;)  )

Good, let me look on that little further to see what's the best approach 
here, thanks for the report

Or.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-11-06 16:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-05 15:04 mlx4+vxlan offload breaks gre tunnels Florian Westphal
2014-11-05 16:17 ` Or Gerlitz
2014-11-05 16:53   ` Florian Westphal
2014-11-06 16:30     ` Or Gerlitz
2014-11-05 21:59   ` Tom Herbert
2014-11-06  7:03     ` Or Gerlitz
2014-11-06  7:21   ` Sathya Perla
2014-11-06  7:53     ` Or Gerlitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).