netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Or Gerlitz <ogerlitz@mellanox.com>
To: Florian Westphal <fw@strlen.de>, <netdev@vger.kernel.org>,
	Tom Herbert <therbert@google.com>, Jesse Gross <jesse@nicira.com>
Cc: <amirv@mellanox.com>
Subject: Re: mlx4+vxlan offload breaks gre tunnels
Date: Wed, 5 Nov 2014 18:17:59 +0200	[thread overview]
Message-ID: <545A4DB7.5010603@mellanox.com> (raw)
In-Reply-To: <20141105150448.GA20776@breakpoint.cc>

On 11/5/2014 5:04 PM, Florian Westphal wrote:
> tl,dr: all tcp packets sent via gre tunnel have broken tcp csum if vxlan offload
> is enabled with mlx4 driver.
>
> Given following config on tx-side:
> dev=enp3s0
> ip addr add dev $dev 192.168.23.1/24
> ip link set $dev up
> ip link add mygre type gretap remote 192.168.23.2 local 192.168.23.1
> ip addr add dev mygre 192.168.42.1/24
> ip link set gre0 up
> ip link set mygre up
>
> and
>
> options mlx4_core log_num_mgm_entry_size=-1 debug_level=1
> port_type_array=2,2
>
> in
> /etc/modprobe.d/mlx4.conf
>
> all tcp packets sent to destinations over the gre tunnel have bogus tcp
> checksums (and are tossed on rx side when stack validates tcp checksum).
>
> net-next head is commit 30349bdbc4da5ecf0efa25556e3caff9c9b8c5f7 .
>
> What makes things work for me:
> either
>
> options mlx4_core 1 debug_level=1 port_type_array=2,2
>
> (ie. no MLX4_TUNNEL_OFFLOAD_MODE_VXLAN)
>
> or not setting NETIF_F_IP_CSUM in enc_features:
>
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2579,10 +2579,12 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
>                  dev->priv_flags |= IFF_UNICAST_FLT;
>   
>          if (mdev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
> -               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
> +               dev->hw_enc_features |= NETIF_F_RXCSUM |
>                                          NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
>
> I am not sure if its right fix, but to my eyes this basically looks like
> mlx4 is telling stack that it can handle tcp checksum offload within
> tunnels, and that doesn't seem to be the case for all types (e.g. gre).
>
> Could someone who understand the enc_features specifics better confirm that
> above patch is correct (or provide a better/proper fix)?

Yep, I can see now the problem. It comes into play with ConnectX3-pro 
NICs that support VXLAN offloads (but not with ConnectX3 NIC which 
don't) when you enable the offloads support on the CX3-pro.

The problem originates from the fact that we can't advertize something 
like "the HW can offload the inner checksum of UDP/VXLAN encapsulated 
(but not for GRE)", e.g in a similar manner that exists in the GSO 
space, where you have NETIF_F_GSO _YYY for each yyy in {UDP, SIT, GRE, 
etc} tunneling scheme.

I think the best effort we can do now is

1. come up with something such as the below patch for 3.18 which is 
back-ward portable for -stable kernels, it will only arm the hw offloads 
if the OS tells us there's VXLAN in action

2. come  up with proper kernel APIs to let NICs advertize which encap 
schemes they can actually offload the inner checksum, Tom... your work 
which now runs over netdev.

Tom/Jesse- thoughts? are you +1-ing the below approach?

Or.

tested to work with the  following which is a bit different, tell me if 
it works for you

# node A - with mlx4_en address192.168.31.18
ip tunnel add gre1 mode gre local 192.168.31.18 remote 192.168.31.17 ttl 255
ifconfig gre1 10.10.10.18/24 up
ifconfig gre1 mtu 1450

# node B - with mlx4_en address192.168.31.17
ip tunnel add gre1 mode gre local 192.168.31.17 remote 192.168.31.18 ttl 255
ifconfig gre1 10.10.10.17/24 up
ifconfig gre1 mtu 1450


diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0efbae9..7753833 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2292,6 +2292,12 @@ static void mlx4_en_add_vxlan_offloads(struct 
work_struct *work)
  out:
         if (ret)
                 en_err(priv, "failed setting L2 tunnel configuration 
ret %d\n", ret);
+
+       /* set offloads */
+       priv->dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
+                                     NETIF_F_TSO | NETIF_F_GSO_UDP_TUNNEL;
+       priv->dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+       priv->dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
  }

  static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
@@ -2299,6 +2305,10 @@ static void mlx4_en_del_vxlan_offloads(struct 
work_struct *work)
         int ret;
         struct mlx4_en_priv *priv = container_of(work, struct mlx4_en_priv,
vxlan_del_task);
+       /* unset offloads */
+       priv->dev->hw_enc_features = 0;
+       priv->dev->hw_features &= ~NETIF_F_GSO_UDP_TUNNEL;
+       priv->dev->features    &= ~NETIF_F_GSO_UDP_TUNNEL;

         ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
                                   VXLAN_STEER_BY_OUTER_MAC, 0);
@@ -2578,13 +2588,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, 
int port,
         if (mdev->dev->caps.steering_mode != MLX4_STEERING_MODE_A0)
                 dev->priv_flags |= IFF_UNICAST_FLT;

-       if (mdev->dev->caps.tunnel_offload_mode == 
MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) {
-               dev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
-                                       NETIF_F_TSO | 
NETIF_F_GSO_UDP_TUNNEL;
-               dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
-               dev->features    |= NETIF_F_GSO_UDP_TUNNEL;
-       }
-
         mdev->pndev[port] = dev;

         netif_carrier_off(dev);

  reply	other threads:[~2014-11-05 16:18 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-05 15:04 mlx4+vxlan offload breaks gre tunnels Florian Westphal
2014-11-05 16:17 ` Or Gerlitz [this message]
2014-11-05 16:53   ` Florian Westphal
2014-11-06 16:30     ` Or Gerlitz
2014-11-05 21:59   ` Tom Herbert
2014-11-06  7:03     ` Or Gerlitz
2014-11-06  7:21   ` Sathya Perla
2014-11-06  7:53     ` Or Gerlitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=545A4DB7.5010603@mellanox.com \
    --to=ogerlitz@mellanox.com \
    --cc=amirv@mellanox.com \
    --cc=fw@strlen.de \
    --cc=jesse@nicira.com \
    --cc=netdev@vger.kernel.org \
    --cc=therbert@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).