From: Yongseok Koh <yskoh@mellanox.com>
Subject: Re: [PATCH v3 00/13] net/mlx5: e-switch VXLAN
 encap/decap hardware offload
Date: Thu, 1 Nov 2018 20:32:08 +0000
Message-ID: <20181101203200.GA6118@mtidpdk.mti.labs.mlnx>
References: <1539612815-47199-1-git-send-email-viacheslavo@mellanox.com>
 <1541074741-41368-1-git-send-email-viacheslavo@mellanox.com>
Cc: Shahaf Shuler <shahafs@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>
To: Slava Ovsiienko <viacheslavo@mellanox.com>
In-Reply-To: <1541074741-41368-1-git-send-email-viacheslavo@mellanox.com>

On Thu, Nov 01, 2018 at 05:19:21AM -0700, Slava Ovsiienko wrote:
> This patchset adds the VXLAN encapsulation/decapsulation hardware
> offload feature for E-Switch.
>
> A typical use case of tunneling infrastructure is port representors
> in switchdev mode, with VXLAN traffic encapsulation performed on
> traffic coming *from* a representor and decapsulation on traffic
> going *to* that representor, in order to transparently assign
> a given VXLAN to VF traffic.
>
> Since these actions are supported at the E-Switch level, the "transfer"
> attribute must be set on such flow rules. They must also be combined
> with a port redirection action to make sense.
>
> Since only ingress is supported, encapsulation flow rules are normally
> applied on a port representor and emit traffic to a physical port.
> The opposite order is used for decapsulation.
>
> Like other mlx5 E-Switch flow rule actions, these ones are implemented
> through Linux's TC flower API. Since the Linux interface for VXLAN
> encap/decap involves virtual network devices (i.e. ip link add type
> vxlan [...]), the PMD dynamically spawns them on an as-needed basis
> through Netlink calls. These implicitly created VXLAN devices are
> called VTEPs (Virtual Tunnel End Points).
>
> VXLAN interfaces are dynamically created for each local port of the
> outer networks and are then used as targets for TC "flower" filters
> in order to perform encapsulation. For decapsulation, a VXLAN device
> is created for each unique UDP port. These VXLAN interfaces are
> system-wide: only one device with a given UDP port can exist in the
> system (an attempt to create another device with the same local UDP
> port returns EEXIST), so the PMD should support a device database
> shared between PMD instances.
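
A minimal sketch of such a database, assuming a reference-counted table keyed by local UDP port: a rule that needs a VTEP reuses the device already bound to that port or registers a new one, and the entry is dropped when its last user releases it. `struct vtep_db`, `vtep_acquire()` and `vtep_release()` are hypothetical names, the fixed capacity is arbitrary, and the Netlink calls that would create and delete the real netdev are only marked by comments; this is not the PMD's code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VTEP_DB_SIZE 16 /* hypothetical fixed capacity */

/* One VXLAN device (VTEP), identified by its local UDP port. */
struct vtep {
	uint16_t port;       /* local UDP port, unique system-wide */
	unsigned int refcnt; /* number of flow rules using this VTEP */
};

/* Hypothetical database; a shared one would live in memory that is
 * visible to every PMD instance and would need locking. */
struct vtep_db {
	struct vtep entry[VTEP_DB_SIZE];
	size_t count;
};

/* Returns the VTEP bound to @port, registering it on first use.
 * Returns NULL with errno set to ENOSPC when the table is full. */
static struct vtep *
vtep_acquire(struct vtep_db *db, uint16_t port)
{
	for (size_t i = 0; i < db->count; i++) {
		if (db->entry[i].port == port) {
			db->entry[i].refcnt++; /* reuse: one VTEP per port */
			return &db->entry[i];
		}
	}
	if (db->count == VTEP_DB_SIZE) {
		errno = ENOSPC;
		return NULL;
	}
	/* Here the PMD would create the VXLAN netdev via Netlink. */
	db->entry[db->count] = (struct vtep){ .port = port, .refcnt = 1 };
	return &db->entry[db->count++];
}

/* Drops one reference; the entry is removed when no rule uses it. */
static void
vtep_release(struct vtep_db *db, uint16_t port)
{
	for (size_t i = 0; i < db->count; i++) {
		if (db->entry[i].port != port)
			continue;
		if (--db->entry[i].refcnt == 0) {
			/* Here the PMD would delete the netdev. Keep the
			 * array compact by moving the last entry into place. */
			db->entry[i] = db->entry[--db->count];
		}
		return;
	}
}
```

Keeping exactly one entry per port mirrors the kernel behaviour where a second device on the same UDP port fails with EEXIST; sharing the table between PMD instances would additionally require shared memory and locking, which the sketch leaves out.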
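>
> Notation used in the rule samples:
>
> $PF            - physical device, outer network
> $VF            - representor for VF, outer/inner network
> $VXLAN         - VTEP netdev name
> $PF_OUTER_IP   - $PF IP (v4 or v6) within outer network
> $REMOTE_IP     - remote peer IP (v4 or v6) within outer network
> $LOCAL_PORT    - local UDP port
> $REMOTE_PORT   - remote UDP port
> $VNI           - Virtual Network Identifier
> $REMOTE_MAC    - remote peer MAC (outer Ethernet destination)
> $INNER_MAC     - VF-side MAC within inner network
> $IGNORED       - any value, ignored by the PMD
>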
> VXLAN VTEP creation with iproute2 (PMD does the same via Netlink):
>
> - for encapsulation:
>
>   ip link add $VXLAN type vxlan dstport $LOCAL_PORT external dev $PF
>   ip link set dev $VXLAN up
>   tc qdisc del dev $VXLAN ingress
>   tc qdisc add dev $VXLAN ingress
>
> For egress encapsulated traffic, $LOCAL_PORT (note: this is not the
> source UDP port in the VXLAN header, just the UDP port assigned to
> the VTEP, with no practical use) is selected automatically from the
> available UDP ports in the range 30000-60000.
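
One way to picture the automatic selection, assuming the PMD probes the range in order and treats EEXIST from the kernel as "port busy, try the next one". `vtep_alloc_local_port()` and the `try_create` callback are hypothetical; the callback stands in for the Netlink request that creates the VXLAN device, and `fake_try_create()` with `struct busy_ports` exists only so the sketch runs without a kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VTEP_PORT_MIN 30000u /* range quoted in the text */
#define VTEP_PORT_MAX 60000u

/* Tries to create a VXLAN device bound to @port. Returns 0 on success,
 * -EEXIST if the port is taken, another negative errno on failure. */
typedef int (*vtep_try_create_t)(uint16_t port, void *ctx);

/* Returns the allocated port number, or a negative errno. */
static int
vtep_alloc_local_port(vtep_try_create_t try_create, void *ctx)
{
	for (unsigned int port = VTEP_PORT_MIN; port <= VTEP_PORT_MAX; port++) {
		int ret = try_create((uint16_t)port, ctx);

		if (ret == 0)
			return (int)port; /* device created on this port */
		if (ret != -EEXIST)
			return ret;       /* hard error, stop probing */
		/* -EEXIST: port already in use system-wide, keep going */
	}
	return -ENOSPC; /* whole range exhausted */
}

/* Stand-in for the kernel: ports that already have a VXLAN device. */
struct busy_ports {
	const uint16_t *port;
	size_t n;
};

static int
fake_try_create(uint16_t port, void *ctx)
{
	const struct busy_ports *busy = ctx;

	for (size_t i = 0; i < busy->n; i++)
		if (busy->port[i] == port)
			return -EEXIST;
	return 0;
}
```

Letting the kernel's EEXIST answer decide, rather than consulting only a private list, keeps the sketch correct even when a port is owned by another process.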
>
> - for decapsulation:
>
>   ip link add $VXLAN type vxlan dstport $LOCAL_PORT external
>   ip link set dev $VXLAN up
>   tc qdisc del dev $VXLAN ingress
>   tc qdisc add dev $VXLAN ingress
>
> $LOCAL_PORT is the UDP port receiving the VXLAN traffic from outer networks.
>
> All ingress UDP traffic with the given UDP destination port from ALL
> existing netdevs is routed by the kernel to the $VXLAN net device.
> While applying the rule, the kernel checks the IP parameters within
> the rule, determines the appropriate underlying PF and tries to set
> up the hardware offload of the rule.
>
> VXLAN encapsulation
>
> VXLAN encap rules are applied to the VF ingress traffic and have the
> VTEP as the actual redirection destination instead of the outer PF.
> The encapsulation rule should provide:
> - redirection action VF->PF
> - VF port ID
> - some inner network parameters (MACs)
> - the tunnel outer source IP (v4/v6) (IS A MUST)
> - the tunnel outer destination IP (v4/v6) (IS A MUST)
> - VNI - Virtual Network Identifier (IS A MUST)
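
The mandatory items above can be checked before any Netlink message is built. A sketch assuming a hypothetical `struct vxlan_encap_spec` that records which items the rule supplied; the names are illustrative and this is not the PMD's validation routine. The 24-bit bound comes from the VXLAN header format, where the VNI field is 24 bits wide.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what an encapsulation rule provided. */
struct vxlan_encap_spec {
	bool has_redirect; /* port redirection action (VF->PF) */
	bool has_src_ip;   /* tunnel outer source IP, v4 or v6 */
	bool has_dst_ip;   /* tunnel outer destination IP, v4 or v6 */
	bool has_vni;      /* Virtual Network Identifier present */
	uint32_t vni;      /* meaningful only when has_vni is true */
};

/* Returns 0 if every mandatory item is present and sane, else -EINVAL. */
static int
vxlan_encap_validate(const struct vxlan_encap_spec *spec)
{
	if (!spec->has_redirect)
		return -EINVAL; /* encap without redirection makes no sense */
	if (!spec->has_src_ip || !spec->has_dst_ip)
		return -EINVAL; /* both outer IPs are a must */
	if (!spec->has_vni || spec->vni > 0xFFFFFFu)
		return -EINVAL; /* VNI is a must and only 24 bits wide */
	return 0;
}
```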
>
> VXLAN encapsulation rule sample for the tc utility:
>
>   tc filter add dev $VF protocol all parent ffff: flower skip_sw \
>       action tunnel_key set dst_port $REMOTE_PORT \
>       src_ip $PF_OUTER_IP dst_ip $REMOTE_IP id $VNI \
>       action mirred egress redirect dev $VXLAN
>
> VXLAN encapsulation rule sample for testpmd:
>
> - Setting up the outer properties of the VXLAN tunnel:
>
>   set vxlan ip-version ipv4 vni $VNI \
>       udp-src $IGNORED udp-dst $REMOTE_PORT \
>       ip-src $PF_OUTER_IP ip-dst $REMOTE_IP \
>       eth-src $IGNORED eth-dst $REMOTE_MAC
>
> - Creating a flow rule on port ID 4 performing VXLAN encapsulation
>   with the above-mentioned properties and directing the resulting
>   traffic to port ID 0:
>
>   flow create 4 ingress transfer pattern eth src is $INNER_MAC / end
>       actions vxlan_encap / port_id id 0 / end
>
> No direct way has been found to provide the kernel with all the
> required encapsulation header parameters. The encapsulation VTEP is
> created attached to the outer interface and is assumed to be the
> default path for egress encapsulated traffic. The outer tunnel IP
> addresses are assigned to the interface using Netlink, and the
> implicit route is created like this:
>
>   ip addr add <src_ip> peer <dst_ip> dev <outer> scope link
>
> The peer address option provides the implicit route, and the scope
> link attribute reduces the risk of conflicts. At initialization time
> all local scope link addresses are flushed from the outer network device.
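
Since the PMD assigns these addresses through Netlink rather than iproute2, the request corresponding to the `ip addr add` command above can be sketched as an RTM_NEWADDR message for IPv4: IFA_LOCAL carries <src_ip>, IFA_ADDRESS carries the peer <dst_ip>, the prefix length is 32 and the scope is RT_SCOPE_LINK. The message is only built here, not sent. `build_peer_addr_req()` and `nl_put_attr()` are hypothetical helpers written against the standard `<linux/rtnetlink.h>` macros; the PMD's own message construction may differ.

```c
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Request buffer: header, ifaddrmsg and room for the attributes. */
struct addr_req {
	struct nlmsghdr nh;
	struct ifaddrmsg ifa;
	char attrs[64];
};

/* Appends one rtattr carrying @len bytes of @data; returns 0 or -1. */
static int
nl_put_attr(struct nlmsghdr *nh, size_t cap, unsigned short type,
	    const void *data, size_t len)
{
	size_t off = NLMSG_ALIGN(nh->nlmsg_len);
	struct rtattr *rta;

	if (off + RTA_LENGTH(len) > cap)
		return -1;
	rta = (struct rtattr *)((char *)nh + off);
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	memcpy(RTA_DATA(rta), data, len);
	nh->nlmsg_len = off + RTA_ALIGN(rta->rta_len);
	return 0;
}

/* Builds the IPv4 equivalent of
 *   ip addr add <src_ip> peer <dst_ip> dev <outer> scope link */
static int
build_peer_addr_req(struct addr_req *req, int ifindex,
		    const char *src_ip, const char *dst_ip)
{
	struct in_addr local, peer;

	if (inet_pton(AF_INET, src_ip, &local) != 1 ||
	    inet_pton(AF_INET, dst_ip, &peer) != 1)
		return -1;
	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
	req->nh.nlmsg_type = RTM_NEWADDR;
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
			      NLM_F_EXCL | NLM_F_ACK;
	req->ifa.ifa_family = AF_INET;
	req->ifa.ifa_prefixlen = 32;        /* host route to the peer */
	req->ifa.ifa_scope = RT_SCOPE_LINK; /* "scope link" */
	req->ifa.ifa_index = ifindex;       /* <outer> device */
	/* On point-to-point addresses IFA_LOCAL is our side and
	 * IFA_ADDRESS is the peer, which yields the implicit route. */
	if (nl_put_attr(&req->nh, sizeof(*req), IFA_LOCAL,
			&local, sizeof(local)) ||
	    nl_put_attr(&req->nh, sizeof(*req), IFA_ADDRESS,
			&peer, sizeof(peer)))
		return -1;
	return 0;
}
```

A permanent neighbour entry for the destination MAC would be built the same way with RTM_NEWNEIGH, a `struct ndmsg` whose `ndm_state` is NUD_PERMANENT, and NDA_DST / NDA_LLADDR attributes.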
>
> The destination MAC address is provided via a permanent neigh rule:
>
>   ip neigh add dev <outer> lladdr <dst_mac> to <dst_ip> nud permanent
>
> At initialization time all neigh rules of permanent type are flushed
> from the outer network device.
>
> VXLAN decapsulation
>
> VXLAN decap rules are applied to the ingress traffic of the VTEP ($VXLAN)
> device instead of the PF. The decapsulation rule should provide:
> - redirection action PF->VF
> - VF port ID as redirection destination
> - $VXLAN device as ingress traffic source
> - the tunnel outer source IP (v4/v6) (optional)
> - the tunnel outer destination IP (v4/v6) (IS A MUST)
> - the tunnel local UDP port (IS A MUST, the PMD looks for the appropriate
>   VTEP with the given local UDP port)
> - VNI - Virtual Network Identifier (IS A MUST)
>
> VXLAN decap rule sample for the tc utility:
>
>   tc filter add dev $VXLAN protocol all parent ffff: flower skip_sw \
>       enc_src_ip $REMOTE_IP enc_dst_ip $PF_OUTER_IP enc_key_id $VNI \
>       enc_dst_port $LOCAL_PORT \
>       action tunnel_key unset action mirred egress redirect dev $VF
>
> VXLAN decap rule sample for testpmd:
>
> - Creating a flow on port ID 0 performing VXLAN decapsulation and directing
>   the result to port ID 4 while checking the inner properties:
>
>   flow create 0 ingress transfer pattern eth /
>       ipv4 src is $REMOTE_IP dst is $PF_OUTER_IP /
>       udp src is 9999 dst is $LOCAL_PORT / vxlan vni is $VNI /
>       eth src is 00:11:22:33:44:55 dst is $INNER_MAC / end
>       actions vxlan_decap / port_id id 4 / end
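
The VNI given as `vxlan vni is $VNI` in testpmd and as `enc_key_id $VNI` in tc travels in different layouts: rte_flow's `struct rte_flow_item_vxlan` stores it in a 3-byte `vni[3]` array in network byte order, while the TC flower key id is a 32-bit big-endian value. A minimal sketch of the conversions, using byte shifts so the result does not depend on host endianness; the helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Three network-order bytes (layout of rte_flow_item_vxlan.vni)
 * to a host-order 24-bit VNI. */
static uint32_t
vni_from_bytes(const uint8_t vni[3])
{
	return ((uint32_t)vni[0] << 16) | ((uint32_t)vni[1] << 8) | vni[2];
}

/* Host-order VNI back to three network-order bytes. */
static void
vni_to_bytes(uint32_t vni, uint8_t out[3])
{
	out[0] = (vni >> 16) & 0xFF;
	out[1] = (vni >> 8) & 0xFF;
	out[2] = vni & 0xFF;
}

/* Host-order VNI to the big-endian 32-bit key id used for the
 * TC flower enc_key_id attribute; bits above 24 are dropped. */
static uint32_t
vni_to_key_id_be(uint32_t vni)
{
	return htonl(vni & 0xFFFFFFu);
}
```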
>
> VXLAN encap/decap rule constraints (imposed by current kernel support):
>
> - VXLAN decapsulation is provided for the PF->VF direction only
> - VXLAN encapsulation is provided for the VF->PF direction only
> - the current implementation supports only a non-shared database of
>   VTEPs (several instances of DPDK applications cannot use the same
>   UDP port simultaneously)
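
The two direction constraints can be expressed as a simple predicate that a validation step could apply before accepting a rule. A sketch with hypothetical enums; the port roles are as seen from rte_flow (the port the rule is created on and the `port_id` destination), so a decap rule created on the PF port counts as PF->VF even though, on the TC side, the filter is attached to the VTEP.

```c
#include <stdbool.h>

/* Hypothetical port roles as seen by the PMD. */
enum port_role { PORT_PF, PORT_VF_REPR };

/* Hypothetical tunnel actions covered by the constraints. */
enum tunnel_action { ACT_VXLAN_ENCAP, ACT_VXLAN_DECAP };

/* Encap only VF->PF, decap only PF->VF; anything else is rejected. */
static bool
vxlan_direction_supported(enum tunnel_action act,
			  enum port_role src, enum port_role dst)
{
	switch (act) {
	case ACT_VXLAN_ENCAP:
		return src == PORT_VF_REPR && dst == PORT_PF;
	case ACT_VXLAN_DECAP:
		return src == PORT_PF && dst == PORT_VF_REPR;
	}
	return false;
}
```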
>
> Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---

Excellent commit log!!
One nit. Please change e-switch in the title/log to E-Switch.

Thanks,
Yongseok

> v3:
>   * patchset is re-split into more focused parts
>   * decapsulation rule takes MAC from inner eth item
>   * appropriate RTE_BEx are replaced with runtime rte_cpu_xxx
>   * E-Switch Flow counter deletion is fixed
>   * VTEP management routines are refactored
>   * typos found are corrected
>
> v2:
>   * removed non-VXLAN related parts
>   * multipart Netlink messages support
>   * local IP and peer IP rules management
>   * neigh IP address to MAC address rules
>   * management rules cleanup at outer device initialization
>   * attached devices cleanup at outer device initialization
>
> v1:
>  * http://patches.dpdk.org/patch/45800/
>  * Refactored code of initial experimental proposal
>
> v0:
>  * http://patches.dpdk.org/cover/44080/
>  * Initial proposal by Adrien Mazarguil <adrien.mazarguil@6wind.com>
>
> Viacheslav Ovsiienko (13):
>   net/mlx5: prepare makefile for adding e-switch VXLAN
>   net/mlx5: prepare meson.build for adding e-switch VXLAN
>   net/mlx5: add necessary definitions for e-switch VXLAN
>   net/mlx5: add necessary structures for e-switch VXLAN
>   net/mlx5: swap items/actions validations for e-switch rules
>   net/mlx5: add e-switch VXLAN support to validation routine
>   net/mlx5: add VXLAN support to flow prepare routine
>   net/mlx5: add VXLAN support to flow translate routine
>   net/mlx5: e-switch VXLAN netlink routines update
>   net/mlx5: fix e-switch Flow counter deletion
>   net/mlx5: add e-switch VXLAN tunnel devices management
>   net/mlx5: add e-switch VXLAN encapsulation rules
>   net/mlx5: add e-switch VXLAN rule cleanup routines
>
>  drivers/net/mlx5/Makefile        |   85 +
>  drivers/net/mlx5/meson.build     |   34 +
>  drivers/net/mlx5/mlx5_flow.h     |   11 +
>  drivers/net/mlx5/mlx5_flow_tcf.c | 5118 +++++++++++++++++++++++++++++---------
>  4 files changed, 4107 insertions(+), 1141 deletions(-)
>
> -- 
> 1.8.3.1
>