From mboxrd@z Thu Jan 1 00:00:00 1970 From: Yongseok Koh Subject: Re: [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices management Date: Thu, 25 Oct 2018 00:28:16 +0000 Message-ID: <20181025002759.GA26874@mtidpdk.mti.labs.mlnx> References: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com> <1539612815-47199-1-git-send-email-viacheslavo@mellanox.com> <1539612815-47199-6-git-send-email-viacheslavo@mellanox.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Cc: Shahaf Shuler , "dev@dpdk.org" To: Slava Ovsiienko Return-path: Received: from EUR01-VE1-obe.outbound.protection.outlook.com (mail-ve1eur01on0080.outbound.protection.outlook.com [104.47.1.80]) by dpdk.org (Postfix) with ESMTP id 1D6A72C2F for ; Thu, 25 Oct 2018 02:28:18 +0200 (CEST) In-Reply-To: <1539612815-47199-6-git-send-email-viacheslavo@mellanox.com> Content-Language: en-US Content-ID: <3D37C818E4B17C458ED1DF1A76C81D3B@eurprd05.prod.outlook.com> List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Mon, Oct 15, 2018 at 02:13:33PM +0000, Viacheslav Ovsiienko wrote: > VXLAN interfaces are dynamically created for each local UDP port > of outer networks and then used as targets for TC "flower" filters > in order to perform encapsulation. These VXLAN interfaces are > system-wide, the only one device with given UDP port can exist > in the system (the attempt of creating another device with the > same UDP local port returns EEXIST), so PMD should support the > shared device instances database for PMD instances. These VXLAN > implicitly created devices are called VTEPs (Virtual Tunnel > End Points). >=20 > Creation of the VTEP occurs at the moment of rule applying. The > link is set up, root ingress qdisc is also initialized. >=20 > Encapsulation VTEPs are created on per port basis, the single > VTEP is attached to the outer interface and is shared for all > encapsulation rules on this interface. The source UDP port is > automatically selected in range 30000-60000. >=20 > For decapsulaton one VTEP is created per every unique UDP > local port to accept tunnel traffic. The name of created > VTEP consists of prefix "vmlx_" and the number of UDP port in > decimal digits without leading zeros (vmlx_4789). The VTEP > can be preliminary created in the system before the launching > application, it allows to share UDP ports between primary > and secondary processes. >=20 > Suggested-by: Adrien Mazarguil > Signed-off-by: Viacheslav Ovsiienko > --- > drivers/net/mlx5/mlx5_flow_tcf.c | 503 +++++++++++++++++++++++++++++++++= +++++- > 1 file changed, 499 insertions(+), 4 deletions(-) >=20 > diff --git a/drivers/net/mlx5/mlx5_flow_tcf.c b/drivers/net/mlx5/mlx5_flo= w_tcf.c > index d6840d5..efa9c3b 100644 > --- a/drivers/net/mlx5/mlx5_flow_tcf.c > +++ b/drivers/net/mlx5/mlx5_flow_tcf.c > @@ -3443,6 +3443,432 @@ struct pedit_parser { > return -err; > } > =20 > +/* VTEP device list is shared between PMD port instances. */ > +static LIST_HEAD(, mlx5_flow_tcf_vtep) > + vtep_list_vxlan =3D LIST_HEAD_INITIALIZER(); > +static pthread_mutex_t vtep_list_mutex =3D PTHREAD_MUTEX_INITIALIZER; What's the reason for choosing pthread_mutex instead of rte_*_lock? > + > +/** > + * Deletes VTEP network device. > + * > + * @param[in] tcf > + * Context object initialized by mlx5_flow_tcf_context_create(). > + * @param[in] vtep > + * Object represinting the network device to delete. Memory > + * allocated for this object is freed by routine. > + */ > +static void > +flow_tcf_delete_iface(struct mlx6_flow_tcf_context *tcf, > + struct mlx5_flow_tcf_vtep *vtep) > +{ > + struct nlmsghdr *nlh; > + struct ifinfomsg *ifm; > + alignas(struct nlmsghdr) > + uint8_t buf[mnl_nlmsg_size(MNL_ALIGN(sizeof(*ifm))) + 8]; > + int ret; > + > + assert(!vtep->refcnt); > + if (vtep->created && vtep->ifindex) { First of all vtep->created seems of no use. It is introduced to select the = error message in flow_tcf_create_iface(). I don't see any necessity to distinguis= h between 'vtep is allocated by rte_malloc()' and 'vtep is created in kernel'= . And why do you need to check vtep->ifindex as well? If vtep is created in k= ernel and its ifindex isn't set, that should be an error which had to be hanled i= n flow_tcf_create_iface(). Such a vtep shouldn't exist. Also, the refcnt management is a bit strange. Please put an abstraction by adding create_iface(), get_iface() and release_iface(). In the get_ifce(), vtep->refcnt should be incremented. And in the release_iface(), it decrease= the refcnt and if it reaches to zero, the iface can be removed. create_iface() = will set the refcnt to 1. And if you refer to mlx5_hrxq_get(), it even does sear= ching the list not by repeating the same lookup code here and there. That will ma= ke your code much simpler. > + DRV_LOG(INFO, "VTEP delete (%d)", vtep->ifindex); > + nlh =3D mnl_nlmsg_put_header(buf); > + nlh->nlmsg_type =3D RTM_DELLINK; > + nlh->nlmsg_flags =3D NLM_F_REQUEST; > + ifm =3D mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm)); > + ifm->ifi_family =3D AF_UNSPEC; > + ifm->ifi_index =3D vtep->ifindex; > + ret =3D flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL); > + if (ret) > + DRV_LOG(WARNING, "netlink: error deleting VXLAN " > + "encap/decap ifindex %u", > + ifm->ifi_index); > + } > + rte_free(vtep); > +} > + > +/** > + * Creates VTEP network device. > + * > + * @param[in] tcf > + * Context object initialized by mlx5_flow_tcf_context_create(). > + * @param[in] ifouter > + * Outer interface to attach new-created VXLAN device > + * If zero the VXLAN device will not be attached to any device. > + * @param[in] port > + * UDP port of created VTEP device. > + * @param[out] error > + * Perform verbose error reporting if not NULL. > + * > + * @return > + * Pointer to created device structure on success, NULL otherwise > + * and rte_errno is set. > + */ > +#ifndef HAVE_IFLA_VXLAN_COLLECT_METADATA Why negative(ifndef) first intead of positive(ifdef)? > +static struct mlx5_flow_tcf_vtep* > +flow_tcf_create_iface(struct mlx5_flow_tcf_context *tcf __rte_unused, > + unsigned int ifouter __rte_unused, > + uint16_t port __rte_unused, > + struct rte_flow_error *error) > +{ > + rte_flow_error_set(error, ENOTSUP, > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, > + "netlink: failed to create VTEP, " > + "VXLAN metadat is not supported by kernel"); Typo. > + return NULL; > +} > +#else > +static struct mlx5_flow_tcf_vtep* > +flow_tcf_create_iface(struct mlx5_flow_tcf_context *tcf, How about adding 'vtep'? It sounds vague - creating a general interface. E.g., flow_tcf_create_vtep_iface()? > + unsigned int ifouter, > + uint16_t port, struct rte_flow_error *error) > +{ > + struct mlx5_flow_tcf_vtep *vtep; > + struct nlmsghdr *nlh; > + struct ifinfomsg *ifm; > + char name[sizeof(MLX5_VXLAN_DEVICE_PFX) + 24]; > + alignas(struct nlmsghdr) > + uint8_t buf[mnl_nlmsg_size(sizeof(*ifm)) + 128 + Use a macro for '128'. Can't know the meaning. > + SZ_NLATTR_DATA_OF(sizeof(name)) + > + SZ_NLATTR_NEST * 2 + > + SZ_NLATTR_STRZ_OF("vxlan") + > + SZ_NLATTR_DATA_OF(sizeof(uint32_t)) + > + SZ_NLATTR_DATA_OF(sizeof(uint32_t)) + > + SZ_NLATTR_DATA_OF(sizeof(uint16_t)) + > + SZ_NLATTR_DATA_OF(sizeof(uint8_t))]; > + struct nlattr *na_info; > + struct nlattr *na_vxlan; > + rte_be16_t vxlan_port =3D RTE_BE16(port); Use rte_cpu_to_be_*() instead. > + int ret; > + > + vtep =3D rte_zmalloc(__func__, sizeof(*vtep), > + alignof(struct mlx5_flow_tcf_vtep)); > + if (!vtep) { > + rte_flow_error_set > + (error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > + NULL, "unadble to allocate memory for VTEP desc"); > + return NULL; > + } > + *vtep =3D (struct mlx5_flow_tcf_vtep){ > + .refcnt =3D 0, > + .port =3D port, > + .created =3D 0, > + .ifouter =3D 0, > + .ifindex =3D 0, > + .local =3D LIST_HEAD_INITIALIZER(), > + .neigh =3D LIST_HEAD_INITIALIZER(), > + }; > + memset(buf, 0, sizeof(buf)); > + nlh =3D mnl_nlmsg_put_header(buf); > + nlh->nlmsg_type =3D RTM_NEWLINK; > + nlh->nlmsg_flags =3D NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL; > + ifm =3D mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm)); > + ifm->ifi_family =3D AF_UNSPEC; > + ifm->ifi_type =3D 0; > + ifm->ifi_index =3D 0; > + ifm->ifi_flags =3D IFF_UP; > + ifm->ifi_change =3D 0xffffffff; > + snprintf(name, sizeof(name), "%s%u", MLX5_VXLAN_DEVICE_PFX, port); > + mnl_attr_put_strz(nlh, IFLA_IFNAME, name); > + na_info =3D mnl_attr_nest_start(nlh, IFLA_LINKINFO); > + assert(na_info); > + mnl_attr_put_strz(nlh, IFLA_INFO_KIND, "vxlan"); > + na_vxlan =3D mnl_attr_nest_start(nlh, IFLA_INFO_DATA); > + if (ifouter) > + mnl_attr_put_u32(nlh, IFLA_VXLAN_LINK, ifouter); > + assert(na_vxlan); > + mnl_attr_put_u8(nlh, IFLA_VXLAN_COLLECT_METADATA, 1); > + mnl_attr_put_u8(nlh, IFLA_VXLAN_UDP_ZERO_CSUM6_RX, 1); > + mnl_attr_put_u8(nlh, IFLA_VXLAN_LEARNING, 0); > + mnl_attr_put_u16(nlh, IFLA_VXLAN_PORT, vxlan_port); > + mnl_attr_nest_end(nlh, na_vxlan); > + mnl_attr_nest_end(nlh, na_info); > + assert(sizeof(buf) >=3D nlh->nlmsg_len); > + ret =3D flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL); > + if (ret) > + DRV_LOG(WARNING, > + "netlink: VTEP %s create failure (%d)", > + name, rte_errno); > + else > + vtep->created =3D 1; Flow of code here isn't smooth, thus could be error-prone. Most of all, I d= on't like ret has multiple meanings. ret should be return value but you are usin= g it to store ifindex. > + if (ret && ifouter) > + ret =3D 0; > + else > + ret =3D if_nametoindex(name); If vtep isn't created and ifouter is set, then skip init below, which means, if vtep is created or ifouter is set, it tries to get ifindex of vte= p. But why do you want to try to call this API even if it failed to create vte= p? Let's not make code flow convoluted even though it logically works. Let's m= ake it straightforward. > + if (ret) { > + vtep->ifindex =3D ret; > + vtep->ifouter =3D ifouter; > + memset(buf, 0, sizeof(buf)); > + nlh =3D mnl_nlmsg_put_header(buf); > + nlh->nlmsg_type =3D RTM_NEWLINK; > + nlh->nlmsg_flags =3D NLM_F_REQUEST; > + ifm =3D mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm)); > + ifm->ifi_family =3D AF_UNSPEC; > + ifm->ifi_type =3D 0; > + ifm->ifi_index =3D vtep->ifindex; > + ifm->ifi_flags =3D IFF_UP; > + ifm->ifi_change =3D IFF_UP; > + ret =3D flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL); > + if (ret) { > + DRV_LOG(WARNING, > + "netlink: VTEP %s set link up failure (%d)", > + name, rte_errno); > + rte_free(vtep); > + rte_flow_error_set > + (error, -errno, > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, > + "netlink: failed to set VTEP link up"); > + vtep =3D NULL; > + } else { > + ret =3D mlx5_flow_tcf_init(tcf, vtep->ifindex, error); > + if (ret) > + DRV_LOG(WARNING, > + "VTEP %s init failure (%d)", name, rte_errno); > + } > + } else { > + DRV_LOG(WARNING, > + "VTEP %s failed to get index (%d)", name, errno); > + rte_flow_error_set > + (error, -errno, > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, > + !vtep->created ? "netlink: failed to create VTEP" : > + "netlink: failed to retrieve VTEP ifindex"); > + ret =3D 1; If it fails to create a vtep above, it will print out two warning messages = and one rte_flow_error message. And it even selects message to print between tw= o? And there's another info msg at the end even in case of failure. Do you rea= lly want to do this even with manipulating ret to change code path? Not a good practice. Usually, code path should be straightforward for sucessful path and for errors/failures, return immediately or use 'goto' if there's need for clean= up. Please refactor entire function. > + } > + if (ret) { > + flow_tcf_delete_iface(tcf, vtep); > + vtep =3D NULL; > + } > + DRV_LOG(INFO, "VTEP create (%d, %s)", vtep->port, vtep ? "OK" : "error"= ); > + return vtep; > +} > +#endif /* HAVE_IFLA_VXLAN_COLLECT_METADATA */ > + > +/** > + * Create target interface index for VXLAN tunneling decapsulation. > + * In order to share the UDP port within the other interfaces the > + * VXLAN device created as not attached to any interface (if created). > + * > + * @param[in] tcf > + * Context object initialized by mlx5_flow_tcf_context_create(). > + * @param[in] dev_flow > + * Flow tcf object with tunnel structure pointer set. > + * @param[out] error > + * Perform verbose error reporting if not NULL. > + * @return > + * Interface index on success, zero otherwise and rte_errno is set. Return negative errno in case of failure like others. * Interface index on success, a negative errno value otherwise and rte_e= rrno is set. > + */ > +static unsigned int > +flow_tcf_decap_vtep_create(struct mlx5_flow_tcf_context *tcf, > + struct mlx5_flow *dev_flow, > + struct rte_flow_error *error) > +{ > + struct mlx5_flow_tcf_vtep *vtep, *vlst; > + uint16_t port =3D dev_flow->tcf.vxlan_decap->udp_port; > + > + vtep =3D NULL; > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) { > + if (vlst->port =3D=3D port) { > + vtep =3D vlst; > + break; > + } > + } You just need one variable. struct mlx5_flow_tcf_vtep *vtep; LIST_FOREACH(vtep, &vtep_list_vxlan, next) { if (vtep->port =3D=3D port) break; } > + if (!vtep) { > + vtep =3D flow_tcf_create_iface(tcf, 0, port, error); > + if (vtep) > + LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next); > + } else { > + if (vtep->ifouter) { > + rte_flow_error_set(error, -errno, > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, > + "Failed to create decap VTEP, attached " > + "device with the same UDP port exists"); > + vtep =3D NULL; Making vtep null to skip the following code? Please merge the two same if/e= lse and make the code path strightforward. And which errno do you expect here? Should it be set EEXIST instead? > + } > + } > + if (vtep) { > + vtep->refcnt++; > + assert(vtep->ifindex); > + return vtep->ifindex; > + } else { > + return 0; > + } Why repeating same if/else? This is my suggestion but if you take my suggestion to have flow_tcf_[create|get|release]_iface(), this will get much simpler. { struct mlx5_flow_tcf_vtep *vtep; uint16_t port =3D dev_flow->tcf.vxlan_decap->udp_port; LIST_FOREACH(vtep, &vtep_list_vxlan, next) { if (vtep->port =3D=3D port) break; } if (vtep && vtep->ifouter) return rte_flow_error_set(... EEXIST ...); else if (vtep) { ++vtep->refcnt; } else { vtep =3D flow_tcf_create_iface(tcf, 0, port, error); if (!vtep) return rte_flow_error_set(...); LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next); } assert(vtep->ifindex); return vtep->ifindex; } > +} > + > +/** > + * Creates target interface index for VXLAN tunneling encapsulation. > + * > + * @param[in] tcf > + * Context object initialized by mlx5_flow_tcf_context_create(). > + * @param[in] ifouter > + * Network interface index to attach VXLAN encap device to. > + * @param[in] dev_flow > + * Flow tcf object with tunnel structure pointer set. > + * @param[out] error > + * Perform verbose error reporting if not NULL. > + * @return > + * Interface index on success, zero otherwise and rte_errno is set. > + */ > +static unsigned int > +flow_tcf_encap_vtep_create(struct mlx5_flow_tcf_context *tcf, > + unsigned int ifouter, > + struct mlx5_flow *dev_flow __rte_unused, > + struct rte_flow_error *error) > +{ > + static uint16_t encap_port =3D MLX5_VXLAN_PORT_RANGE_MIN - 1; > + struct mlx5_flow_tcf_vtep *vtep, *vlst; > + > + assert(ifouter); > + /* Look whether the attached VTEP for encap is created. */ > + vtep =3D NULL; > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) { > + if (vlst->ifouter =3D=3D ifouter) { > + vtep =3D vlst; > + break; > + } > + } Same here. > + if (!vtep) { > + uint16_t pcnt; > + > + /* Not found, we should create the new attached VTEP. */ > +/* > + * TODO: not implemented yet > + * flow_tcf_encap_iface_cleanup(tcf, ifouter); > + * flow_tcf_encap_local_cleanup(tcf, ifouter); > + * flow_tcf_encap_neigh_cleanup(tcf, ifouter); > + */ Personal note is not appropriate even though it is removed in the following patch. > + for (pcnt =3D 0; pcnt <=3D (MLX5_VXLAN_PORT_RANGE_MAX > + - MLX5_VXLAN_PORT_RANGE_MIN); pcnt++) { > + encap_port++; > + /* Wraparound the UDP port index. */ > + if (encap_port < MLX5_VXLAN_PORT_RANGE_MIN || > + encap_port > MLX5_VXLAN_PORT_RANGE_MAX) > + encap_port =3D MLX5_VXLAN_PORT_RANGE_MIN; > + /* Check whether UDP port is in already in use. */ > + vtep =3D NULL; > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) { > + if (vlst->port =3D=3D encap_port) { > + vtep =3D vlst; > + break; > + } > + } If you want to find out an empty port number, you can use rte_bitmap instea= d of repeating searching the entire list for all possible port numbers. > + if (vtep) { > + vtep =3D NULL; > + continue; > + } > + vtep =3D flow_tcf_create_iface(tcf, ifouter, > + encap_port, error); > + if (vtep) { > + LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next); > + break; > + } > + if (rte_errno !=3D EEXIST) > + break; > + } > + } > + if (!vtep) > + return 0; > + vtep->refcnt++; > + assert(vtep->ifindex); > + return vtep->ifindex; Please refactor this func according to what I suggested for flow_tcf_decap_vtep_create() and flow_tcf_delete_iface(). > +} > + > +/** > + * Creates target interface index for tunneling of any type. > + * > + * @param[in] tcf > + * Context object initialized by mlx5_flow_tcf_context_create(). > + * @param[in] ifouter > + * Network interface index to attach VXLAN encap device to. > + * @param[in] dev_flow > + * Flow tcf object with tunnel structure pointer set. > + * @param[out] error > + * Perform verbose error reporting if not NULL. > + * @return > + * Interface index on success, zero otherwise and rte_errno is set. * Interface index on success, a negative errno value otherwise and * rte_errno is set. > + */ > +static unsigned int > +flow_tcf_tunnel_vtep_create(struct mlx5_flow_tcf_context *tcf, > + unsigned int ifouter, > + struct mlx5_flow *dev_flow, > + struct rte_flow_error *error) > +{ > + unsigned int ret; > + > + assert(dev_flow->tcf.tunnel); > + pthread_mutex_lock(&vtep_list_mutex); > + switch (dev_flow->tcf.tunnel->type) { > + case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP: > + ret =3D flow_tcf_encap_vtep_create(tcf, ifouter, > + dev_flow, error); > + break; > + case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP: > + ret =3D flow_tcf_decap_vtep_create(tcf, dev_flow, error); > + break; > + default: > + rte_flow_error_set(error, ENOTSUP, > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, > + "unsupported tunnel type"); > + ret =3D 0; > + break; > + } > + pthread_mutex_unlock(&vtep_list_mutex); > + return ret; > +} > + > +/** > + * Deletes tunneling interface by UDP port. > + * > + * @param[in] tcf > + * Context object initialized by mlx5_flow_tcf_context_create(). > + * @param[in] ifindex > + * Network interface index of VXLAN device. > + * @param[in] dev_flow > + * Flow tcf object with tunnel structure pointer set. > + */ > +static void > +flow_tcf_tunnel_vtep_delete(struct mlx5_flow_tcf_context *tcf, > + unsigned int ifindex, > + struct mlx5_flow *dev_flow) > +{ > + struct mlx5_flow_tcf_vtep *vtep, *vlst; > + > + assert(dev_flow->tcf.tunnel); > + pthread_mutex_lock(&vtep_list_mutex); > + vtep =3D NULL; > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) { > + if (vlst->ifindex =3D=3D ifindex) { > + vtep =3D vlst; > + break; > + } > + } It is weird. You just can have vtep pointer in the dev_flow->tcf.tunnel ins= tead of ifindex_tun which is same as vtep->ifindex like the assertion below. The= n, this lookup can be skipped. > + if (!vtep) { > + DRV_LOG(WARNING, "No VTEP device found in the list"); > + goto exit; > + } > + switch (dev_flow->tcf.tunnel->type) { > + case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP: > + break; > + case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP: > +/* > + * TODO: Remove the encap ancillary rules first. > + * flow_tcf_encap_neigh(tcf, vtep, dev_flow, false, NULL); > + * flow_tcf_encap_local(tcf, vtep, dev_flow, false, NULL); > + */ Is it a personal note? Please remove. > + break; > + default: > + assert(false); > + DRV_LOG(WARNING, "Unsupported tunnel type"); > + break; > + } > + assert(dev_flow->tcf.tunnel->ifindex_tun =3D=3D vtep->ifindex); > + assert(vtep->refcnt); > + if (!vtep->refcnt || !--vtep->refcnt) { > + LIST_REMOVE(vtep, next); > + flow_tcf_delete_iface(tcf, vtep); > + } > +exit: > + pthread_mutex_unlock(&vtep_list_mutex); > +} > + > /** > * Apply flow to E-Switch by sending Netlink message. > * > @@ -3461,18 +3887,61 @@ struct pedit_parser { > struct rte_flow_error *error) > { > struct priv *priv =3D dev->data->dev_private; > - struct mlx5_flow_tcf_context *nl =3D priv->tcf_context; > + struct mlx5_flow_tcf_context *tcf =3D priv->tcf_context; > struct mlx5_flow *dev_flow; > struct nlmsghdr *nlh; > + int ret; > =20 > dev_flow =3D LIST_FIRST(&flow->dev_flows); > /* E-Switch flow can't be expanded. */ > assert(!LIST_NEXT(dev_flow, next)); > + if (dev_flow->tcf.applied) > + return 0; > nlh =3D dev_flow->tcf.nlh; > nlh->nlmsg_type =3D RTM_NEWTFILTER; > nlh->nlmsg_flags =3D NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL; > - if (!flow_tcf_nl_ack(nl, nlh, 0, NULL, NULL)) > + if (dev_flow->tcf.tunnel) { > + /* > + * Replace the interface index, target for > + * encapsulation, source for decapsulation. > + */ > + assert(!dev_flow->tcf.tunnel->ifindex_tun); > + assert(dev_flow->tcf.tunnel->ifindex_ptr); > + /* Create actual VTEP device when rule is being applied. */ > + dev_flow->tcf.tunnel->ifindex_tun > + =3D flow_tcf_tunnel_vtep_create(tcf, > + *dev_flow->tcf.tunnel->ifindex_ptr, > + dev_flow, error); > + DRV_LOG(INFO, "Replace ifindex: %d->%d", > + dev_flow->tcf.tunnel->ifindex_tun, > + *dev_flow->tcf.tunnel->ifindex_ptr); > + if (!dev_flow->tcf.tunnel->ifindex_tun) > + return -rte_errno; > + dev_flow->tcf.tunnel->ifindex_org > + =3D *dev_flow->tcf.tunnel->ifindex_ptr; > + *dev_flow->tcf.tunnel->ifindex_ptr > + =3D dev_flow->tcf.tunnel->ifindex_tun; > + } > + ret =3D flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL); > + if (dev_flow->tcf.tunnel) { > + DRV_LOG(INFO, "Restore ifindex: %d->%d", > + dev_flow->tcf.tunnel->ifindex_org, > + *dev_flow->tcf.tunnel->ifindex_ptr); > + *dev_flow->tcf.tunnel->ifindex_ptr > + =3D dev_flow->tcf.tunnel->ifindex_org; > + dev_flow->tcf.tunnel->ifindex_org =3D 0; ifindex_org looks a temporary storage in this code. And this kind of hassle (replace/restore) is there because you took the ifindex from the netlink message. Why don't you have just struct mlx5_flow_tcf_tunnel_hdr { uint32_t type; /**< Tunnel action type. */ unsigned int ifindex; /**< Original dst/src interface */ struct mlx5_flow_tcf_vtep *vtep; /**< Tunnel endpoint device. */ unsigned int *nlmsg_ifindex_ptr; /**< ifindex ptr in Netlink message. */ }; and don't change ifindex? Thanks, Yongseok > + } > + if (!ret) { > + dev_flow->tcf.applied =3D 1; > return 0; > + } > + DRV_LOG(WARNING, "netlink: failed to create TC rule (%d)", rte_errno); > + if (dev_flow->tcf.tunnel->ifindex_tun) { > + flow_tcf_tunnel_vtep_delete(tcf, > + dev_flow->tcf.tunnel->ifindex_tun, > + dev_flow); > + dev_flow->tcf.tunnel->ifindex_tun =3D 0; > + } > return rte_flow_error_set(error, rte_errno, > RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, > "netlink: failed to create TC flow rule"); > @@ -3490,7 +3959,7 @@ struct pedit_parser { > flow_tcf_remove(struct rte_eth_dev *dev, struct rte_flow *flow) > { > struct priv *priv =3D dev->data->dev_private; > - struct mlx5_flow_tcf_context *nl =3D priv->tcf_context; > + struct mlx5_flow_tcf_context *tcf =3D priv->tcf_context; > struct mlx5_flow *dev_flow; > struct nlmsghdr *nlh; > =20 > @@ -3501,10 +3970,36 @@ struct pedit_parser { > return; > /* E-Switch flow can't be expanded. */ > assert(!LIST_NEXT(dev_flow, next)); > + if (!dev_flow->tcf.applied) > + return; > + if (dev_flow->tcf.tunnel) { > + /* > + * Replace the interface index, target for > + * encapsulation, source for decapsulation. > + */ > + assert(dev_flow->tcf.tunnel->ifindex_tun); > + assert(dev_flow->tcf.tunnel->ifindex_ptr); > + dev_flow->tcf.tunnel->ifindex_org > + =3D *dev_flow->tcf.tunnel->ifindex_ptr; > + *dev_flow->tcf.tunnel->ifindex_ptr > + =3D dev_flow->tcf.tunnel->ifindex_tun; > + } > nlh =3D dev_flow->tcf.nlh; > nlh->nlmsg_type =3D RTM_DELTFILTER; > nlh->nlmsg_flags =3D NLM_F_REQUEST; > - flow_tcf_nl_ack(nl, nlh, 0, NULL, NULL); > + flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL); > + if (dev_flow->tcf.tunnel) { > + *dev_flow->tcf.tunnel->ifindex_ptr > + =3D dev_flow->tcf.tunnel->ifindex_org; > + dev_flow->tcf.tunnel->ifindex_org =3D 0; > + if (dev_flow->tcf.tunnel->ifindex_tun) { > + flow_tcf_tunnel_vtep_delete(tcf, > + dev_flow->tcf.tunnel->ifindex_tun, > + dev_flow); > + dev_flow->tcf.tunnel->ifindex_tun =3D 0; > + } > + } > + dev_flow->tcf.applied =3D 0; > } > =20 > /** >=20