All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Hu, Jiayu" <jiayu.hu@intel.com>
To: "yang_y_yi@163.com" <yang_y_yi@163.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "thomas@monjalon.net" <thomas@monjalon.net>,
	"yangyi01@inspur.com" <yangyi01@inspur.com>
Subject: Re: [dpdk-dev] [PATCH v6 2/3] gro: add VXLAN UDP/IPv4 GRO support
Date: Mon, 21 Sep 2020 07:54:36 +0000	[thread overview]
Message-ID: <f565c530272b46ed893a0738323cec35@intel.com> (raw)
In-Reply-To: <20200917034959.194372-3-yang_y_yi@163.com>

Hi Yi,

Some comments are inline.

Thanks,
Jiayu

> -----Original Message-----
> From: yang_y_yi@163.com <yang_y_yi@163.com>
> Sent: Thursday, September 17, 2020 11:50 AM
> To: dev@dpdk.org
> Cc: Hu, Jiayu <jiayu.hu@intel.com>; thomas@monjalon.net;
> yangyi01@inspur.com; yang_y_yi@163.com
> Subject: [PATCH v6 2/3] gro: add VXLAN UDP/IPv4 GRO support
> 
> From: Yi Yang <yangyi01@inspur.com>
> 
> VXLAN UDP/IPv4 GRO can help improve VM-to-VM UDP
> performance when UFO or GSO is enabled in VM, GRO
> must be supported if UFO or GSO is enabled,
> otherwise, performance can't get big improvement
> if only GSO is there.
> 
> With this enabled in DPDK, OVS DPDK can leverage it
> to improve VM-to-VM UDP performance, it will reassemble
> VXLAN UDP/IPv4 fragments immediate after they are
> received from a physical NIC. It is very helpful in
> OVS DPDK VXLAN use case.
> 
> Note: outer IP ID isn't used to check if two packets
> are same flow and can be merged because the difference
> between outer IP IDs of two packets isn't always +/-1
> in case of OVS DPDK.
> 
> Signed-off-by: Yi Yang <yangyi01@inspur.com>
> ---
>  lib/librte_gro/gro_udp4.h       |   1 +
>  lib/librte_gro/gro_vxlan_udp4.c | 542
> ++++++++++++++++++++++++++++++++++++++++
>  lib/librte_gro/gro_vxlan_udp4.h | 154 ++++++++++++
>  lib/librte_gro/meson.build      |   2 +-
>  lib/librte_gro/rte_gro.c        | 115 +++++++--
>  lib/librte_gro/rte_gro.h        |   3 +
>  6 files changed, 790 insertions(+), 27 deletions(-)
>  create mode 100644 lib/librte_gro/gro_vxlan_udp4.c
>  create mode 100644 lib/librte_gro/gro_vxlan_udp4.h
> 
> diff --git a/lib/librte_gro/gro_udp4.h b/lib/librte_gro/gro_udp4.h
> index 0a078e4..d38b393 100644
> --- a/lib/librte_gro/gro_udp4.h
> +++ b/lib/librte_gro/gro_udp4.h
> @@ -7,6 +7,7 @@
> 
>  #include <rte_ip.h>
>  #include <rte_udp.h>
> +#include <rte_vxlan.h>
> 
>  #define INVALID_ARRAY_INDEX 0xffffffffUL
>  #define GRO_UDP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL)
> diff --git a/lib/librte_gro/gro_vxlan_udp4.c b/lib/librte_gro/gro_vxlan_udp4.c
> new file mode 100644
> index 0000000..4eece56
> --- /dev/null
> +++ b/lib/librte_gro/gro_vxlan_udp4.c
> @@ -0,0 +1,542 @@
> +
> +uint16_t
> +gro_vxlan_udp4_tbl_timeout_flush(struct gro_vxlan_udp4_tbl *tbl,
> +		uint64_t flush_timestamp,
> +		struct rte_mbuf **out,
> +		uint16_t nb_out)
> +{
> +	uint16_t k = 0;
> +	uint32_t i, j;
> +	uint32_t max_flow_num = tbl->max_flow_num;
> +
> +	for (i = 0; i < max_flow_num; i++) {
> +		if (unlikely(tbl->flow_num == 0))
> +			return k;
> +
> +		j = tbl->flows[i].start_index;
> +		while (j != INVALID_ARRAY_INDEX) {
> +			if (tbl->items[j].inner_item.start_time <=
> +					flush_timestamp) {
> +				gro_vxlan_udp4_merge_items(tbl, j);
> +				out[k++] = tbl->items[j].inner_item.firstseg;
> +				if (tbl->items[j].inner_item.nb_merged > 1)
> +					update_vxlan_header(&(tbl-
> >items[j]));
> +				/*
> +				 * Delete the item and get the next packet
> +				 * index.
> +				 */
> +				j = delete_item(tbl, j, INVALID_ARRAY_INDEX);
> +				tbl->flows[i].start_index = j;
> +				if (j == INVALID_ARRAY_INDEX)
> +					tbl->flow_num--;
> +
> +				if (unlikely(k == nb_out))
> +					return k;
> +			} else
> +				/*
> +				 * The left packets in the flow won't be
> +				 * timeout. Go to check other flows.
> +				 */
> +				break;

The items of a flow are ordered by frag_oft, and start_time
of these items is not always in ascending order. Therefore,
you cannot skip checking the items after the item whose
start_time is greater than flush_timestamp. This issue also
exists in UDP/IPv4 GRO, and need to correct them both.

> +		}
> +	}
> +	return k;
> +}
> diff --git a/lib/librte_gro/gro_vxlan_udp4.h
> b/lib/librte_gro/gro_vxlan_udp4.h
> new file mode 100644
> index 0000000..6a42fb3
> --- /dev/null
> +++ b/lib/librte_gro/gro_vxlan_udp4.h
> @@ -0,0 +1,154 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Inspur Corporation
> + */
> +
> +#ifndef _GRO_VXLAN_UDP4_H_
> +#define _GRO_VXLAN_UDP4_H_
> +
> +#include "gro_udp4.h"
> +
> +#define GRO_VXLAN_UDP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL)
> +
> +/* Header fields representing a VxLAN flow */
> +struct vxlan_udp4_flow_key {
> +	struct udp4_flow_key inner_key;
> +	struct rte_vxlan_hdr vxlan_hdr;
> +
> +	struct rte_ether_addr outer_eth_saddr;
> +	struct rte_ether_addr outer_eth_daddr;
> +
> +	uint32_t outer_ip_src_addr;
> +	uint32_t outer_ip_dst_addr;
> +
> +	/* Note: It is unnecessary to save outer_src_port here because it can
> +	 * be different for VxLAN UDP fragments from the same flow.
> +	 */
> +	uint16_t outer_dst_port;
> +

The above empty line is unnecessary.

> +};
> +
> +
> +struct gro_vxlan_udp4_item {
> +	struct gro_udp4_item inner_item;
> +	/* Note: VXLAN UDP/IPv4 GRO needn't check outer_ip_id because
> +	 * the difference between outer_ip_ids of two received packets
> +	 * isn't always +/-1 in case of OVS DPDK. So no outer_ip_id
> +	 * and outer_is_atomic fields here.
> +	 */

It seems the above comments for outer IP ID is enough, and no need
to highlight in the commit log again. How do you think?

> +};
> +
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> index f623230..db990cf 100644
> --- a/lib/librte_gro/rte_gro.c
> +++ b/lib/librte_gro/rte_gro.c
> @@ -11,6 +11,7 @@
>  #include "gro_tcp4.h"
>  #include "gro_udp4.h"
>  #include "gro_vxlan_tcp4.h"
> +#include "gro_vxlan_udp4.h"
> 
>  /*
>   * GRO context structure. It keeps the table structures, which are
> @@ -137,19 +148,27 @@ struct gro_ctx {
>  	struct gro_udp4_item udp_items[RTE_GRO_MAX_BURST_ITEM_NUM]
> = {{0} };
> 
>  	/* Allocate a reassembly table for VXLAN TCP GRO */
> -	struct gro_vxlan_tcp4_tbl vxlan_tbl;
> -	struct gro_vxlan_tcp4_flow
> vxlan_flows[RTE_GRO_MAX_BURST_ITEM_NUM];
> -	struct gro_vxlan_tcp4_item
> vxlan_items[RTE_GRO_MAX_BURST_ITEM_NUM]
> +	struct gro_vxlan_tcp4_tbl vxlan_tcp_tbl;
> +	struct gro_vxlan_tcp4_flow
> vxlan_tcp_flows[RTE_GRO_MAX_BURST_ITEM_NUM];
> +	struct gro_vxlan_tcp4_item
> vxlan_tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM]
>  			= {{{0}, 0, 0} };
> 
> +	/* Allocate a reassembly table for VXLAN UDP GRO */
> +	struct gro_vxlan_udp4_tbl vxlan_udp_tbl;
> +	struct gro_vxlan_udp4_flow
> vxlan_udp_flows[RTE_GRO_MAX_BURST_ITEM_NUM];
> +	struct gro_vxlan_udp4_item
> vxlan_udp_items[RTE_GRO_MAX_BURST_ITEM_NUM]
> +			= {{{0}} };
> +
>  	struct rte_mbuf *unprocess_pkts[nb_pkts];
>  	uint32_t item_num;
>  	int32_t ret;
>  	uint16_t i, unprocess_num = 0, nb_after_gro = nb_pkts;
> -	uint8_t do_tcp4_gro = 0, do_vxlan_gro = 0, do_udp4_gro = 0;
> +	uint8_t do_tcp4_gro = 0, do_vxlan_tcp_gro = 0, do_udp4_gro = 0,
> +		do_vxlan_udp_gro = 0;
> 
>  	if (unlikely((param->gro_types & (RTE_GRO_IPV4_VXLAN_TCP_IPV4 |
>  					RTE_GRO_TCP_IPV4 |
> +					RTE_GRO_IPV4_VXLAN_UDP_IPV4 |
>  					RTE_GRO_UDP_IPV4)) == 0))
>  		return nb_pkts;
> 
> @@ -160,15 +179,28 @@ struct gro_ctx {
> 
>  	if (param->gro_types & RTE_GRO_IPV4_VXLAN_TCP_IPV4) {
>  		for (i = 0; i < item_num; i++)
> -			vxlan_flows[i].start_index = INVALID_ARRAY_INDEX;
> -
> -		vxlan_tbl.flows = vxlan_flows;
> -		vxlan_tbl.items = vxlan_items;
> -		vxlan_tbl.flow_num = 0;
> -		vxlan_tbl.item_num = 0;
> -		vxlan_tbl.max_flow_num = item_num;
> -		vxlan_tbl.max_item_num = item_num;
> -		do_vxlan_gro = 1;
> +			vxlan_tcp_flows[i].start_index =
> INVALID_ARRAY_INDEX;
> +
> +		vxlan_tcp_tbl.flows = vxlan_tcp_flows;
> +		vxlan_tcp_tbl.items = vxlan_tcp_items;
> +		vxlan_tcp_tbl.flow_num = 0;
> +		vxlan_tcp_tbl.item_num = 0;
> +		vxlan_tcp_tbl.max_flow_num = item_num;
> +		vxlan_tcp_tbl.max_item_num = item_num;
> +		do_vxlan_tcp_gro = 1;
> +	}
> +
> +	if (param->gro_types & RTE_GRO_IPV4_VXLAN_UDP_IPV4) {
> +		for (i = 0; i < item_num; i++)
> +			vxlan_udp_flows[i].start_index =
> INVALID_ARRAY_INDEX;
> +
> +		vxlan_udp_tbl.flows = vxlan_udp_flows;
> +		vxlan_udp_tbl.items = vxlan_udp_items;
> +		vxlan_udp_tbl.flow_num = 0;
> +		vxlan_udp_tbl.item_num = 0;
> +		vxlan_udp_tbl.max_flow_num = item_num;
> +		vxlan_udp_tbl.max_item_num = item_num;
> +		do_vxlan_udp_gro = 1;
>  	}
> 
>  	if (param->gro_types & RTE_GRO_TCP_IPV4) {
> @@ -204,9 +236,18 @@ struct gro_ctx {
>  		 * will be flushed from the tables.
>  		 */
>  		if (IS_IPV4_VXLAN_TCP4_PKT(pkts[i]->packet_type) &&
> -				do_vxlan_gro) {
> +				do_vxlan_tcp_gro) {
>  			ret = gro_vxlan_tcp4_reassemble(pkts[i],
> -							&vxlan_tbl, 0);
> +							&vxlan_tcp_tbl, 0);
> +			if (ret > 0)
> +				/* Merge successfully */
> +				nb_after_gro--;
> +			else if (ret < 0)
> +				unprocess_pkts[unprocess_num++] = pkts[i];
> +		} else if (IS_IPV4_VXLAN_UDP4_PKT(pkts[i]->packet_type) &&
> +				do_vxlan_udp_gro) {
> +			ret = gro_vxlan_udp4_reassemble(pkts[i],
> +							&vxlan_udp_tbl, 0);
>  			if (ret > 0)
>  				/* Merge successfully */
>  				nb_after_gro--;
> @@ -236,11 +277,17 @@ struct gro_ctx {
>  		 || (unprocess_num < nb_pkts)) {
>  		i = 0;
>  		/* Flush all packets from the tables */
> -		if (do_vxlan_gro) {
> -			i = gro_vxlan_tcp4_tbl_timeout_flush(&vxlan_tbl,
> +		if (do_vxlan_tcp_gro) {
> +			i = gro_vxlan_tcp4_tbl_timeout_flush(&vxlan_tcp_tbl,
>  					0, pkts, nb_pkts);
>  		}
> 
> +		if (do_vxlan_udp_gro) {
> +			i +=
> gro_vxlan_udp4_tbl_timeout_flush(&vxlan_udp_tbl,
> +					0, &pkts[i], nb_pkts - i);
> +
> +		}
> +
>  		if (do_tcp4_gro) {
>  			i += gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0,
>  					&pkts[i], nb_pkts - i);
> @@ -269,33 +316,42 @@ struct gro_ctx {
>  {
>  	struct rte_mbuf *unprocess_pkts[nb_pkts];
>  	struct gro_ctx *gro_ctx = ctx;
> -	void *tcp_tbl, *udp_tbl, *vxlan_tbl;
> +	void *tcp_tbl, *udp_tbl, *vxlan_tcp_tbl, *vxlan_udp_tbl;
>  	uint64_t current_time;
>  	uint16_t i, unprocess_num = 0;
> -	uint8_t do_tcp4_gro, do_vxlan_gro, do_udp4_gro;
> +	uint8_t do_tcp4_gro, do_vxlan_tcp_gro, do_udp4_gro,
> do_vxlan_udp_gro;
> 
>  	if (unlikely((gro_ctx->gro_types & (RTE_GRO_IPV4_VXLAN_TCP_IPV4
> |
>  					RTE_GRO_TCP_IPV4 |
> +					RTE_GRO_IPV4_VXLAN_UDP_IPV4 |
>  					RTE_GRO_UDP_IPV4)) == 0))
>  		return nb_pkts;
> 
>  	tcp_tbl = gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX];
> -	vxlan_tbl = gro_ctx->tbls[RTE_GRO_IPV4_VXLAN_TCP_IPV4_INDEX];
> +	vxlan_tcp_tbl = gro_ctx-
> >tbls[RTE_GRO_IPV4_VXLAN_TCP_IPV4_INDEX];
>  	udp_tbl = gro_ctx->tbls[RTE_GRO_UDP_IPV4_INDEX];
> +	vxlan_udp_tbl = gro_ctx-
> >tbls[RTE_GRO_IPV4_VXLAN_UDP_IPV4_INDEX];
> 
>  	do_tcp4_gro = (gro_ctx->gro_types & RTE_GRO_TCP_IPV4) ==
>  		RTE_GRO_TCP_IPV4;
> -	do_vxlan_gro = (gro_ctx->gro_types &
> RTE_GRO_IPV4_VXLAN_TCP_IPV4) ==
> +	do_vxlan_tcp_gro = (gro_ctx->gro_types &
> RTE_GRO_IPV4_VXLAN_TCP_IPV4) ==
>  		RTE_GRO_IPV4_VXLAN_TCP_IPV4;
>  	do_udp4_gro = (gro_ctx->gro_types & RTE_GRO_UDP_IPV4) ==
>  		RTE_GRO_UDP_IPV4;
> +	do_vxlan_udp_gro = (gro_ctx->gro_types &
> RTE_GRO_IPV4_VXLAN_UDP_IPV4) ==
> +		RTE_GRO_IPV4_VXLAN_UDP_IPV4;
> 
>  	current_time = rte_rdtsc();
> 
>  	for (i = 0; i < nb_pkts; i++) {
>  		if (IS_IPV4_VXLAN_TCP4_PKT(pkts[i]->packet_type) &&
> -				do_vxlan_gro) {
> -			if (gro_vxlan_tcp4_reassemble(pkts[i], vxlan_tbl,
> +				do_vxlan_tcp_gro) {
> +			if (gro_vxlan_tcp4_reassemble(pkts[i], vxlan_tcp_tbl,
> +						current_time) < 0)
> +				unprocess_pkts[unprocess_num++] = pkts[i];
> +		} else if (IS_IPV4_VXLAN_UDP4_PKT(pkts[i]->packet_type) &&
> +				do_vxlan_udp_gro) {
> +			if (gro_vxlan_udp4_reassemble(pkts[i],
> vxlan_udp_tbl,
>  						current_time) < 0)
>  				unprocess_pkts[unprocess_num++] = pkts[i];
>  		} else if (IS_IPV4_TCP_PKT(pkts[i]->packet_type) &&
> @@ -341,6 +397,13 @@ struct gro_ctx {
>  		left_nb_out = max_nb_out - num;
>  	}
> 
> +	if ((gro_types & RTE_GRO_IPV4_VXLAN_UDP_IPV4) && max_nb_out >
> 0) {

Max_nb_out is read-only and not updated. So it cannot check if 'out' array
has enough space. Need to use left_nb_out instead. This issue also existed
in UDP/IPv4 GRO, and please correct them both.

> +		num += gro_vxlan_udp4_tbl_timeout_flush(gro_ctx->tbls[
> +				RTE_GRO_IPV4_VXLAN_UDP_IPV4_INDEX],
> +				flush_timestamp, &out[num], left_nb_out);
> +		left_nb_out = max_nb_out - num;
> +	}
> +
>  	/* If no available space in 'out', stop flushing. */
>  	if ((gro_types & RTE_GRO_TCP_IPV4) && max_nb_out > 0) {
>  		num += gro_tcp4_tbl_timeout_flush(
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> index 470f3ed..9f9ed49 100644
> --- a/lib/librte_gro/rte_gro.h
> +++ b/lib/librte_gro/rte_gro.h
> @@ -35,6 +35,9 @@
>  #define RTE_GRO_UDP_IPV4_INDEX 2
>  #define RTE_GRO_UDP_IPV4 (1ULL << RTE_GRO_UDP_IPV4_INDEX)
>  /**< UDP/IPv4 GRO flag */
> +#define RTE_GRO_IPV4_VXLAN_UDP_IPV4_INDEX 3
> +#define RTE_GRO_IPV4_VXLAN_UDP_IPV4 (1ULL <<
> RTE_GRO_IPV4_VXLAN_UDP_IPV4_INDEX)
> +/**< VxLAN UDP/IPv4 GRO flag. */
> 
>  /**
>   * Structure used to create GRO context objects or used to pass
> --
> 1.8.3.1


  reply	other threads:[~2020-09-21  7:54 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-17  3:49 [dpdk-dev] [PATCH v6 0/3] gro: add UDP/IPv4 GRO and VXLAN UDP/IPv4 GRO support yang_y_yi
2020-09-17  3:49 ` [dpdk-dev] [PATCH v6 1/3] gro: add " yang_y_yi
2020-09-21  6:21   ` Hu, Jiayu
2020-09-17  3:49 ` [dpdk-dev] [PATCH v6 2/3] gro: add VXLAN " yang_y_yi
2020-09-21  7:54   ` Hu, Jiayu [this message]
2020-09-22  1:29     ` yang_y_yi
2020-09-22  1:44       ` yang_y_yi
2020-09-22  6:14         ` Hu, Jiayu
2020-09-22  6:23           ` yang_y_yi
2020-09-22  6:55             ` Jiayu Hu
2020-09-22  7:38               ` yang_y_yi
2020-09-23  2:15                 ` Jiayu Hu
2020-09-23  2:28                   ` yang_y_yi
2020-09-23  2:43                     ` Jiayu Hu
2020-09-24  2:41                       ` yang_y_yi
2020-09-22  3:01       ` Jiayu Hu
2020-09-22  3:00         ` yang_y_yi
2020-09-17  3:49 ` [dpdk-dev] [PATCH v6 3/3] doc: update prog_guide and rel_notes for GRO yang_y_yi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f565c530272b46ed893a0738323cec35@intel.com \
    --to=jiayu.hu@intel.com \
    --cc=dev@dpdk.org \
    --cc=thomas@monjalon.net \
    --cc=yang_y_yi@163.com \
    --cc=yangyi01@inspur.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.