All of lore.kernel.org
 help / color / mirror / Atom feed
From: luyicai <luyicai@huawei.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "Zhoujingbin (Robin, Russell Lab)" <zhoujingbin@huawei.com>,
	chenchanghu <chenchanghu@huawei.com>,
	"Lilijun (Jerry)" <jerry.lilijun@huawei.com>,
	Linhaifeng <haifeng.lin@huawei.com>,
	"Guohongzhi (Russell Lab)" <guohongzhi1@huawei.com>,
	wangyunjian <wangyunjian@huawei.com>,
	"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
Date: Tue, 15 Dec 2020 03:18:50 +0000	[thread overview]
Message-ID: <08610910793145a78cbc84e2f43d5e8b@huawei.com> (raw)
In-Reply-To: <BYAPR11MB3301456E350C40DB56F41EA99AC70@BYAPR11MB3301.namprd11.prod.outlook.com>



> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com] 
> Sent: Monday, December 14, 2020 10:45 PM
> To: luyicai <luyicai@huawei.com>; dev@dpdk.org
> Cc: Zhoujingbin (Robin, Russell Lab) <zhoujingbin@huawei.com>; chenchanghu <chenchanghu@huawei.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; Linhaifeng <haifeng.lin@huawei.com>; Guohongzhi (Russell Lab) <guohongzhi1@huawei.com>; wangyunjian <wangyunjian@huawei.com>; stable@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment


> Hi Yicai,
 
> In some situations, we would get several ip fragments, which total 
> data length is less than min_ip_len(64) and padding with zeros.
> We simulated intermediate fragments by modifying the MTU.
> To illustrate the problem, we simplify the packet format and ignore 
> the impact of the packet header.In namespace2, a packet whose data 
> length is 1520 is sent.
> When the packet passes tap2, the packet is divided into two
> fragments: fragment A and B, similar to (1520 = 1510 + 10).
> When the packet passes tap3, the larger fragment packet A is divided 
> into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> Finally, the bond interface receives three fragments:
> A1, A2, and B (1520 = 1500 + 10 + 10).
> One fragmented packet A2 is smaller than the minimum Ethernet frame 
> length, so it needs to be padded.
> 
> |---------------------------------------------------|
> |                      HOST                         |
> | |--------------|   |----------------------------| |
> | |      ns2     |   |      |--------------|      | |
> | |  |--------|  |   |  |--------|    |--------|  | |
> | |  |  tap1  |  |   |  |  tap2  | ns1|  tap3  |  | |
> | |  |mtu=1510|  |   |  |mtu=1510|    |mtu=1500|  | |
> | |--|1.1.1.1 |--|   |--|1.1.1.2 |----|2.1.1.1 |--| |
> |    |--------|         |--------|    |--------|    |
> |         |                 |              |        |
> |         |-----------------|              |        |
> |                                          |        |
> |                                      |--------|   |
> |                                      |  bond  |   |
> |--------------------------------------|mtu=1500|---|
>                                        |--------|
> 
> When processing the preceding packets above, DPDK would aggregate 
> fragmented packets A2 and B.
> And error packets are generated, which padding(zero) is displayed in 
> the middle of the packet.
> 
> A2 + B:
> 0000   fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> 0010   00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> 0020   01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> 0030   00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> 0040   cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> 0050   dc dd de df e0 e1 e2 e3 e4 e5 e6
> 
> So, we would calculate the length of padding, and remove the padding 
> in pkt_len and data_len before aggregation.
> 
> Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming 
> packet")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yicai Lu <luyicai@huawei.com>
> ---
> v4 -> v5: Update the comments and description.
> ---
>  lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c 
> b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> index 1dda8ac..fdf66a4 100644
> --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> @@ -104,6 +104,7 @@ struct rte_mbuf *
>  	const unaligned_uint64_t *psd;
>  	uint16_t flag_offset, ip_ofs, ip_flag;
>  	int32_t ip_len;
> +	int32_t trim;
> 
>  	flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
>  	ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK); @@ 
> -117,14 +118,15 @@ struct rte_mbuf *
> 
>  	ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
>  	ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
> +	trim  = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
> 
>  	IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> -		"mbuf: %p, tms: %" PRIu64
> -		", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
> +		"mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
> +		"ofs: %u, len: %d, padding: %d, flags: %#x\n"
>  		"tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
>  		"max_entries: %u, use_entries: %u\n\n",
>  		__func__, __LINE__,
> -		mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
> +		mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
>  		tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
>  		tbl->use_entries);
> 
> @@ -134,6 +136,10 @@ struct rte_mbuf *
>  		return NULL;
>  	}
> 
> +	if (unlikely(trim > 0)) {
> +		rte_pktmbuf_trim(mb, trim);
> +	}

> As a nit {} braces are not required for single expression.
> LGTM in general, just one thing: shouldn't we have the same fix for ipv6 then?
> Konstantin

Hi Konstantin,

Thanks!

During the problem analysis, we have discussed on ipv6 
and concluded that it does not exist in ipv6.

For ipv6, it consists of the following parts: 
basic header = 40(bytes)
DMAC = 6(bytes)
SMAC = 6(bytes)
Type = 2(bytes)
CRC = 4(bytes)
fragment header = 8(bytes)
...

40 + 6 + 6 + 2 + 4 + 8 = 66 (bytes)

Total is already greater than min_ip_len(64). So it doesn't 
need to be padded with zeros.

> +
>  	/* try to find/add entry into the fragment's table. */
>  	if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
>  		IP_FRAG_MBUF2DR(dr, mb);
> --
> 1.9.5.msysgit.1


  reply	other threads:[~2020-12-15  3:19 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-18 13:29 [dpdk-dev] [PATCH] ip_frag: recalculate data length of fragment Yicai Lu
2020-11-22 11:22 ` Thomas Monjalon
2020-11-23  2:27 ` [dpdk-dev] [PATCH v2] " Yicai Lu
2020-12-12 11:05 ` [dpdk-dev] [PATCH v5] ip_frag: remove padding " Yicai Lu
2020-12-14 14:44   ` Ananyev, Konstantin
2020-12-15  3:18     ` luyicai [this message]
2020-12-16 10:44       ` Ananyev, Konstantin
2020-12-16 10:54         ` luyicai
2020-12-16 13:36 ` [dpdk-dev] [PATCH v6] " Yicai Lu
2020-12-18 11:41   ` Ananyev, Konstantin
2021-01-15 10:29     ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=08610910793145a78cbc84e2f43d5e8b@huawei.com \
    --to=luyicai@huawei.com \
    --cc=chenchanghu@huawei.com \
    --cc=dev@dpdk.org \
    --cc=guohongzhi1@huawei.com \
    --cc=haifeng.lin@huawei.com \
    --cc=jerry.lilijun@huawei.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=stable@dpdk.org \
    --cc=wangyunjian@huawei.com \
    --cc=zhoujingbin@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.