All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: madhuker.mythri@oracle.com
Cc: ferruh.yigit@intel.com, dev@dpdk.org
Subject: Re: [PATCH] net/tap: Fixed RSS algorithm to support fragmented packets
Date: Fri, 3 Jun 2022 09:21:09 -0700	[thread overview]
Message-ID: <20220603092109.71735d43@hermes.local> (raw)
In-Reply-To: <20220603085359.229c898c@hermes.local>

On Fri, 3 Jun 2022 08:53:59 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Fri, 25 Mar 2022 20:58:09 +0530
> madhuker.mythri@oracle.com wrote:
> 
> > From: Madhuker Mythri <madhuker.mythri@oracle.com>
> > 
> > As per analysis on DPDK Tap PMD, the existing RSS algorithm considering 4-tuple(Src-IP, Dst-IP, Src-port and Dst-port) and identification of fragment packets is not done, thus we are seeing all the fragmented chunks of single packet differs RSS hash value and distributed across multiple queues.
> > The RSS algorithm assumes that, all the incoming IP packets are based on L4-protocol(UDP/TCP) and trying to fetch the L4 fields(Src-port and Dst-port) for each incoming packet, but for the fragmented packets these L4-header will not be present(except for first packet) and should not consider in RSS hash for L4 header fields in-case of fragmented packets.
> > Which is a bug in the RSS algorithm implemented in the BPF functionality under TAP PMD.
> > 
> > So, modified the RSS eBPF C-program and generated the structure of C-array in the 'tap_bpf_insns.h' file, which is in eBPF byte-code instructions format.
> > 
> > Bugzilla Id: 870
> > 
> > Signed-off-by: Madhuker Mythri <madhuker.mythri@oracle.com>
> > ---
> >  drivers/net/tap/tap_bpf_insns.h   | 3371 +++++++++++++++--------------
> >  drivers/net/tap/tap_bpf_program.c |   48 +-
> >  2 files changed, 1743 insertions(+), 1676 deletions(-)
> >  
> 
> Going back to the original RSS specs on Windows:
> https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-hashing-types
> 
> There is note there:
> 
> 	If a NIC receives a packet that has both IP and TCP headers, NDIS_HASH_TCP_IPV4 should not always
> 	be used. In the case of a fragmented IP packet, NDIS_HASH_IPV4 must be used. 
> 	This includes the first fragment which contains both IP and TCP headers.
> 
> The modified BPF program in this patch does not do that exactly.
> Adding port of 0 is not the same hashing a smaller tuple.
> 
> IPV6 fragments need similar fix?

Something like the following (totally untested)...

diff --git a/drivers/net/tap/tap_bpf_program.c b/drivers/net/tap/tap_bpf_program.c
index 20c310e5e7ba..db6b32b9003b 100644
--- a/drivers/net/tap/tap_bpf_program.c
+++ b/drivers/net/tap/tap_bpf_program.c
@@ -170,10 +170,19 @@ rss_l3_l4(struct __sk_buff *skb)
 			.dport = PORT(*(src_dst_port + 2),
 					*(src_dst_port + 3)),
 		};
+		__u16 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
+		__u16 frag_off = PORT(frag_off_addr[0], frag_off_addr[1]);
+		__u8 *proto_addr = data + off + offsetof(struct iphdr, protocol);
+		__u8 protocol = *proto_addr;
 		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
+
+		/* If only want L3 or fragmented or not tcp/udp then do L3 only */
+		if ((rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3)) ||
+		    (frag_off & 0x3fff) ||
+		    !(protocol == IPPROTO_TCP || protocol == IPPROTO_UDP))
 			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
+
+		hash = rte_softrss_be((__u32 *)&v4_tuple, key, input_len);
 	} else if (proto == htons(ETH_P_IPV6)) {
 		if (data + off + sizeof(struct ipv6hdr) +
 					sizeof(__u32) > data_end)
@@ -182,6 +191,8 @@ rss_l3_l4(struct __sk_buff *skb)
 					offsetof(struct ipv6hdr, saddr);
 		__u8 *src_dst_port = data + off +
 					sizeof(struct ipv6hdr);
+		__u8 *nexthdr = data + off + offsetof(struct ipv6hdr, nexthdr);
+
 		struct ipv6_l3_l4_tuple v6_tuple;
 		for (j = 0; j < 4; j++)
 			*((uint32_t *)&v6_tuple.src_addr + j) =
@@ -197,9 +208,11 @@ rss_l3_l4(struct __sk_buff *skb)
 			      *(src_dst_port + 3));
 
 		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
+
+		if ((rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3)) ||
+		    !(*nexthdr == IPPROTO_TCP || *nexthdr == IPPROTO_UDP))
 			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
+		hash = rte_softrss_be((__u32 *)&v6_tuple, key, input_len);
 	} else {
 		return TC_ACT_PIPE;
 	}

  reply	other threads:[~2022-06-03 16:21 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-25 15:28 [PATCH] net/tap: Fixed RSS algorithm to support fragmented packets madhuker.mythri
2022-06-03 15:53 ` Stephen Hemminger
2022-06-03 16:21   ` Stephen Hemminger [this message]
  -- strict thread matches above, loose matches on Subject: below --
2022-04-20 11:24 madhuker.mythri
2023-01-19 12:03 ` Ferruh Yigit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220603092109.71735d43@hermes.local \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@intel.com \
    --cc=madhuker.mythri@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.