netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/4] net: ethtool: support including Flow Label in the flow hash for RSS
@ 2025-07-22  1:49 Jakub Kicinski
  2025-07-22  1:49 ` [PATCH net-next v2 1/4] " Jakub Kicinski
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Jakub Kicinski @ 2025-07-22  1:49 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
	michael.chan, pavan.chebbi, gal, andrew, willemdebruijn.kernel,
	Jakub Kicinski

Add support for using IPv6 Flow Label in Rx hash computation
and therefore RSS queue selection.

RFC: https://lore.kernel.org/20250609173442.1745856-1-kuba@kernel.org

Jakub Kicinski (4):
  net: ethtool: support including Flow Label in the flow hash for RSS
  eth: fbnic: support RSS on IPv6 Flow Label
  eth: bnxt: support RSS on IPv6 Flow Label
  selftests: drv-net: add test for RSS on flow label

 Documentation/netlink/specs/ethtool.yaml      |   3 +
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   1 +
 include/uapi/linux/ethtool.h                  |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 +
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  23 ++-
 .../net/ethernet/meta/fbnic/fbnic_ethtool.c   |   2 +-
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c   |   2 +
 net/ethtool/ioctl.c                           |  25 +++
 net/ethtool/rss.c                             |  27 ++--
 .../drivers/net/hw/rss_flow_label.py          | 151 ++++++++++++++++++
 11 files changed, 221 insertions(+), 17 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/hw/rss_flow_label.py

-- 
2.50.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 1/4] net: ethtool: support including Flow Label in the flow hash for RSS
  2025-07-22  1:49 [PATCH net-next v2 0/4] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
@ 2025-07-22  1:49 ` Jakub Kicinski
  2025-07-22  1:49 ` [PATCH net-next v2 2/4] eth: fbnic: support RSS on IPv6 Flow Label Jakub Kicinski
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2025-07-22  1:49 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
	michael.chan, pavan.chebbi, gal, andrew, willemdebruijn.kernel,
	Jakub Kicinski, ecree.xilinx

Some modern NICs support including the IPv6 Flow Label in
the flow hash for RSS queue selection. This is outside
the old "Microsoft spec", but was included in the OCP NIC spec:

  [ ] RSS include ow label in the hash (configurable)

https://www.opencompute.org/w/index.php?title=Core_Offloads#Receive_Side_Scaling

RSS Flow Label hashing allows TCP Protective Load Balancing (PLB)
to recover from receiver congestion / overload.
Rx CPU/queue hotspots are relatively common for data ingest
workloads, and so far we had to try to detect the condition
at the RPC layer and reopen the connection. PLB lets us change
the Flow Label and therefore Rx CPU on RTO, with minimal packet
reordering. PLB reaction times are much faster, and can happen
at any point in the connection, not just at RPC boundaries.

Due to the nature of host processing (relatively long queues,
other kernel subsystems masking IRQs for 100s of msecs)
the risk of reordering within the host is higher than in
the network. But for applications which need it - it is far
preferable to potentially persistent overload of subset of
queues.

It is expected that the hash communicated to the host
may change if the Flow Label changes. This may be surprising
to some host software, but I don't expect the devices
can compute two Toeplitz hashes, one with the Flow Label
for queue selection and one without for the rx hash
communicated to the host. Besides, changing the hash
may potentially help to change the path thru host queues.
User can disable NETIF_F_RXHASH if they require a stable
flow hash.

The name RXH_IP6_FL was chosen based on what we call
Flow Label variables in IPv6 processing (fl). I prefer
fl_lbl but that appears to be an fbnic-only spelling.
We could spell out RXH_IP6_FLOW_LABEL but existing
RXH_ defines are a lot more terse.

Willem notes [1] that Flow Label is defined as identifying the flow
and therefore including both the flow label _and_ the L4 header
fields is not generally necessary. But it should not hurt so
it's not explicitly prevented if the driver supports hashing
on both at the same time.

Link: https://lore.kernel.org/68483433b45e2_3cd66f29440@willemb.c.googlers.com.notmuch [1]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: andrew@lunn.ch
CC: ecree.xilinx@gmail.com
---
 Documentation/netlink/specs/ethtool.yaml |  3 +++
 include/uapi/linux/ethtool.h             |  1 +
 net/ethtool/ioctl.c                      | 25 ++++++++++++++++++++++
 net/ethtool/rss.c                        | 27 ++++++++++++------------
 4 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 1063d5d32fea..e635cfe9780b 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -204,6 +204,9 @@ c-version-name: ethtool-genl-version
         doc: dst port in case of TCP/UDP/SCTP
       -
         name: gtp-teid
+      -
+        name: ip6-fl
+        doc: IPv6 Flow Label
       -
         name: discard
         value: 31
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 9e9afdd1238a..8bd5ea5469d9 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -2380,6 +2380,7 @@ enum {
 #define	RXH_L4_B_0_1	(1 << 6) /* src port in case of TCP/UDP/SCTP */
 #define	RXH_L4_B_2_3	(1 << 7) /* dst port in case of TCP/UDP/SCTP */
 #define	RXH_GTP_TEID	(1 << 8) /* teid in case of GTP */
+#define	RXH_IP6_FL	(1 << 9) /* IPv6 flow label */
 #define	RXH_DISCARD	(1 << 31)
 
 #define	RX_CLS_FLOW_DISC	0xffffffffffffffffULL
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 43a7854e784e..0b2a4d0573b3 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1014,6 +1014,28 @@ static bool flow_type_hashable(u32 flow_type)
 	return false;
 }
 
+static bool flow_type_v6(u32 flow_type)
+{
+	switch (flow_type) {
+	case TCP_V6_FLOW:
+	case UDP_V6_FLOW:
+	case SCTP_V6_FLOW:
+	case AH_ESP_V6_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V6_FLOW:
+	case IPV6_FLOW:
+	case GTPU_V6_FLOW:
+	case GTPC_V6_FLOW:
+	case GTPC_TEID_V6_FLOW:
+	case GTPU_EH_V6_FLOW:
+	case GTPU_UL_V6_FLOW:
+	case GTPU_DL_V6_FLOW:
+		return true;
+	}
+
+	return false;
+}
+
 /* When adding a new type, update the assert and, if it's hashable, add it to
  * the flow_type_hashable switch case.
  */
@@ -1077,6 +1099,9 @@ ethtool_set_rxfh_fields(struct net_device *dev, u32 cmd, void __user *useraddr)
 	if (rc)
 		return rc;
 
+	if (info.data & RXH_IP6_FL && !flow_type_v6(info.flow_type))
+		return -EINVAL;
+
 	if (info.flow_type & FLOW_RSS && info.rss_context &&
 	    !ops->rxfh_per_ctx_fields)
 		return -EINVAL;
diff --git a/net/ethtool/rss.c b/net/ethtool/rss.c
index 992e98abe9dd..202d95e8bf3e 100644
--- a/net/ethtool/rss.c
+++ b/net/ethtool/rss.c
@@ -536,35 +536,36 @@ void ethtool_rss_notify(struct net_device *dev, u32 type, u32 rss_context)
 #define RFH_MASK (RXH_L2DA | RXH_VLAN | RXH_IP_SRC | RXH_IP_DST | \
 		  RXH_L3_PROTO | RXH_L4_B_0_1 | RXH_L4_B_2_3 |	  \
 		  RXH_GTP_TEID | RXH_DISCARD)
+#define RFH_MASKv6 (RFH_MASK | RXH_IP6_FL)
 
 static const struct nla_policy ethnl_rss_flows_policy[] = {
 	[ETHTOOL_A_FLOW_ETHER]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
 	[ETHTOOL_A_FLOW_IP4]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_IP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
+	[ETHTOOL_A_FLOW_IP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
 	[ETHTOOL_A_FLOW_TCP4]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
 	[ETHTOOL_A_FLOW_UDP4]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
 	[ETHTOOL_A_FLOW_SCTP4]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
 	[ETHTOOL_A_FLOW_AH_ESP4]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_TCP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_UDP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_SCTP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_AH_ESP6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
+	[ETHTOOL_A_FLOW_TCP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
+	[ETHTOOL_A_FLOW_UDP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
+	[ETHTOOL_A_FLOW_SCTP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
+	[ETHTOOL_A_FLOW_AH_ESP6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
 	[ETHTOOL_A_FLOW_AH4]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
 	[ETHTOOL_A_FLOW_ESP4]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_AH6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_ESP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
+	[ETHTOOL_A_FLOW_AH6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
+	[ETHTOOL_A_FLOW_ESP6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
 	[ETHTOOL_A_FLOW_GTPU4]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_GTPU6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
+	[ETHTOOL_A_FLOW_GTPU6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
 	[ETHTOOL_A_FLOW_GTPC4]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_GTPC6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
+	[ETHTOOL_A_FLOW_GTPC6]		= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
 	[ETHTOOL_A_FLOW_GTPC_TEID4]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_GTPC_TEID6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
+	[ETHTOOL_A_FLOW_GTPC_TEID6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
 	[ETHTOOL_A_FLOW_GTPU_EH4]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_GTPU_EH6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
+	[ETHTOOL_A_FLOW_GTPU_EH6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
 	[ETHTOOL_A_FLOW_GTPU_UL4]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_GTPU_UL6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
+	[ETHTOOL_A_FLOW_GTPU_UL6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
 	[ETHTOOL_A_FLOW_GTPU_DL4]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
-	[ETHTOOL_A_FLOW_GTPU_DL6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASK),
+	[ETHTOOL_A_FLOW_GTPU_DL6]	= NLA_POLICY_MASK(NLA_UINT, RFH_MASKv6),
 };
 
 const struct nla_policy ethnl_rss_set_policy[ETHTOOL_A_RSS_FLOW_HASH + 1] = {
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 2/4] eth: fbnic: support RSS on IPv6 Flow Label
  2025-07-22  1:49 [PATCH net-next v2 0/4] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
  2025-07-22  1:49 ` [PATCH net-next v2 1/4] " Jakub Kicinski
@ 2025-07-22  1:49 ` Jakub Kicinski
  2025-07-22  1:49 ` [PATCH net-next v2 3/4] eth: bnxt: " Jakub Kicinski
  2025-07-22  1:49 ` [PATCH net-next v2 4/4] selftests: drv-net: add test for RSS on flow label Jakub Kicinski
  3 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2025-07-22  1:49 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
	michael.chan, pavan.chebbi, gal, andrew, willemdebruijn.kernel,
	Jakub Kicinski, alexanderduyck, mohsin.bashr

Support IPv6 Flow Label hashing. Use both inner and outer IPv6
header's Flow Label if both headers are detected. Flow Label
is unlike normal header fields, by enabling it user accepts
the unstable hash and possible reordering. Because of that
I think it's reasonable to hash over all Flow Labels we can
find, even tho we don't hash over all L3 addresses.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: alexanderduyck@fb.com
CC: mohsin.bashr@gmail.com
---
 drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c | 2 +-
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c     | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
index 588da02d6e22..fe3a203dee75 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
@@ -1310,7 +1310,7 @@ fbnic_get_rss_hash_opts(struct net_device *netdev,
 #define FBNIC_L2_HASH_OPTIONS \
 	(RXH_L2DA | RXH_DISCARD)
 #define FBNIC_L3_HASH_OPTIONS \
-	(FBNIC_L2_HASH_OPTIONS | RXH_IP_SRC | RXH_IP_DST)
+	(FBNIC_L2_HASH_OPTIONS | RXH_IP_SRC | RXH_IP_DST | RXH_IP6_FL)
 #define FBNIC_L4_HASH_OPTIONS \
 	(FBNIC_L3_HASH_OPTIONS | RXH_L4_B_0_1 | RXH_L4_B_2_3)
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
index 8ff07b5562e3..a4dc1024c0c2 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
@@ -71,6 +71,8 @@ u16 fbnic_flow_hash_2_rss_en_mask(struct fbnic_net *fbn, int flow_type)
 	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP_DST, IP_DST, flow_hash);
 	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(L4_B_0_1, L4_SRC, flow_hash);
 	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(L4_B_2_3, L4_DST, flow_hash);
+	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP6_FL, OV6_FL_LBL, flow_hash);
+	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP6_FL, IV6_FL_LBL, flow_hash);
 
 	return rss_en_mask;
 }
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 3/4] eth: bnxt: support RSS on IPv6 Flow Label
  2025-07-22  1:49 [PATCH net-next v2 0/4] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
  2025-07-22  1:49 ` [PATCH net-next v2 1/4] " Jakub Kicinski
  2025-07-22  1:49 ` [PATCH net-next v2 2/4] eth: fbnic: support RSS on IPv6 Flow Label Jakub Kicinski
@ 2025-07-22  1:49 ` Jakub Kicinski
  2025-07-22  7:57   ` Michael Chan
  2025-07-22  1:49 ` [PATCH net-next v2 4/4] selftests: drv-net: add test for RSS on flow label Jakub Kicinski
  3 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2025-07-22  1:49 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
	michael.chan, pavan.chebbi, gal, andrew, willemdebruijn.kernel,
	Jakub Kicinski

It appears that the bnxt FW API has the relevant bit for Flow Label
hashing. Plumb in the support. Obey the capability bit.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: michael.chan@broadcom.com
CC: pavan.chebbi@broadcom.com
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 ++
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 23 ++++++++++++++++---
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index fda0d3cc6227..40ae34923511 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2407,6 +2407,7 @@ struct bnxt {
 #define BNXT_RSS_CAP_ESP_V4_RSS_CAP		BIT(6)
 #define BNXT_RSS_CAP_ESP_V6_RSS_CAP		BIT(7)
 #define BNXT_RSS_CAP_MULTI_RSS_CTX		BIT(8)
+#define BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP	BIT(9)
 
 	u8			rss_hash_key[HW_HASH_KEY_SIZE];
 	u8			rss_hash_key_valid:1;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index de8080df69a8..6425955c06d6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6957,6 +6957,8 @@ static int bnxt_hwrm_vnic_qcaps(struct bnxt *bp)
 			bp->rss_cap |= BNXT_RSS_CAP_ESP_V4_RSS_CAP;
 		if (flags & VNIC_QCAPS_RESP_FLAGS_RSS_IPSEC_ESP_SPI_IPV6_CAP)
 			bp->rss_cap |= BNXT_RSS_CAP_ESP_V6_RSS_CAP;
+		if (flags & VNIC_QCAPS_RESP_FLAGS_RSS_IPV6_FLOW_LABEL_CAP)
+			bp->rss_cap |= BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP;
 		if (flags & VNIC_QCAPS_RESP_FLAGS_RE_FLUSH_CAP)
 			bp->fw_cap |= BNXT_FW_CAP_VNIC_RE_FLUSH;
 	}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 1b37612b1c01..4b7213908b76 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1582,9 +1582,14 @@ static u64 get_ethtool_ipv4_rss(struct bnxt *bp)
 
 static u64 get_ethtool_ipv6_rss(struct bnxt *bp)
 {
+	u64 rss = 0;
+
 	if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6)
-		return RXH_IP_SRC | RXH_IP_DST;
-	return 0;
+		rss |= RXH_IP_SRC | RXH_IP_DST;
+	if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL)
+		rss |= RXH_IP6_FL;
+
+	return rss;
 }
 
 static int bnxt_get_rxfh_fields(struct net_device *dev,
@@ -1662,13 +1667,18 @@ static int bnxt_set_rxfh_fields(struct net_device *dev,
 
 	if (cmd->data == RXH_4TUPLE)
 		tuple = 4;
-	else if (cmd->data == RXH_2TUPLE)
+	else if (cmd->data == RXH_2TUPLE ||
+		 cmd->data == (RXH_2TUPLE | RXH_IP6_FL))
 		tuple = 2;
 	else if (!cmd->data)
 		tuple = 0;
 	else
 		return -EINVAL;
 
+	if (cmd->data & RXH_IP6_FL &&
+	    !(bp->rss_cap & BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP))
+		return -EINVAL;
+
 	if (cmd->flow_type == TCP_V4_FLOW) {
 		rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV4;
 		if (tuple == 4)
@@ -1736,6 +1746,13 @@ static int bnxt_set_rxfh_fields(struct net_device *dev,
 			rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
 		else if (!tuple)
 			rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
+
+		if (cmd->data & RXH_IP6_FL)
+			rss_hash_cfg |=
+				VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;
+		else
+			rss_hash_cfg &=
+				~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;
 		break;
 	}
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next v2 4/4] selftests: drv-net: add test for RSS on flow label
  2025-07-22  1:49 [PATCH net-next v2 0/4] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
                   ` (2 preceding siblings ...)
  2025-07-22  1:49 ` [PATCH net-next v2 3/4] eth: bnxt: " Jakub Kicinski
@ 2025-07-22  1:49 ` Jakub Kicinski
  2025-07-23 13:49   ` Willem de Bruijn
  3 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2025-07-22  1:49 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
	michael.chan, pavan.chebbi, gal, andrew, willemdebruijn.kernel,
	Jakub Kicinski, shuah, sdf, linux-kselftest

Add a simple test for checking that RSS on flow label works,
and that its rejected for IPv4 flows.

 # ./tools/testing/selftests/drivers/net/hw/rss_flow_label.py
 TAP version 13
 1..2
 ok 1 rss_flow_label.test_rss_flow_label
 ok 2 rss_flow_label.test_rss_flow_label_6only
 # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: sdf@fomichev.me
CC: linux-kselftest@vger.kernel.org
---
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 .../drivers/net/hw/rss_flow_label.py          | 151 ++++++++++++++++++
 2 files changed, 152 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/hw/rss_flow_label.py

diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index fdc97355588c..5159fd34cb33 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -18,6 +18,7 @@ TEST_PROGS = \
 	pp_alloc_fail.py \
 	rss_api.py \
 	rss_ctx.py \
+	rss_flow_label.py \
 	rss_input_xfrm.py \
 	tso.py \
 	xsk_reconfig.py \
diff --git a/tools/testing/selftests/drivers/net/hw/rss_flow_label.py b/tools/testing/selftests/drivers/net/hw/rss_flow_label.py
new file mode 100755
index 000000000000..e471e13160ae
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/rss_flow_label.py
@@ -0,0 +1,151 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Tests for RSS hashing on IPv6 Flow Label.
+"""
+
+import glob
+import socket
+from lib.py import CmdExitFailure
+from lib.py import ksft_run, ksft_exit, ksft_eq, ksft_ge, ksft_in, \
+    ksft_not_in, ksft_raises, KsftSkipEx
+from lib.py import bkg, cmd, defer, fd_read_timeout, rand_port
+from lib.py import NetDrvEpEnv
+
+
+def _ethtool_get_cfg(cfg, fl_type):
+    descr = cmd(f"ethtool -n {cfg.ifname} rx-flow-hash {fl_type}").stdout
+
+    converter = {
+        "IP SA": "s",
+        "IP DA": "d",
+        "L3 proto": "t",
+        "L4 bytes 0 & 1 [TCP/UDP src port]": "f",
+        "L4 bytes 2 & 3 [TCP/UDP dst port]": "n",
+        "IPv6 Flow Label": "l",
+    }
+
+    ret = ""
+    for line in descr.split("\n")[1:-2]:
+        # if this raises we probably need to add more keys to converter above
+        ret += converter[line]
+    return ret
+
+
+def _traffic(cfg, one_sock, one_cpu):
+    local_port  = rand_port(socket.SOCK_DGRAM)
+    remote_port = rand_port(socket.SOCK_DGRAM)
+
+    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
+    sock.bind(("", local_port))
+    sock.connect((cfg.remote_addr_v["6"], 0))
+    if one_sock:
+        send = f"exec 5<>/dev/udp/{cfg.addr_v['6']}/{local_port}; " \
+                "for i in `seq 20`; do echo a >&5; sleep 0.02; done; exec 5>&-"
+    else:
+        send = "for i in `seq 20`; do echo a | socat -t0.02 - UDP6:" \
+              f"[{cfg.addr_v['6']}]:{local_port},sourceport={remote_port}; done"
+
+    cpus = set()
+    with bkg(send, shell=True, host=cfg.remote, exit_wait=True):
+        for _ in range(20):
+            fd_read_timeout(sock.fileno(), 1)
+            cpu = sock.getsockopt(socket.SOL_SOCKET, socket.SO_INCOMING_CPU)
+            cpus.add(cpu)
+
+    if one_cpu:
+        ksft_eq(len(cpus), 1,
+                f"{one_sock=} - expected one CPU, got traffic on: {cpus=}")
+    else:
+        ksft_ge(len(cpus), 2,
+                f"{one_sock=} - expected many CPUs, got traffic on: {cpus=}")
+
+
+def test_rss_flow_label(cfg):
+    """
+    Test hashing on IPv6 flow label. Send traffic over a single socket
+    and over multiple sockets. Depend on the remote having auto-label
+    enabled so that it randomizes the label per socket.
+    """
+
+    cfg.require_ipver("6")
+    cfg.require_cmd("socat", remote=True)
+    if not hasattr(socket, "SO_INCOMING_CPU"):
+        raise KsftSkipEx("socket.SO_INCOMING_CPU was added in Python 3.11")
+
+    # 1 is the default, if someone changed it we probably shouldn"t mess with it
+    af = cmd("cat /proc/sys/net/ipv6/auto_flowlabels", host=cfg.remote).stdout
+    if af.strip() != "1":
+        raise KsftSkipEx("Remote does not have auto_flowlabels enabled")
+
+    qcnt = len(glob.glob(f"/sys/class/net/{cfg.ifname}/queues/rx-*"))
+    if qcnt < 2:
+        raise KsftSkipEx(f"Local has only {qcnt} queues")
+
+    # Enable flow label hashing for UDP6
+    initial = _ethtool_get_cfg(cfg, "udp6")
+    no_lbl = initial.replace("l", "")
+    if "l" not in initial:
+        try:
+            cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 l{no_lbl}")
+        except CmdExitFailure as exc:
+            raise KsftSkipEx("Device doesn't support Flow Label for UDP6") from exc
+
+        defer(cmd, f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {initial}")
+
+    _traffic(cfg, one_sock=True, one_cpu=True)
+    _traffic(cfg, one_sock=False, one_cpu=False)
+
+    # Disable it, we should see no hashing (reset was already defer()ed)
+    cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {no_lbl}")
+
+    _traffic(cfg, one_sock=False, one_cpu=True)
+
+
+def _check_v4_flow_types(cfg):
+    for fl_type in ["tcp4", "udp4", "ah4", "esp4", "sctp4"]:
+        try:
+            cur = cmd(f"ethtool -n {cfg.ifname} rx-flow-hash {fl_type}").stdout
+            ksft_not_in("Flow Label", cur,
+                        comment=f"{fl_type=} has Flow Label:" + cur)
+        except CmdExitFailure:
+            # Probably does not support this flow type
+            pass
+
+
+def test_rss_flow_label_6only(cfg):
+    """
+    Test interactions with IPv4 flow types. It should not be possible to set
+    IPv6 Flow Label hashing for an IPv4 flow type. The Flow Label should also
+    not appear in the IPv4 "current config".
+    """
+
+    with ksft_raises(CmdExitFailure) as cm:
+        cmd(f"ethtool -N {cfg.ifname} rx-flow-hash tcp4 sdfnl")
+    ksft_in("Invalid argument", cm.exception.cmd.stderr)
+
+    _check_v4_flow_types(cfg)
+
+    # Try to enable Flow Labels and check again, in case it leaks thru
+    initial = _ethtool_get_cfg(cfg, "udp6")
+    changed = initial.replace("l", "") if "l" in initial else initial + "l"
+
+    cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {changed}")
+    restore = defer(cmd, f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {initial}")
+
+    _check_v4_flow_types(cfg)
+    restore.exec()
+    _check_v4_flow_types(cfg)
+
+
+def main() -> None:
+    with NetDrvEpEnv(__file__, nsim_test=False) as cfg:
+        ksft_run([test_rss_flow_label,
+                  test_rss_flow_label_6only],
+                 args=(cfg, ))
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2 3/4] eth: bnxt: support RSS on IPv6 Flow Label
  2025-07-22  1:49 ` [PATCH net-next v2 3/4] eth: bnxt: " Jakub Kicinski
@ 2025-07-22  7:57   ` Michael Chan
  2025-07-23 17:30     ` Jakub Kicinski
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Chan @ 2025-07-22  7:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	donald.hunter, pavan.chebbi, gal, andrew, willemdebruijn.kernel

[-- Attachment #1: Type: text/plain, Size: 3275 bytes --]

On Mon, Jul 21, 2025 at 6:49 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> It appears that the bnxt FW API has the relevant bit for Flow Label
> hashing. Plumb in the support. Obey the capability bit.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> index 1b37612b1c01..4b7213908b76 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> @@ -1582,9 +1582,14 @@ static u64 get_ethtool_ipv4_rss(struct bnxt *bp)
>
>  static u64 get_ethtool_ipv6_rss(struct bnxt *bp)
>  {
> +       u64 rss = 0;
> +
>         if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6)
> -               return RXH_IP_SRC | RXH_IP_DST;
> -       return 0;
> +               rss |= RXH_IP_SRC | RXH_IP_DST;

This flag means 2-tuple so we should:

return RXH_IP_SRC | RXH_IP_DST;

> +       if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL)
> +               rss |= RXH_IP6_FL;

This flag means 3-tuple so we should:

return RXP_IP_SRC | RXH_IP_DST | RXH_IP6_FL;

Both of these flags cannot be set at the same time.  It can only be in
2-tuple or 3-tuple mode (see more below).

> +
> +       return rss;
>  }
>
>  static int bnxt_get_rxfh_fields(struct net_device *dev,
> @@ -1662,13 +1667,18 @@ static int bnxt_set_rxfh_fields(struct net_device *dev,
>
>         if (cmd->data == RXH_4TUPLE)
>                 tuple = 4;
> -       else if (cmd->data == RXH_2TUPLE)
> +       else if (cmd->data == RXH_2TUPLE ||
> +                cmd->data == (RXH_2TUPLE | RXH_IP6_FL))
>                 tuple = 2;
>         else if (!cmd->data)
>                 tuple = 0;
>         else
>                 return -EINVAL;
>
> +       if (cmd->data & RXH_IP6_FL &&
> +           !(bp->rss_cap & BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP))
> +               return -EINVAL;
> +
>         if (cmd->flow_type == TCP_V4_FLOW) {
>                 rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_TCP_IPV4;
>                 if (tuple == 4)
> @@ -1736,6 +1746,13 @@ static int bnxt_set_rxfh_fields(struct net_device *dev,
>                         rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
>                 else if (!tuple)
>                         rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
> +
> +               if (cmd->data & RXH_IP6_FL)
> +                       rss_hash_cfg |=
> +                               VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;
> +               else
> +                       rss_hash_cfg &=
> +                               ~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;

Here, it needs to be something like this so that we only set the flag
for 2-tuple or 3-tuple.  The FW call will fail if both flags are set:

if (!tuple || tuple == 2)
        rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
if (tuple == 2) {
        if (cmd->data & RXH_IP6_FL)
                rss_hash_cfg |=
                        VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;
        else
                rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
}

We'll do more testing tomorrow.  Thanks.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2 4/4] selftests: drv-net: add test for RSS on flow label
  2025-07-22  1:49 ` [PATCH net-next v2 4/4] selftests: drv-net: add test for RSS on flow label Jakub Kicinski
@ 2025-07-23 13:49   ` Willem de Bruijn
  0 siblings, 0 replies; 8+ messages in thread
From: Willem de Bruijn @ 2025-07-23 13:49 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, donald.hunter,
	michael.chan, pavan.chebbi, gal, andrew, willemdebruijn.kernel,
	Jakub Kicinski, shuah, sdf, linux-kselftest

Jakub Kicinski wrote:
> Add a simple test for checking that RSS on flow label works,
> and that its rejected for IPv4 flows.
> 
>  # ./tools/testing/selftests/drivers/net/hw/rss_flow_label.py
>  TAP version 13
>  1..2
>  ok 1 rss_flow_label.test_rss_flow_label
>  ok 2 rss_flow_label.test_rss_flow_label_6only
>  # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>

RFS/RPS could mess with the SO_INCOMING_CPU reports if enabled.
Maybe check cfg.ifname rx-0 that they are disabled?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next v2 3/4] eth: bnxt: support RSS on IPv6 Flow Label
  2025-07-22  7:57   ` Michael Chan
@ 2025-07-23 17:30     ` Jakub Kicinski
  0 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2025-07-23 17:30 UTC (permalink / raw)
  To: Michael Chan
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	donald.hunter, pavan.chebbi, gal, andrew, willemdebruijn.kernel

On Tue, 22 Jul 2025 00:57:14 -0700 Michael Chan wrote:
> Here, it needs to be something like this so that we only set the flag
> for 2-tuple or 3-tuple.  The FW call will fail if both flags are set:
> 
> if (!tuple || tuple == 2)
>         rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
> if (tuple == 2) {
>         if (cmd->data & RXH_IP6_FL)
>                 rss_hash_cfg |=
>                         VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;
>         else
>                 rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
> }

Thanks for the help! Thinking thru this carefully I'm worried this will
leave the FLOW_LABEL bit in place if we switch from FL to 4-tuple.
Tho, I'm not clear on how 0-tuple to 4-tuple switch works at present.

Would this look good to you?

	case TCP_V6_FLOW:
	case UDP_V6_FLOW:
	case SCTP_V6_FLOW:
	case AH_ESP_V6_FLOW:
	case AH_V6_FLOW:
	case ESP_V6_FLOW:
	case IPV6_FLOW:
		rss_hash_cfg &= ~(VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6 |
				  VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL);
		if (!tuple)
			break;
		if (cmd->data & RXH_IP6_FL)
			rss_hash_cfg |=
				VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;
		else
			rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
		break;

or should the else branch have 

	else if (tuple == 2)

?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-07-23 17:30 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-22  1:49 [PATCH net-next v2 0/4] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
2025-07-22  1:49 ` [PATCH net-next v2 1/4] " Jakub Kicinski
2025-07-22  1:49 ` [PATCH net-next v2 2/4] eth: fbnic: support RSS on IPv6 Flow Label Jakub Kicinski
2025-07-22  1:49 ` [PATCH net-next v2 3/4] eth: bnxt: " Jakub Kicinski
2025-07-22  7:57   ` Michael Chan
2025-07-23 17:30     ` Jakub Kicinski
2025-07-22  1:49 ` [PATCH net-next v2 4/4] selftests: drv-net: add test for RSS on flow label Jakub Kicinski
2025-07-23 13:49   ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).