netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS
@ 2025-06-09 17:34 Jakub Kicinski
  2025-06-09 17:34 ` [RFC net-next 1/6] net: ethtool: factor out the validation for ETHTOOL_SRXFH Jakub Kicinski
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 17:34 UTC (permalink / raw)
  To: michael.chan, pavan.chebbi
  Cc: willemdebruijn.kernel, netdev, davem, edumazet, pabeni,
	andrew+netdev, horms, Jakub Kicinski

Add support for using IPv6 Flow Label in Rx hash computation
and therefore RSS queue selection.

Michael, Pavan, could you please give this series a whirl?
I see the capability bit in bnxt but none of the devices
I have access to seem to expose it. Note the test in the
last patch which you should be able to run. ethtool with
the support: https://github.com/kuba-moo/ethtool

Jakub Kicinski (6):
  net: ethtool: factor out the validation for ETHTOOL_SRXFH
  net: ethtool: support including Flow Label in the flow hash for RSS
  eth: fbnic: support RSS on IPv6 Flow Label
  eth: bnxt: support RSS on IPv6 Flow Label
  selftests: drv-net: import things in lib one by one
  selftests: drv-net: add test for RSS on flow label

 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   1 +
 include/uapi/linux/ethtool.h                  |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 +
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  26 ++-
 .../net/ethernet/meta/fbnic/fbnic_ethtool.c   |   2 +-
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c   |   2 +
 net/ethtool/ioctl.c                           |  56 ++++++-
 .../drivers/net/hw/lib/py/__init__.py         |  17 ++
 .../drivers/net/hw/rss_flow_label.py          | 151 ++++++++++++++++++
 .../selftests/drivers/net/lib/py/__init__.py  |  14 ++
 11 files changed, 260 insertions(+), 13 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/hw/rss_flow_label.py

-- 
2.49.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC net-next 1/6] net: ethtool: factor out the validation for ETHTOOL_SRXFH
  2025-06-09 17:34 [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
@ 2025-06-09 17:34 ` Jakub Kicinski
  2025-06-09 17:34 ` [RFC net-next 2/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 17:34 UTC (permalink / raw)
  To: michael.chan, pavan.chebbi
  Cc: willemdebruijn.kernel, netdev, davem, edumazet, pabeni,
	andrew+netdev, horms, Jakub Kicinski, andrew, ecree.xilinx

We'll need another check for ETHTOOL_SRXFH input. Move the logic
to a helper because ethtool_set_rxnfc() is getting long.
No functional change.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: andrew@lunn.ch
CC: ecree.xilinx@gmail.com
---
 net/ethtool/ioctl.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 39ec920f5de7..e8ca70554b2e 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1060,6 +1060,27 @@ static int ethtool_check_flow_types(struct net_device *dev, u32 input_xfrm)
 	return 0;
 }
 
+static int
+ethtool_srxfh_check(struct net_device *dev, const struct ethtool_rxnfc *info)
+{
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+	int rc;
+
+	if (ops->get_rxfh) {
+		struct ethtool_rxfh_param rxfh = {};
+
+		rc = ops->get_rxfh(dev, &rxfh);
+		if (rc)
+			return rc;
+
+		rc = ethtool_check_xfrm_rxfh(rxfh.input_xfrm, info->data);
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 static noinline_for_stack int ethtool_set_rxnfc(struct net_device *dev,
 						u32 cmd, void __user *useraddr)
 {
@@ -1087,14 +1108,8 @@ static noinline_for_stack int ethtool_set_rxnfc(struct net_device *dev,
 			return -EINVAL;
 	}
 
-	if (cmd == ETHTOOL_SRXFH && ops->get_rxfh) {
-		struct ethtool_rxfh_param rxfh = {};
-
-		rc = ops->get_rxfh(dev, &rxfh);
-		if (rc)
-			return rc;
-
-		rc = ethtool_check_xfrm_rxfh(rxfh.input_xfrm, info.data);
+	if (cmd == ETHTOOL_SRXFH) {
+		rc = ethtool_srxfh_check(dev, &info);
 		if (rc)
 			return rc;
 	}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC net-next 2/6] net: ethtool: support including Flow Label in the flow hash for RSS
  2025-06-09 17:34 [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
  2025-06-09 17:34 ` [RFC net-next 1/6] net: ethtool: factor out the validation for ETHTOOL_SRXFH Jakub Kicinski
@ 2025-06-09 17:34 ` Jakub Kicinski
  2025-06-10 13:33   ` Willem de Bruijn
  2025-06-09 17:34 ` [RFC net-next 3/6] eth: fbnic: support RSS on IPv6 Flow Label Jakub Kicinski
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 17:34 UTC (permalink / raw)
  To: michael.chan, pavan.chebbi
  Cc: willemdebruijn.kernel, netdev, davem, edumazet, pabeni,
	andrew+netdev, horms, Jakub Kicinski, andrew, ecree.xilinx

Some modern NICs support including the IPv6 Flow Label in
the flow hash for RSS queue selection. This is outside
the old "Microsoft spec", but was included in the OCP NIC spec:

  [ ] RSS include ow label in the hash (configurable)

https://www.opencompute.org/documents/ocp-server-nic-core-features-specification-ocp-spec-format-1-1-pdf

RSS Flow Label hashing allows TCP Protective Load Balancing (PLB)
to recover from receiver congestion / overload.
Rx CPU/queue hotspots are relatively common for data ingest
workloads, and so far we had to try to detect the condition
at the RPC layer and reopen the connection. PLB lets us change
the Flow Label and therefore Rx CPU on RTO, with minimal packet
reordering. PLB reaction times are much faster, and can happen
at any point in the connection, not just at RPC boundaries.

Due to the nature of host processing (relatively long queues,
other kernel subsystems masking IRQs for 100s of msecs)
the risk of reordering within the host is higher than in
the network. But for applications which need it - it is far
preferable to potentially persistent overload of subset of
queues.

It is expected that the hash communicated to the host
may change if the Flow Label changes. This may be surprising
to some host software, but I don't expect the devices
can compute two Toeplitz hashes, one with the Flow Label
for queue selection and one without for the rx hash
communicated to the host. Besides, changing the hash
may potentially help to change the path thru host queues.
User can disable NETIF_F_RXHASH if they require a stable
flow hash.

The name RXH_IP6_FL was chosen based on what we call
Flow Label variables in IPv6 processing (fl). I prefer
fl_lbl but that appears to be an fbnic-only spelling.
We could spell out RXH_IP6_FLOW_LABEL but existing
RXH_ defines are a lot more terse.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: andrew@lunn.ch
CC: ecree.xilinx@gmail.com
---
 include/uapi/linux/ethtool.h |  1 +
 net/ethtool/ioctl.c          | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 707c1844010c..fed36644eb1d 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -2380,6 +2380,7 @@ enum {
 #define	RXH_L4_B_0_1	(1 << 6) /* src port in case of TCP/UDP/SCTP */
 #define	RXH_L4_B_2_3	(1 << 7) /* dst port in case of TCP/UDP/SCTP */
 #define	RXH_GTP_TEID	(1 << 8) /* teid in case of GTP */
+#define	RXH_IP6_FL	(1 << 9) /* IPv6 flow label */
 #define	RXH_DISCARD	(1 << 31)
 
 #define	RX_CLS_FLOW_DISC	0xffffffffffffffffULL
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index e8ca70554b2e..181ec2347547 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1013,6 +1013,28 @@ static bool flow_type_hashable(u32 flow_type)
 	return false;
 }
 
+static bool flow_type_v6(u32 flow_type)
+{
+	switch (flow_type) {
+	case TCP_V6_FLOW:
+	case UDP_V6_FLOW:
+	case SCTP_V6_FLOW:
+	case AH_ESP_V6_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V6_FLOW:
+	case IPV6_FLOW:
+	case GTPU_V6_FLOW:
+	case GTPC_V6_FLOW:
+	case GTPC_TEID_V6_FLOW:
+	case GTPU_EH_V6_FLOW:
+	case GTPU_UL_V6_FLOW:
+	case GTPU_DL_V6_FLOW:
+		return true;
+	}
+
+	return false;
+}
+
 /* When adding a new type, update the assert and, if it's hashable, add it to
  * the flow_type_hashable switch case.
  */
@@ -1066,6 +1088,9 @@ ethtool_srxfh_check(struct net_device *dev, const struct ethtool_rxnfc *info)
 	const struct ethtool_ops *ops = dev->ethtool_ops;
 	int rc;
 
+	if (info->data & RXH_IP6_FL && !flow_type_v6(info->flow_type))
+		return -EINVAL;
+
 	if (ops->get_rxfh) {
 		struct ethtool_rxfh_param rxfh = {};
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC net-next 3/6] eth: fbnic: support RSS on IPv6 Flow Label
  2025-06-09 17:34 [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
  2025-06-09 17:34 ` [RFC net-next 1/6] net: ethtool: factor out the validation for ETHTOOL_SRXFH Jakub Kicinski
  2025-06-09 17:34 ` [RFC net-next 2/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
@ 2025-06-09 17:34 ` Jakub Kicinski
  2025-06-09 17:34 ` [RFC net-next 4/6] eth: bnxt: " Jakub Kicinski
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 17:34 UTC (permalink / raw)
  To: michael.chan, pavan.chebbi
  Cc: willemdebruijn.kernel, netdev, davem, edumazet, pabeni,
	andrew+netdev, horms, Jakub Kicinski, alexanderduyck,
	mohsin.bashr

Support IPv6 Flow Label hashing. Use both inner and outer IPv6
header's Flow Label if both headers are detected. Flow Label
is unlike normal header fields, by enabling it user accepts
the unstable hash and possible reordering. Because of that
I think it's reasonable to hash over all Flow Labels we can
find, even tho we don't hash over all L3 addresses.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: alexanderduyck@fb.com
CC: mohsin.bashr@gmail.com
---
 drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c | 2 +-
 drivers/net/ethernet/meta/fbnic/fbnic_rpc.c     | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
index 5c7556c8c4c5..24f5dc3baa70 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
@@ -806,7 +806,7 @@ static int fbnic_get_rxnfc(struct net_device *netdev,
 #define FBNIC_L2_HASH_OPTIONS \
 	(RXH_L2DA | RXH_DISCARD)
 #define FBNIC_L3_HASH_OPTIONS \
-	(FBNIC_L2_HASH_OPTIONS | RXH_IP_SRC | RXH_IP_DST)
+	(FBNIC_L2_HASH_OPTIONS | RXH_IP_SRC | RXH_IP_DST | RXH_IP6_FL)
 #define FBNIC_L4_HASH_OPTIONS \
 	(FBNIC_L3_HASH_OPTIONS | RXH_L4_B_0_1 | RXH_L4_B_2_3)
 
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
index 8ff07b5562e3..a4dc1024c0c2 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c
@@ -71,6 +71,8 @@ u16 fbnic_flow_hash_2_rss_en_mask(struct fbnic_net *fbn, int flow_type)
 	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP_DST, IP_DST, flow_hash);
 	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(L4_B_0_1, L4_SRC, flow_hash);
 	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(L4_B_2_3, L4_DST, flow_hash);
+	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP6_FL, OV6_FL_LBL, flow_hash);
+	rss_en_mask |= FBNIC_FH_2_RSSEM_BIT(IP6_FL, IV6_FL_LBL, flow_hash);
 
 	return rss_en_mask;
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC net-next 4/6] eth: bnxt: support RSS on IPv6 Flow Label
  2025-06-09 17:34 [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
                   ` (2 preceding siblings ...)
  2025-06-09 17:34 ` [RFC net-next 3/6] eth: fbnic: support RSS on IPv6 Flow Label Jakub Kicinski
@ 2025-06-09 17:34 ` Jakub Kicinski
  2025-06-09 18:24   ` Michael Chan
  2025-06-13  0:11   ` Michael Chan
  2025-06-09 17:34 ` [RFC net-next 5/6] selftests: drv-net: import things in lib one by one Jakub Kicinski
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 17:34 UTC (permalink / raw)
  To: michael.chan, pavan.chebbi
  Cc: willemdebruijn.kernel, netdev, davem, edumazet, pabeni,
	andrew+netdev, horms, Jakub Kicinski

It appears that the bnxt FW API has the relevant bit for Flow Label
hashing. Plumb in the support. Obey the capability bit.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: michael.chan@broadcom.com
CC: pavan.chebbi@broadcom.com
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 ++
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 26 ++++++++++++++++---
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index fda0d3cc6227..40ae34923511 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2407,6 +2407,7 @@ struct bnxt {
 #define BNXT_RSS_CAP_ESP_V4_RSS_CAP		BIT(6)
 #define BNXT_RSS_CAP_ESP_V6_RSS_CAP		BIT(7)
 #define BNXT_RSS_CAP_MULTI_RSS_CTX		BIT(8)
+#define BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP	BIT(9)
 
 	u8			rss_hash_key[HW_HASH_KEY_SIZE];
 	u8			rss_hash_key_valid:1;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d5495762c945..ae09158db8dc 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6945,6 +6945,8 @@ static int bnxt_hwrm_vnic_qcaps(struct bnxt *bp)
 			bp->rss_cap |= BNXT_RSS_CAP_ESP_V4_RSS_CAP;
 		if (flags & VNIC_QCAPS_RESP_FLAGS_RSS_IPSEC_ESP_SPI_IPV6_CAP)
 			bp->rss_cap |= BNXT_RSS_CAP_ESP_V6_RSS_CAP;
+		if (flags & VNIC_QCAPS_RESP_FLAGS_RSS_IPV6_FLOW_LABEL_CAP)
+			bp->rss_cap |= BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP;
 		if (flags & VNIC_QCAPS_RESP_FLAGS_RE_FLUSH_CAP)
 			bp->fw_cap |= BNXT_FW_CAP_VNIC_RE_FLUSH;
 	}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index f5d490bf997e..fd9405cadad1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1652,13 +1652,24 @@ static int bnxt_srxfh(struct bnxt *bp, struct ethtool_rxnfc *cmd)
 	u32 rss_hash_cfg = bp->rss_hash_cfg;
 	int tuple, rc = 0;
 
-	if (cmd->data == RXH_4TUPLE)
+	switch (cmd->data) {
+	case RXH_4TUPLE | RXH_IP6_FL:
+	case RXH_4TUPLE:
 		tuple = 4;
-	else if (cmd->data == RXH_2TUPLE)
+		break;
+	case RXH_2TUPLE | RXH_IP6_FL:
+	case RXH_2TUPLE:
 		tuple = 2;
-	else if (!cmd->data)
+		break;
+	case 0:
 		tuple = 0;
-	else
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (cmd->data & RXH_IP6_FL &&
+	    !(bp->rss_cap & BNXT_RSS_CAP_IPV6_FLOW_LABEL_RSS_CAP))
 		return -EINVAL;
 
 	if (cmd->flow_type == TCP_V4_FLOW) {
@@ -1728,6 +1739,13 @@ static int bnxt_srxfh(struct bnxt *bp, struct ethtool_rxnfc *cmd)
 			rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
 		else if (!tuple)
 			rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
+
+		if (cmd->data & RXH_IP6_FL)
+			rss_hash_cfg |=
+				VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;
+		else
+			rss_hash_cfg &=
+				~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;
 		break;
 	}
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC net-next 5/6] selftests: drv-net: import things in lib one by one
  2025-06-09 17:34 [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
                   ` (3 preceding siblings ...)
  2025-06-09 17:34 ` [RFC net-next 4/6] eth: bnxt: " Jakub Kicinski
@ 2025-06-09 17:34 ` Jakub Kicinski
  2025-06-09 17:34 ` [RFC net-next 6/6] selftests: drv-net: add test for RSS on flow label Jakub Kicinski
  2025-06-09 18:26 ` [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Andrew Lunn
  6 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 17:34 UTC (permalink / raw)
  To: michael.chan, pavan.chebbi
  Cc: willemdebruijn.kernel, netdev, davem, edumazet, pabeni,
	andrew+netdev, horms, Jakub Kicinski, shuah, mohan.prasad, sdf,
	dw, linux-kselftest

pylint doesn't understand our path hacks, and it generates a lot
of warnings for driver tests. Import what we use one by one, this
is hopefully not too tedious and it makes pylint happy.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: mohan.prasad@microchip.com
CC: sdf@fomichev.me
CC: dw@davidwei.uk
CC: linux-kselftest@vger.kernel.org
---
 .../selftests/drivers/net/hw/lib/py/__init__.py | 17 +++++++++++++++++
 .../selftests/drivers/net/lib/py/__init__.py    | 14 ++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py b/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
index b582885786f5..56ff11074b55 100644
--- a/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
+++ b/tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
@@ -7,8 +7,25 @@ KSFT_DIR = (Path(__file__).parent / "../../../../..").resolve()
 
 try:
     sys.path.append(KSFT_DIR.as_posix())
+
     from net.lib.py import *
     from drivers.net.lib.py import *
+
+    # Import one by one to avoid pylint false positives
+    from net.lib.py import EthtoolFamily, NetdevFamily, NetshaperFamily, \
+        NlError, RtnlFamily
+    from net.lib.py import CmdExitFailure
+    from net.lib.py import bkg, cmd, defer, ethtool, fd_read_timeout, ip, \
+        rand_port, tool, wait_port_listen
+    from net.lib.py import fd_read_timeout
+    from net.lib.py import KsftSkipEx, KsftFailEx, KsftXfailEx
+    from net.lib.py import ksft_disruptive, ksft_exit, ksft_pr, ksft_run, \
+        ksft_setup
+    from net.lib.py import ksft_eq, ksft_ge, ksft_in, ksft_is, ksft_lt, \
+        ksft_ne, ksft_not_in, ksft_raises, ksft_true
+    from net.lib.py import NetNSEnter
+    from drivers.net.lib.py import GenerateTraffic
+    from drivers.net.lib.py import NetDrvEnv, NetDrvEpEnv
 except ModuleNotFoundError as e:
     ksft_pr("Failed importing `net` library from kernel sources")
     ksft_pr(str(e))
diff --git a/tools/testing/selftests/drivers/net/lib/py/__init__.py b/tools/testing/selftests/drivers/net/lib/py/__init__.py
index 401e70f7f136..9ed1d8f70524 100644
--- a/tools/testing/selftests/drivers/net/lib/py/__init__.py
+++ b/tools/testing/selftests/drivers/net/lib/py/__init__.py
@@ -7,7 +7,21 @@ KSFT_DIR = (Path(__file__).parent / "../../../..").resolve()
 
 try:
     sys.path.append(KSFT_DIR.as_posix())
+
     from net.lib.py import *
+
+    # Import one by one to avoid pylint false positives
+    from net.lib.py import EthtoolFamily, NetdevFamily, NetshaperFamily, \
+        NlError, RtnlFamily
+    from net.lib.py import CmdExitFailure
+    from net.lib.py import bkg, cmd, defer, ethtool, fd_read_timeout, ip, \
+        rand_port, tool, wait_port_listen
+    from net.lib.py import fd_read_timeout
+    from net.lib.py import KsftSkipEx, KsftFailEx, KsftXfailEx
+    from net.lib.py import ksft_disruptive, ksft_exit, ksft_pr, ksft_run, \
+        ksft_setup
+    from net.lib.py import ksft_eq, ksft_ge, ksft_in, ksft_is, ksft_lt, \
+        ksft_ne, ksft_not_in, ksft_raises, ksft_true
 except ModuleNotFoundError as e:
     ksft_pr("Failed importing `net` library from kernel sources")
     ksft_pr(str(e))
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC net-next 6/6] selftests: drv-net: add test for RSS on flow label
  2025-06-09 17:34 [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
                   ` (4 preceding siblings ...)
  2025-06-09 17:34 ` [RFC net-next 5/6] selftests: drv-net: import things in lib one by one Jakub Kicinski
@ 2025-06-09 17:34 ` Jakub Kicinski
  2025-06-09 18:26 ` [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Andrew Lunn
  6 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 17:34 UTC (permalink / raw)
  To: michael.chan, pavan.chebbi
  Cc: willemdebruijn.kernel, netdev, davem, edumazet, pabeni,
	andrew+netdev, horms, Jakub Kicinski, shuah, sdf, linux-kselftest

Add a simple test for checking that RSS on flow label works,
and that its rejected for IPv4 flows.

 # ./tools/testing/selftests/drivers/net/hw/rss_flow_label.py
 TAP version 13
 1..2
 ok 1 rss_flow_label.test_rss_flow_label
 ok 2 rss_flow_label.test_rss_flow_label_6only
 # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: sdf@fomichev.me
CC: linux-kselftest@vger.kernel.org
---
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 .../drivers/net/hw/rss_flow_label.py          | 151 ++++++++++++++++++
 2 files changed, 152 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/hw/rss_flow_label.py

diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index df2c047ffa90..56bf1f1b8377 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -17,6 +17,7 @@ TEST_PROGS = \
 	loopback.sh \
 	pp_alloc_fail.py \
 	rss_ctx.py \
+	rss_flow_label.py \
 	rss_input_xfrm.py \
 	tso.py \
 	xsk_reconfig.py \
diff --git a/tools/testing/selftests/drivers/net/hw/rss_flow_label.py b/tools/testing/selftests/drivers/net/hw/rss_flow_label.py
new file mode 100755
index 000000000000..e471e13160ae
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/rss_flow_label.py
@@ -0,0 +1,151 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Tests for RSS hashing on IPv6 Flow Label.
+"""
+
+import glob
+import socket
+from lib.py import CmdExitFailure
+from lib.py import ksft_run, ksft_exit, ksft_eq, ksft_ge, ksft_in, \
+    ksft_not_in, ksft_raises, KsftSkipEx
+from lib.py import bkg, cmd, defer, fd_read_timeout, rand_port
+from lib.py import NetDrvEpEnv
+
+
+def _ethtool_get_cfg(cfg, fl_type):
+    descr = cmd(f"ethtool -n {cfg.ifname} rx-flow-hash {fl_type}").stdout
+
+    converter = {
+        "IP SA": "s",
+        "IP DA": "d",
+        "L3 proto": "t",
+        "L4 bytes 0 & 1 [TCP/UDP src port]": "f",
+        "L4 bytes 2 & 3 [TCP/UDP dst port]": "n",
+        "IPv6 Flow Label": "l",
+    }
+
+    ret = ""
+    for line in descr.split("\n")[1:-2]:
+        # if this raises we probably need to add more keys to converter above
+        ret += converter[line]
+    return ret
+
+
+def _traffic(cfg, one_sock, one_cpu):
+    local_port  = rand_port(socket.SOCK_DGRAM)
+    remote_port = rand_port(socket.SOCK_DGRAM)
+
+    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
+    sock.bind(("", local_port))
+    sock.connect((cfg.remote_addr_v["6"], 0))
+    if one_sock:
+        send = f"exec 5<>/dev/udp/{cfg.addr_v['6']}/{local_port}; " \
+                "for i in `seq 20`; do echo a >&5; sleep 0.02; done; exec 5>&-"
+    else:
+        send = "for i in `seq 20`; do echo a | socat -t0.02 - UDP6:" \
+              f"[{cfg.addr_v['6']}]:{local_port},sourceport={remote_port}; done"
+
+    cpus = set()
+    with bkg(send, shell=True, host=cfg.remote, exit_wait=True):
+        for _ in range(20):
+            fd_read_timeout(sock.fileno(), 1)
+            cpu = sock.getsockopt(socket.SOL_SOCKET, socket.SO_INCOMING_CPU)
+            cpus.add(cpu)
+
+    if one_cpu:
+        ksft_eq(len(cpus), 1,
+                f"{one_sock=} - expected one CPU, got traffic on: {cpus=}")
+    else:
+        ksft_ge(len(cpus), 2,
+                f"{one_sock=} - expected many CPUs, got traffic on: {cpus=}")
+
+
+def test_rss_flow_label(cfg):
+    """
+    Test hashing on IPv6 flow label. Send traffic over a single socket
+    and over multiple sockets. Depend on the remote having auto-label
+    enabled so that it randomizes the label per socket.
+    """
+
+    cfg.require_ipver("6")
+    cfg.require_cmd("socat", remote=True)
+    if not hasattr(socket, "SO_INCOMING_CPU"):
+        raise KsftSkipEx("socket.SO_INCOMING_CPU was added in Python 3.11")
+
+    # 1 is the default, if someone changed it we probably shouldn"t mess with it
+    af = cmd("cat /proc/sys/net/ipv6/auto_flowlabels", host=cfg.remote).stdout
+    if af.strip() != "1":
+        raise KsftSkipEx("Remote does not have auto_flowlabels enabled")
+
+    qcnt = len(glob.glob(f"/sys/class/net/{cfg.ifname}/queues/rx-*"))
+    if qcnt < 2:
+        raise KsftSkipEx(f"Local has only {qcnt} queues")
+
+    # Enable flow label hashing for UDP6
+    initial = _ethtool_get_cfg(cfg, "udp6")
+    no_lbl = initial.replace("l", "")
+    if "l" not in initial:
+        try:
+            cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 l{no_lbl}")
+        except CmdExitFailure as exc:
+            raise KsftSkipEx("Device doesn't support Flow Label for UDP6") from exc
+
+        defer(cmd, f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {initial}")
+
+    _traffic(cfg, one_sock=True, one_cpu=True)
+    _traffic(cfg, one_sock=False, one_cpu=False)
+
+    # Disable it, we should see no hashing (reset was already defer()ed)
+    cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {no_lbl}")
+
+    _traffic(cfg, one_sock=False, one_cpu=True)
+
+
+def _check_v4_flow_types(cfg):
+    for fl_type in ["tcp4", "udp4", "ah4", "esp4", "sctp4"]:
+        try:
+            cur = cmd(f"ethtool -n {cfg.ifname} rx-flow-hash {fl_type}").stdout
+            ksft_not_in("Flow Label", cur,
+                        comment=f"{fl_type=} has Flow Label:" + cur)
+        except CmdExitFailure:
+            # Probably does not support this flow type
+            pass
+
+
+def test_rss_flow_label_6only(cfg):
+    """
+    Test interactions with IPv4 flow types. It should not be possible to set
+    IPv6 Flow Label hashing for an IPv4 flow type. The Flow Label should also
+    not appear in the IPv4 "current config".
+    """
+
+    with ksft_raises(CmdExitFailure) as cm:
+        cmd(f"ethtool -N {cfg.ifname} rx-flow-hash tcp4 sdfnl")
+    ksft_in("Invalid argument", cm.exception.cmd.stderr)
+
+    _check_v4_flow_types(cfg)
+
+    # Try to enable Flow Labels and check again, in case it leaks thru
+    initial = _ethtool_get_cfg(cfg, "udp6")
+    changed = initial.replace("l", "") if "l" in initial else initial + "l"
+
+    cmd(f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {changed}")
+    restore = defer(cmd, f"ethtool -N {cfg.ifname} rx-flow-hash udp6 {initial}")
+
+    _check_v4_flow_types(cfg)
+    restore.exec()
+    _check_v4_flow_types(cfg)
+
+
+def main() -> None:
+    with NetDrvEpEnv(__file__, nsim_test=False) as cfg:
+        ksft_run([test_rss_flow_label,
+                  test_rss_flow_label_6only],
+                 args=(cfg, ))
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 4/6] eth: bnxt: support RSS on IPv6 Flow Label
  2025-06-09 17:34 ` [RFC net-next 4/6] eth: bnxt: " Jakub Kicinski
@ 2025-06-09 18:24   ` Michael Chan
  2025-06-09 18:44     ` Jakub Kicinski
  2025-06-13  0:11   ` Michael Chan
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Chan @ 2025-06-09 18:24 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: pavan.chebbi, willemdebruijn.kernel, netdev, davem, edumazet,
	pabeni, andrew+netdev, horms

[-- Attachment #1: Type: text/plain, Size: 357 bytes --]

On Mon, Jun 9, 2025 at 10:34 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> It appears that the bnxt FW API has the relevant bit for Flow Label
> hashing. Plumb in the support. Obey the capability bit.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Looks good to me, but we need to report RXH_IP6_FL in bnxt_grxfh() for
IP V6 flows, right?

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS
  2025-06-09 17:34 [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
                   ` (5 preceding siblings ...)
  2025-06-09 17:34 ` [RFC net-next 6/6] selftests: drv-net: add test for RSS on flow label Jakub Kicinski
@ 2025-06-09 18:26 ` Andrew Lunn
  2025-06-09 18:58   ` Jakub Kicinski
  6 siblings, 1 reply; 18+ messages in thread
From: Andrew Lunn @ 2025-06-09 18:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: michael.chan, pavan.chebbi, willemdebruijn.kernel, netdev, davem,
	edumazet, pabeni, andrew+netdev, horms

On Mon, Jun 09, 2025 at 10:34:36AM -0700, Jakub Kicinski wrote:
> Add support for using IPv6 Flow Label in Rx hash computation
> and therefore RSS queue selection.

It took me a while to get there, i wondered why you are extending the
IOCTL code, rather than netlink. But netlink ethtool does not appear
to support ops->set_rxnfc() calls.

Rather than extend the deprecated ioctl i think the first patch in the
series should add set_rxnfc() to netlink ethtool.

	Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 4/6] eth: bnxt: support RSS on IPv6 Flow Label
  2025-06-09 18:24   ` Michael Chan
@ 2025-06-09 18:44     ` Jakub Kicinski
  2025-06-09 23:03       ` Michael Chan
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 18:44 UTC (permalink / raw)
  To: Michael Chan
  Cc: pavan.chebbi, willemdebruijn.kernel, netdev, davem, edumazet,
	pabeni, andrew+netdev, horms

On Mon, 9 Jun 2025 11:24:39 -0700 Michael Chan wrote:
> > It appears that the bnxt FW API has the relevant bit for Flow Label
> > hashing. Plumb in the support. Obey the capability bit.
> >
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>  
> 
> Looks good to me, but we need to report RXH_IP6_FL in bnxt_grxfh() for
> IP V6 flows, right?

Ah, sorry, something like this?

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index fd9405cadad1..4385a94d4d1e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1582,9 +1582,14 @@ static u64 get_ethtool_ipv4_rss(struct bnxt *bp)
 
 static u64 get_ethtool_ipv6_rss(struct bnxt *bp)
 {
+	u64 rss = 0;
+
 	if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6)
-		return RXH_IP_SRC | RXH_IP_DST;
-	return 0;
+		rss |= RXH_IP_SRC | RXH_IP_DST;
+	if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL)
+		rss |= RXH_IP6_FL;
+
+	return rss;
 }
 
 static int bnxt_grxfh(struct bnxt *bp, struct ethtool_rxnfc *cmd)

Would someone at Broadcom be able to test (per cover letter)?
I'd love to have the test validated on bnxt if possible :(
I can post a v2 with the snippet merged in if that helps.

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS
  2025-06-09 18:26 ` [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Andrew Lunn
@ 2025-06-09 18:58   ` Jakub Kicinski
  2025-06-09 19:30     ` Andrew Lunn
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 18:58 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: michael.chan, pavan.chebbi, willemdebruijn.kernel, netdev, davem,
	edumazet, pabeni, andrew+netdev, horms

On Mon, 9 Jun 2025 20:26:14 +0200 Andrew Lunn wrote:
> On Mon, Jun 09, 2025 at 10:34:36AM -0700, Jakub Kicinski wrote:
> > Add support for using IPv6 Flow Label in Rx hash computation
> > and therefore RSS queue selection.  
> 
> It took me a while to get there, i wondered why you are extending the
> IOCTL code, rather than netlink. But netlink ethtool does not appear
> to support ops->set_rxnfc() calls.
> 
> Rather than extend the deprecated ioctl i think the first patch in the
> series should add set_rxnfc() to netlink ethtool.

I suppose the fact we added at least 2 features to this API since 
the netlink conversion will not convince you otherwise? (input_xfrm
with all of its options, and GTP flow types and hashing)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS
  2025-06-09 18:58   ` Jakub Kicinski
@ 2025-06-09 19:30     ` Andrew Lunn
  2025-06-09 19:53       ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Lunn @ 2025-06-09 19:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: michael.chan, pavan.chebbi, willemdebruijn.kernel, netdev, davem,
	edumazet, pabeni, andrew+netdev, horms

On Mon, Jun 09, 2025 at 11:58:25AM -0700, Jakub Kicinski wrote:
> On Mon, 9 Jun 2025 20:26:14 +0200 Andrew Lunn wrote:
> > On Mon, Jun 09, 2025 at 10:34:36AM -0700, Jakub Kicinski wrote:
> > > Add support for using IPv6 Flow Label in Rx hash computation
> > > and therefore RSS queue selection.  
> > 
> > It took me a while to get there, i wondered why you are extending the
> > IOCTL code, rather than netlink. But netlink ethtool does not appear
> > to support ops->set_rxnfc() calls.
> > 
> > Rather than extend the deprecated ioctl i think the first patch in the
> > series should add set_rxnfc() to netlink ethtool.
> 
> I suppose the fact we added at least 2 features to this API since 
> the netlink conversion will not convince you otherwise? (input_xfrm
> with all of its options, and GTP flow types and hashing)

Not really. We should of asked that the first patch in those series
added the netlink code. Why did we bother adding netlink, if we are
going to keep extending the IOCTL interface?

	Andrew

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS
  2025-06-09 19:30     ` Andrew Lunn
@ 2025-06-09 19:53       ` Jakub Kicinski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-09 19:53 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: michael.chan, pavan.chebbi, willemdebruijn.kernel, netdev, davem,
	edumazet, pabeni, andrew+netdev, horms

On Mon, 9 Jun 2025 21:30:53 +0200 Andrew Lunn wrote:
> On Mon, Jun 09, 2025 at 11:58:25AM -0700, Jakub Kicinski wrote:
> > On Mon, 9 Jun 2025 20:26:14 +0200 Andrew Lunn wrote:  
> > > It took me a while to get there, i wondered why you are extending the
> > > IOCTL code, rather than netlink. But netlink ethtool does not appear
> > > to support ops->set_rxnfc() calls.
> > > 
> > > Rather than extend the deprecated ioctl i think the first patch in the
> > > series should add set_rxnfc() to netlink ethtool.  
> > 
> > I suppose the fact we added at least 2 features to this API since 
> > the netlink conversion will not convince you otherwise? (input_xfrm
> > with all of its options, and GTP flow types and hashing)  
> 
> Not really. We should of asked that the first patch in those series
> added the netlink code. Why did we bother adding netlink, if we are
> going to keep extending the IOCTL interface?

The RSS settings and NFC need to be rethought, that's why it wasn't
migrated. But I can pop just the hash fields into the RSS_GET that 
we already added (and add the corresponding SET part).

The config and creation of RSS contexts should diverge from what 
the ioctl do. The IOCTL takes an indirection table of a fixed size,
but modern NICs will increasingly need to allocate the RSS table
dynamically, based on the size. As such user space preparing the table
for us is really counter productive.

The flow filters are a whole different can of warms. We had been
pushing for TC migration over the last decade, which I don't think
really happened. More recently we were trying to convince Jamal that
his P4TC work would really best fit as a netlink replacement for flow
filters. IDK where we landed, I'd really not touch that with a 10 ft
pole.

So yeah, this is not a simple "why did nobody convert this IOCTL struct
to netlink yet" problem. But as I said I think we can push the hash
config into the RSS_GET / RSS_SET..

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 4/6] eth: bnxt: support RSS on IPv6 Flow Label
  2025-06-09 18:44     ` Jakub Kicinski
@ 2025-06-09 23:03       ` Michael Chan
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Chan @ 2025-06-09 23:03 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: pavan.chebbi, willemdebruijn.kernel, netdev, davem, edumazet,
	pabeni, andrew+netdev, horms

[-- Attachment #1: Type: text/plain, Size: 1343 bytes --]

On Mon, Jun 9, 2025 at 11:44 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Ah, sorry, something like this?
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> index fd9405cadad1..4385a94d4d1e 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> @@ -1582,9 +1582,14 @@ static u64 get_ethtool_ipv4_rss(struct bnxt *bp)
>
>  static u64 get_ethtool_ipv6_rss(struct bnxt *bp)
>  {
> +       u64 rss = 0;
> +
>         if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6)
> -               return RXH_IP_SRC | RXH_IP_DST;
> -       return 0;
> +               rss |= RXH_IP_SRC | RXH_IP_DST;
> +       if (bp->rss_hash_cfg & VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL)
> +               rss |= RXH_IP6_FL;
> +
> +       return rss;
>  }

Yes, I think this should work.

>
>  static int bnxt_grxfh(struct bnxt *bp, struct ethtool_rxnfc *cmd)
>
> Would someone at Broadcom be able to test (per cover letter)?
> I'd love to have the test validated on bnxt if possible :(

Yes, we'll get this tested in our lab.

> I can post a v2 with the snippet merged in if that helps.

You don't need to post v2 just for this, unless you have some other
changes you want to make.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 2/6] net: ethtool: support including Flow Label in the flow hash for RSS
  2025-06-09 17:34 ` [RFC net-next 2/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
@ 2025-06-10 13:33   ` Willem de Bruijn
  0 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2025-06-10 13:33 UTC (permalink / raw)
  To: Jakub Kicinski, michael.chan, pavan.chebbi
  Cc: willemdebruijn.kernel, netdev, davem, edumazet, pabeni,
	andrew+netdev, horms, Jakub Kicinski, andrew, ecree.xilinx

Jakub Kicinski wrote:
> Some modern NICs support including the IPv6 Flow Label in
> the flow hash for RSS queue selection. This is outside
> the old "Microsoft spec", but was included in the OCP NIC spec:
> 
>   [ ] RSS include ow label in the hash (configurable)
> 
> https://www.opencompute.org/documents/ocp-server-nic-core-features-specification-ocp-spec-format-1-1-pdf

Or perhaps https://www.opencompute.org/w/index.php?title=Core_Offloads#Receive_Side_Scaling

One thing to make very clear is that in this design the flow label is
an extra field to include. It does not replace the L4 fields.

This is perhaps mistaken. The IPv6 flow label definition is

"Packet classifiers can
 use the triplet of Flow Label, Source Address, and Destination
 Address fields to identify the flow to which a particular packet
 belongs."

https://datatracker.ietf.org/doc/html/rfc6437#section-2

So explicitly also hashing in the L4 fields should not be needed.

Generally the flow label includes the L4 ports in its initial value.
Though PLB, through sk_rethink_txhash, will remove that.

Similarly an IPv6 tunneled packet should no longer need hashing of
it inner layer(s) if the outer flow label is sufficiently computed by
the source to identify a single flow. AFAIK that was the entire point
of this field.

That said, it is always safe to include the L4 fields as well. And in
the end what matters is configuring what the hardware already
supports.
 
> RSS Flow Label hashing allows TCP Protective Load Balancing (PLB)
> to recover from receiver congestion / overload.
> Rx CPU/queue hotspots are relatively common for data ingest
> workloads, and so far we had to try to detect the condition
> at the RPC layer and reopen the connection. PLB lets us change
> the Flow Label and therefore Rx CPU on RTO, with minimal packet
> reordering. PLB reaction times are much faster, and can happen
> at any point in the connection, not just at RPC boundaries.
> 
> Due to the nature of host processing (relatively long queues,
> other kernel subsystems masking IRQs for 100s of msecs)
> the risk of reordering within the host is higher than in
> the network. But for applications which need it - it is far
> preferable to potentially persistent overload of subset of
> queues.
> 
> It is expected that the hash communicated to the host
> may change if the Flow Label changes. This may be surprising
> to some host software, but I don't expect the devices
> can compute two Toeplitz hashes, one with the Flow Label
> for queue selection and one without for the rx hash
> communicated to the host. Besides, changing the hash
> may potentially help to change the path thru host queues.
> User can disable NETIF_F_RXHASH if they require a stable
> flow hash.
> 
> The name RXH_IP6_FL was chosen based on what we call
> Flow Label variables in IPv6 processing (fl). I prefer
> fl_lbl but that appears to be an fbnic-only spelling.
> We could spell out RXH_IP6_FLOW_LABEL but existing
> RXH_ defines are a lot more terse.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: andrew@lunn.ch
> CC: ecree.xilinx@gmail.com
> ---
>  include/uapi/linux/ethtool.h |  1 +
>  net/ethtool/ioctl.c          | 25 +++++++++++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 707c1844010c..fed36644eb1d 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -2380,6 +2380,7 @@ enum {
>  #define	RXH_L4_B_0_1	(1 << 6) /* src port in case of TCP/UDP/SCTP */
>  #define	RXH_L4_B_2_3	(1 << 7) /* dst port in case of TCP/UDP/SCTP */
>  #define	RXH_GTP_TEID	(1 << 8) /* teid in case of GTP */
> +#define	RXH_IP6_FL	(1 << 9) /* IPv6 flow label */
>  #define	RXH_DISCARD	(1 << 31)
>  
>  #define	RX_CLS_FLOW_DISC	0xffffffffffffffffULL
> diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
> index e8ca70554b2e..181ec2347547 100644
> --- a/net/ethtool/ioctl.c
> +++ b/net/ethtool/ioctl.c
> @@ -1013,6 +1013,28 @@ static bool flow_type_hashable(u32 flow_type)
>  	return false;
>  }
>  
> +static bool flow_type_v6(u32 flow_type)
> +{
> +	switch (flow_type) {
> +	case TCP_V6_FLOW:
> +	case UDP_V6_FLOW:
> +	case SCTP_V6_FLOW:
> +	case AH_ESP_V6_FLOW:
> +	case AH_V6_FLOW:
> +	case ESP_V6_FLOW:
> +	case IPV6_FLOW:
> +	case GTPU_V6_FLOW:
> +	case GTPC_V6_FLOW:
> +	case GTPC_TEID_V6_FLOW:
> +	case GTPU_EH_V6_FLOW:
> +	case GTPU_UL_V6_FLOW:
> +	case GTPU_DL_V6_FLOW:
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  /* When adding a new type, update the assert and, if it's hashable, add it to
>   * the flow_type_hashable switch case.
>   */
> @@ -1066,6 +1088,9 @@ ethtool_srxfh_check(struct net_device *dev, const struct ethtool_rxnfc *info)
>  	const struct ethtool_ops *ops = dev->ethtool_ops;
>  	int rc;
>  
> +	if (info->data & RXH_IP6_FL && !flow_type_v6(info->flow_type))
> +		return -EINVAL;
> +
>  	if (ops->get_rxfh) {
>  		struct ethtool_rxfh_param rxfh = {};
>  
> -- 
> 2.49.0
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 4/6] eth: bnxt: support RSS on IPv6 Flow Label
  2025-06-09 17:34 ` [RFC net-next 4/6] eth: bnxt: " Jakub Kicinski
  2025-06-09 18:24   ` Michael Chan
@ 2025-06-13  0:11   ` Michael Chan
  2025-06-13  1:04     ` Jakub Kicinski
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Chan @ 2025-06-13  0:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: pavan.chebbi, willemdebruijn.kernel, netdev, davem, edumazet,
	pabeni, andrew+netdev, horms

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

On Mon, Jun 9, 2025 at 10:34 AM Jakub Kicinski <kuba@kernel.org> wrote:

> @@ -1728,6 +1739,13 @@ static int bnxt_srxfh(struct bnxt *bp, struct ethtool_rxnfc *cmd)
>                         rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
>                 else if (!tuple)
>                         rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
> +
> +               if (cmd->data & RXH_IP6_FL)
> +                       rss_hash_cfg |=
> +                               VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;

Just a quick update on this.  The FW call fails when this flag is set.
I am looking into it.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 4/6] eth: bnxt: support RSS on IPv6 Flow Label
  2025-06-13  0:11   ` Michael Chan
@ 2025-06-13  1:04     ` Jakub Kicinski
  2025-06-23  4:20       ` Michael Chan
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2025-06-13  1:04 UTC (permalink / raw)
  To: Michael Chan
  Cc: pavan.chebbi, willemdebruijn.kernel, netdev, davem, edumazet,
	pabeni, andrew+netdev, horms

On Thu, 12 Jun 2025 17:11:18 -0700 Michael Chan wrote:
> > @@ -1728,6 +1739,13 @@ static int bnxt_srxfh(struct bnxt *bp, struct ethtool_rxnfc *cmd)
> >                         rss_hash_cfg |= VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
> >                 else if (!tuple)
> >                         rss_hash_cfg &= ~VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6;
> > +
> > +               if (cmd->data & RXH_IP6_FL)
> > +                       rss_hash_cfg |=
> > +                               VNIC_RSS_CFG_REQ_HASH_TYPE_IPV6_FLOW_LABEL;  
> 
> Just a quick update on this.  The FW call fails when this flag is set.
> I am looking into it.

Thanks for the update! FWIW I'm doing the refactoring to add netlink
support as requested by Andrew:
https://lore.kernel.org/all/20250611145949.2674086-1-kuba@kernel.org/
There's 40 drivers to convert so even if it goes smoothly I wouldn't
be able to repost for another week.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC net-next 4/6] eth: bnxt: support RSS on IPv6 Flow Label
  2025-06-13  1:04     ` Jakub Kicinski
@ 2025-06-23  4:20       ` Michael Chan
  0 siblings, 0 replies; 18+ messages in thread
From: Michael Chan @ 2025-06-23  4:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: pavan.chebbi, willemdebruijn.kernel, netdev, davem, edumazet,
	pabeni, andrew+netdev, horms

[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]

On Thu, Jun 12, 2025 at 6:04 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 12 Jun 2025 17:11:18 -0700 Michael Chan wrote:
> > Just a quick update on this.  The FW call fails when this flag is set.
> > I am looking into it.
>
> Thanks for the update! FWIW I'm doing the refactoring to add netlink
> support as requested by Andrew:
> https://lore.kernel.org/all/20250611145949.2674086-1-kuba@kernel.org/
> There's 40 drivers to convert so even if it goes smoothly I wouldn't
> be able to repost for another week.

I've looked into this.  The way flow label works on our chip is that
the flow label can be hashed with the IPV6 source and destination
addresses only, effectively becoming a 3-tuple hash.  For example,
this would be valid:

ethtool -N eth0 rx-flow-hash tcp6 sdl

4-tuple hash cannot include the flow label.  2-tuple and 3-tuple are
mutually exclusive for IPV6.  For example, once tcp6 is configured for
3-tuple, udp6 cannot be configured for 2-tuple.

I can post the patch that I am testing after some more testing.  Thanks.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4196 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-06-23  4:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-09 17:34 [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
2025-06-09 17:34 ` [RFC net-next 1/6] net: ethtool: factor out the validation for ETHTOOL_SRXFH Jakub Kicinski
2025-06-09 17:34 ` [RFC net-next 2/6] net: ethtool: support including Flow Label in the flow hash for RSS Jakub Kicinski
2025-06-10 13:33   ` Willem de Bruijn
2025-06-09 17:34 ` [RFC net-next 3/6] eth: fbnic: support RSS on IPv6 Flow Label Jakub Kicinski
2025-06-09 17:34 ` [RFC net-next 4/6] eth: bnxt: " Jakub Kicinski
2025-06-09 18:24   ` Michael Chan
2025-06-09 18:44     ` Jakub Kicinski
2025-06-09 23:03       ` Michael Chan
2025-06-13  0:11   ` Michael Chan
2025-06-13  1:04     ` Jakub Kicinski
2025-06-23  4:20       ` Michael Chan
2025-06-09 17:34 ` [RFC net-next 5/6] selftests: drv-net: import things in lib one by one Jakub Kicinski
2025-06-09 17:34 ` [RFC net-next 6/6] selftests: drv-net: add test for RSS on flow label Jakub Kicinski
2025-06-09 18:26 ` [RFC net-next 0/6] net: ethtool: support including Flow Label in the flow hash for RSS Andrew Lunn
2025-06-09 18:58   ` Jakub Kicinski
2025-06-09 19:30     ` Andrew Lunn
2025-06-09 19:53       ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).