* [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO
@ 2026-02-07 0:35 Jakub Kicinski
2026-02-07 0:35 ` [PATCH net-next v2 1/9] eth: bnxt: gather and report HW-GRO stats Jakub Kicinski
` (8 more replies)
0 siblings, 9 replies; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Add miscellaneous pieces related to production use of HW-GRO:
- report standard stats from drivers (bnxt is included here;
Gal recently posted patches for mlx5, which is great)
- CLI tool for calculating HW GRO savings / effectiveness
- tests for the stats, packet ordering and depth
v2:
- [patch 1] fix ethtool -S and unnecessary feature check
- [patch 3] use %1$s
- [patch 6] do not enable SO_TXTIME for multi-flow tests
- [patch 9] move the test to cover SW GRO
v1: https://lore.kernel.org/20260205220541.2992807-1-kuba@kernel.org
Jakub Kicinski (9):
eth: bnxt: gather and report HW-GRO stats
tools: ynltool: factor out qstat dumping
tools: ynltool: add qstats analysis for HW-GRO efficiency / savings
selftests: net: move gro to lib for HW vs SW reuse
selftests: drv-net: give HW stats sync time extra 25% of margin
selftests: drv-net: gro: use SO_TXTIME to schedule packets together
selftests: drv-net: gro: test GRO stats
selftests: drv-net: gro: add test for packet ordering
selftests: drv-net: gro: add a test for GRO depth
tools/testing/selftests/drivers/net/Makefile | 1 -
.../testing/selftests/drivers/net/hw/Makefile | 1 +
tools/testing/selftests/net/lib/Makefile | 1 +
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 15 +-
tools/net/ynl/ynltool/qstats.c | 171 ++++++----
.../selftests/{drivers/net => net/lib}/gro.c | 252 ++++++++++++++-
.../testing/selftests/drivers/net/.gitignore | 1 -
tools/testing/selftests/drivers/net/gro.py | 203 ++++++++++--
.../selftests/drivers/net/hw/gro_hw.py | 294 ++++++++++++++++++
.../selftests/drivers/net/lib/py/env.py | 4 +-
tools/testing/selftests/net/lib/.gitignore | 1 +
12 files changed, 859 insertions(+), 91 deletions(-)
rename tools/testing/selftests/{drivers/net => net/lib}/gro.c (86%)
create mode 100755 tools/testing/selftests/drivers/net/hw/gro_hw.py
--
2.53.0
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH net-next v2 1/9] eth: bnxt: gather and report HW-GRO stats
2026-02-07 0:35 [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO Jakub Kicinski
@ 2026-02-07 0:35 ` Jakub Kicinski
2026-02-08 0:09 ` Michael Chan
2026-02-07 0:35 ` [PATCH net-next v2 2/9] tools: ynltool: factor out qstat dumping Jakub Kicinski
` (7 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Count and report HW-GRO stats as seen by the kernel.
The device stats for GRO do not seem to reflect reality;
perhaps they count sessions which did not actually result
in any aggregation. They also count wire packets, so we
would have to count super-frames ourselves anyway.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- remove unnecessary condition for feature check (Michael)
- move the stats to the end of array to avoid messing up
ethtool -S output (AI)
v1: https://lore.kernel.org/20260205220541.2992807-2-kuba@kernel.org
---
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6 ++++++
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 15 +++++++++++++--
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index f036ef60230b..9a41b9e0423c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1126,8 +1126,11 @@ struct bnxt_rx_sw_stats {
u64 rx_l4_csum_errors;
u64 rx_resets;
u64 rx_buf_errors;
+ /* end of ethtool -S stats */
u64 rx_oom_discards;
u64 rx_netpoll_discards;
+ u64 rx_hw_gro_packets;
+ u64 rx_hw_gro_wire_packets;
};
struct bnxt_tx_sw_stats {
@@ -1154,6 +1157,9 @@ struct bnxt_total_ring_err_stats {
u64 tx_total_resets;
u64 tx_total_ring_discards;
u64 total_missed_irqs;
+ /* end of ethtool -S stats */
+ u64 rx_total_hw_gro_packets;
+ u64 rx_total_hw_gro_wire_packets;
};
struct bnxt_stats_mem {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 466e0fc6141f..44c46bac92d9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1804,7 +1804,8 @@ static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
struct bnxt_tpa_info *tpa_info,
struct rx_tpa_end_cmp *tpa_end,
struct rx_tpa_end_cmp_ext *tpa_end1,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ struct bnxt_rx_sw_stats *rx_stats)
{
#ifdef CONFIG_INET
int payload_off;
@@ -1814,6 +1815,9 @@ static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
if (segs == 1)
return skb;
+ rx_stats->rx_hw_gro_packets++;
+ rx_stats->rx_hw_gro_wire_packets += segs;
+
NAPI_GRO_CB(skb)->count = segs;
skb_shinfo(skb)->gso_size =
le32_to_cpu(tpa_end1->rx_tpa_end_cmp_seg_len);
@@ -1987,7 +1991,8 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
}
if (gro)
- skb = bnxt_gro_skb(bp, tpa_info, tpa_end, tpa_end1, skb);
+ skb = bnxt_gro_skb(bp, tpa_info, tpa_end, tpa_end1, skb,
+ &cpr->sw_stats->rx);
return skb;
}
@@ -13492,6 +13497,8 @@ static void bnxt_get_one_ring_err_stats(struct bnxt *bp,
stats->rx_total_netpoll_discards += sw_stats->rx.rx_netpoll_discards;
stats->rx_total_ring_discards +=
BNXT_GET_RING_STATS64(hw_stats, rx_discard_pkts);
+ stats->rx_total_hw_gro_packets += sw_stats->rx.rx_hw_gro_packets;
+ stats->rx_total_hw_gro_wire_packets += sw_stats->rx.rx_hw_gro_wire_packets;
stats->tx_total_resets += sw_stats->tx.tx_resets;
stats->tx_total_ring_discards +=
BNXT_GET_RING_STATS64(hw_stats, tx_discard_pkts);
@@ -15931,6 +15938,8 @@ static void bnxt_get_queue_stats_rx(struct net_device *dev, int i,
stats->bytes += BNXT_GET_RING_STATS64(sw, rx_bcast_bytes);
stats->alloc_fail = cpr->sw_stats->rx.rx_oom_discards;
+ stats->hw_gro_packets = cpr->sw_stats->rx.rx_hw_gro_packets;
+ stats->hw_gro_wire_packets = cpr->sw_stats->rx.rx_hw_gro_wire_packets;
}
static void bnxt_get_queue_stats_tx(struct net_device *dev, int i,
@@ -15966,6 +15975,8 @@ static void bnxt_get_base_stats(struct net_device *dev,
rx->packets = bp->net_stats_prev.rx_packets;
rx->bytes = bp->net_stats_prev.rx_bytes;
rx->alloc_fail = bp->ring_err_stats_prev.rx_total_oom_discards;
+ rx->hw_gro_packets = bp->ring_err_stats_prev.rx_total_hw_gro_packets;
+ rx->hw_gro_wire_packets = bp->ring_err_stats_prev.rx_total_hw_gro_wire_packets;
tx->packets = bp->net_stats_prev.tx_packets;
tx->bytes = bp->net_stats_prev.tx_bytes;
--
2.53.0
* [PATCH net-next v2 2/9] tools: ynltool: factor out qstat dumping
2026-02-07 0:35 [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO Jakub Kicinski
2026-02-07 0:35 ` [PATCH net-next v2 1/9] eth: bnxt: gather and report HW-GRO stats Jakub Kicinski
@ 2026-02-07 0:35 ` Jakub Kicinski
2026-02-07 0:35 ` [PATCH net-next v2 3/9] tools: ynltool: add qstats analysis for HW-GRO efficiency / savings Jakub Kicinski
` (6 subsequent siblings)
8 siblings, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
The logic to open a socket and dump the queues is the same
across sub-commands. Factor it out; we'll need it again.
No functional changes intended.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
tools/net/ynl/ynltool/qstats.c | 95 +++++++++++++++-------------------
1 file changed, 41 insertions(+), 54 deletions(-)
diff --git a/tools/net/ynl/ynltool/qstats.c b/tools/net/ynl/ynltool/qstats.c
index 31fb45709ffa..d19acab0bf2a 100644
--- a/tools/net/ynl/ynltool/qstats.c
+++ b/tools/net/ynl/ynltool/qstats.c
@@ -237,13 +237,47 @@ static void print_plain_qstats(struct netdev_qstats_get_list *qstats)
}
}
-static int do_show(int argc, char **argv)
+static struct netdev_qstats_get_list *
+qstats_dump(enum netdev_qstats_scope scope)
{
struct netdev_qstats_get_list *qstats;
struct netdev_qstats_get_req *req;
struct ynl_error yerr;
struct ynl_sock *ys;
- int ret = 0;
+
+ ys = ynl_sock_create(&ynl_netdev_family, &yerr);
+ if (!ys) {
+ p_err("YNL: %s", yerr.msg);
+ return NULL;
+ }
+
+ req = netdev_qstats_get_req_alloc();
+ if (!req) {
+ p_err("failed to allocate qstats request");
+ goto err_close;
+ }
+
+ if (scope)
+ netdev_qstats_get_req_set_scope(req, scope);
+
+ qstats = netdev_qstats_get_dump(ys, req);
+ netdev_qstats_get_req_free(req);
+ if (!qstats) {
+ p_err("failed to get queue stats: %s", ys->err.msg);
+ goto err_close;
+ }
+
+ ynl_sock_destroy(ys);
+ return qstats;
+
+err_close:
+ ynl_sock_destroy(ys);
+ return NULL;
+}
+
+static int do_show(int argc, char **argv)
+{
+ struct netdev_qstats_get_list *qstats;
/* Parse options */
while (argc > 0) {
@@ -268,29 +302,9 @@ static int do_show(int argc, char **argv)
}
}
- ys = ynl_sock_create(&ynl_netdev_family, &yerr);
- if (!ys) {
- p_err("YNL: %s", yerr.msg);
+ qstats = qstats_dump(scope);
+ if (!qstats)
return -1;
- }
-
- req = netdev_qstats_get_req_alloc();
- if (!req) {
- p_err("failed to allocate qstats request");
- ret = -1;
- goto exit_close;
- }
-
- if (scope)
- netdev_qstats_get_req_set_scope(req, scope);
-
- qstats = netdev_qstats_get_dump(ys, req);
- netdev_qstats_get_req_free(req);
- if (!qstats) {
- p_err("failed to get queue stats: %s", ys->err.msg);
- ret = -1;
- goto exit_close;
- }
/* Print the stats as returned by the kernel */
if (json_output)
@@ -299,9 +313,7 @@ static int do_show(int argc, char **argv)
print_plain_qstats(qstats);
netdev_qstats_get_list_free(qstats);
-exit_close:
- ynl_sock_destroy(ys);
- return ret;
+ return 0;
}
static void compute_stats(__u64 *values, unsigned int count,
@@ -406,10 +418,7 @@ static int cmp_ifindex_type(const void *a, const void *b)
static int do_balance(int argc, char **argv __attribute__((unused)))
{
struct netdev_qstats_get_list *qstats;
- struct netdev_qstats_get_req *req;
struct netdev_qstats_get_rsp **sorted;
- struct ynl_error yerr;
- struct ynl_sock *ys;
unsigned int count = 0;
unsigned int i, j;
int ret = 0;
@@ -419,29 +428,9 @@ static int do_balance(int argc, char **argv __attribute__((unused)))
return -1;
}
- ys = ynl_sock_create(&ynl_netdev_family, &yerr);
- if (!ys) {
- p_err("YNL: %s", yerr.msg);
+ qstats = qstats_dump(NETDEV_QSTATS_SCOPE_QUEUE);
+ if (!qstats)
return -1;
- }
-
- req = netdev_qstats_get_req_alloc();
- if (!req) {
- p_err("failed to allocate qstats request");
- ret = -1;
- goto exit_close;
- }
-
- /* Always use queue scope for balance analysis */
- netdev_qstats_get_req_set_scope(req, NETDEV_QSTATS_SCOPE_QUEUE);
-
- qstats = netdev_qstats_get_dump(ys, req);
- netdev_qstats_get_req_free(req);
- if (!qstats) {
- p_err("failed to get queue stats: %s", ys->err.msg);
- ret = -1;
- goto exit_close;
- }
/* Count and sort queues */
ynl_dump_foreach(qstats, qs)
@@ -576,8 +565,6 @@ static int do_balance(int argc, char **argv __attribute__((unused)))
free(sorted);
exit_free_qstats:
netdev_qstats_get_list_free(qstats);
-exit_close:
- ynl_sock_destroy(ys);
return ret;
}
--
2.53.0
* [PATCH net-next v2 3/9] tools: ynltool: add qstats analysis for HW-GRO efficiency / savings
2026-02-07 0:35 [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO Jakub Kicinski
2026-02-07 0:35 ` [PATCH net-next v2 1/9] eth: bnxt: gather and report HW-GRO stats Jakub Kicinski
2026-02-07 0:35 ` [PATCH net-next v2 2/9] tools: ynltool: factor out qstat dumping Jakub Kicinski
@ 2026-02-07 0:35 ` Jakub Kicinski
2026-02-09 9:43 ` Petr Machata
2026-02-07 0:35 ` [PATCH net-next v2 4/9] selftests: net: move gro to lib for HW vs SW reuse Jakub Kicinski
` (5 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Extend ynltool to compute a HW-GRO savings metric - how many
packets HW GRO has been able to save the kernel from seeing.
Note that this definition does not actually take into account
whether the segments were or weren't eligible for HW GRO.
If a machine is receiving all-UDP traffic, the new metric will
show HW-GRO savings of 0%. Conversely, since the super-packet
still counts as a received packet, savings of 100% are not
achievable. Perfect HW-GRO on a machine with 4k MTU and 64kB
super-frames would show ~93.75% savings. With 1.5k MTU we may
see up to ~97.8% savings (if my math is right).
Example after 10 sec of iperf on a freshly booted machine
with 1.5k MTU:
$ ynltool qstats show
eth0 rx-packets: 40681280 rx-bytes: 61575208437
rx-alloc-fail: 0 rx-hw-gro-packets: 1225133
rx-hw-gro-wire-packets: 40656633
$ ynltool qstats hw-gro
eth0: 96.9% savings
None of the NICs I have access to can report "missed" HW-GRO
opportunities, so computing a true "effectiveness" metric
is not possible. One could also argue that an effectiveness
metric is inferior in environments where we control both senders
and receivers: the savings metric will capture regressions
in the receiver's HW-GRO effectiveness as well as regressions
in senders sending smaller TSO trains. And we care about both.
The main downside is that it's hard to tell at a glance how well
the NIC is doing, because the savings depend on traffic patterns.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- use %1$s
v1: https://lore.kernel.org/20260205220541.2992807-4-kuba@kernel.org
---
tools/net/ynl/ynltool/qstats.c | 76 +++++++++++++++++++++++++++++++---
1 file changed, 71 insertions(+), 5 deletions(-)
diff --git a/tools/net/ynl/ynltool/qstats.c b/tools/net/ynl/ynltool/qstats.c
index d19acab0bf2a..a6c28ba4f25c 100644
--- a/tools/net/ynl/ynltool/qstats.c
+++ b/tools/net/ynl/ynltool/qstats.c
@@ -568,6 +568,65 @@ static int do_balance(int argc, char **argv __attribute__((unused)))
return ret;
}
+static int do_hw_gro(int argc, char **argv __attribute__((unused)))
+{
+ struct netdev_qstats_get_list *qstats;
+
+ if (argc > 0) {
+ p_err("hw-gro command takes no arguments");
+ return -1;
+ }
+
+ qstats = qstats_dump(0);
+ if (!qstats)
+ return -1;
+
+ if (json_output)
+ jsonw_start_array(json_wtr);
+
+ ynl_dump_foreach(qstats, qs) {
+ char ifname[IF_NAMESIZE];
+ const char *name;
+ double savings;
+
+ if (!qs->_present.rx_packets ||
+ !qs->_present.rx_hw_gro_packets ||
+ !qs->_present.rx_hw_gro_wire_packets)
+ continue;
+
+ if (!qs->rx_packets)
+ continue;
+
+ /* How many skbs did we avoid allocating thanks to HW GRO */
+ savings = (double)(qs->rx_hw_gro_wire_packets -
+ qs->rx_hw_gro_packets) /
+ qs->rx_packets * 100.0;
+
+ name = if_indextoname(qs->ifindex, ifname);
+
+ if (json_output) {
+ jsonw_start_object(json_wtr);
+ jsonw_uint_field(json_wtr, "ifindex", qs->ifindex);
+ if (name)
+ jsonw_string_field(json_wtr, "ifname", name);
+ jsonw_float_field(json_wtr, "savings", savings);
+ jsonw_end_object(json_wtr);
+ } else {
+ if (name)
+ printf("%s", name);
+ else
+ printf("ifindex:%u", qs->ifindex);
+ printf(": %.1f%% savings\n", savings);
+ }
+ }
+
+ if (json_output)
+ jsonw_end_array(json_wtr);
+
+ netdev_qstats_get_list_free(qstats);
+ return 0;
+}
+
static int do_help(int argc __attribute__((unused)),
char **argv __attribute__((unused)))
{
@@ -577,9 +636,10 @@ static int do_help(int argc __attribute__((unused)),
}
fprintf(stderr,
- "Usage: %s qstats { COMMAND | help }\n"
- " %s qstats [ show ] [ OPTIONS ]\n"
- " %s qstats balance\n"
+ "Usage: %1$s qstats { COMMAND | help }\n"
+ " %1$s qstats [ show ] [ OPTIONS ]\n"
+ " %1$s qstats balance\n"
+ " %1$s qstats hw-gro\n"
"\n"
" OPTIONS := { scope queue | group-by { device | queue } }\n"
"\n"
@@ -588,9 +648,14 @@ static int do_help(int argc __attribute__((unused)),
" show scope queue - Display per-queue statistics\n"
" show group-by device - Display device-aggregated statistics (default)\n"
" show group-by queue - Display per-queue statistics\n"
- " balance - Analyze traffic distribution balance.\n"
+ "\n"
+ " Analysis:\n"
+ " balance - Traffic distribution between queues.\n"
+ " hw-gro - HW GRO effectiveness analysis\n"
+ " - savings - delta between packets received\n"
+ " on the wire and packets seen by the kernel.\n"
"",
- bin_name, bin_name, bin_name);
+ bin_name);
return 0;
}
@@ -598,6 +663,7 @@ static int do_help(int argc __attribute__((unused)),
static const struct cmd qstats_cmds[] = {
{ "show", do_show },
{ "balance", do_balance },
+ { "hw-gro", do_hw_gro },
{ "help", do_help },
{ 0 }
};
--
2.53.0
* [PATCH net-next v2 4/9] selftests: net: move gro to lib for HW vs SW reuse
2026-02-07 0:35 [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO Jakub Kicinski
` (2 preceding siblings ...)
2026-02-07 0:35 ` [PATCH net-next v2 3/9] tools: ynltool: add qstats analysis for HW-GRO efficiency / savings Jakub Kicinski
@ 2026-02-07 0:35 ` Jakub Kicinski
2026-02-09 2:36 ` Willem de Bruijn
2026-02-07 0:35 ` [PATCH net-next v2 5/9] selftests: drv-net: give HW stats sync time extra 25% of margin Jakub Kicinski
` (4 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
The gro.c packet sender is used for SW testing, but the bulk of
the incoming new tests will be HW-specific. It's better to put
those under drivers/net/hw/, to avoid tip-toeing around netdevsim.
Move gro.c to lib so we can reuse it.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
tools/testing/selftests/drivers/net/Makefile | 1 -
tools/testing/selftests/net/lib/Makefile | 1 +
tools/testing/selftests/{drivers/net => net/lib}/gro.c | 2 +-
tools/testing/selftests/drivers/net/.gitignore | 1 -
tools/testing/selftests/drivers/net/gro.py | 2 +-
tools/testing/selftests/net/lib/.gitignore | 1 +
6 files changed, 4 insertions(+), 4 deletions(-)
rename tools/testing/selftests/{drivers/net => net/lib}/gro.c (99%)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 8154d6d429d3..7c7fa75b80c2 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -6,7 +6,6 @@ TEST_INCLUDES := $(wildcard lib/py/*.py) \
../../net/lib.sh \
TEST_GEN_FILES := \
- gro \
napi_id_helper \
# end of TEST_GEN_FILES
diff --git a/tools/testing/selftests/net/lib/Makefile b/tools/testing/selftests/net/lib/Makefile
index 5339f56329e1..ff83603397d0 100644
--- a/tools/testing/selftests/net/lib/Makefile
+++ b/tools/testing/selftests/net/lib/Makefile
@@ -14,6 +14,7 @@ TEST_FILES := \
TEST_GEN_FILES := \
$(patsubst %.c,%.o,$(wildcard *.bpf.c)) \
csum \
+ gro \
xdp_helper \
# end of TEST_GEN_FILES
diff --git a/tools/testing/selftests/drivers/net/gro.c b/tools/testing/selftests/net/lib/gro.c
similarity index 99%
rename from tools/testing/selftests/drivers/net/gro.c
rename to tools/testing/selftests/net/lib/gro.c
index 3c0745b68bfa..02e29509fbea 100644
--- a/tools/testing/selftests/drivers/net/gro.c
+++ b/tools/testing/selftests/net/lib/gro.c
@@ -77,7 +77,7 @@
#include <unistd.h>
#include "kselftest.h"
-#include "../../net/lib/ksft.h"
+#include "ksft.h"
#define DPORT 8000
#define SPORT 1500
diff --git a/tools/testing/selftests/drivers/net/.gitignore b/tools/testing/selftests/drivers/net/.gitignore
index 3633c7a3ed65..585ecb4d5dc4 100644
--- a/tools/testing/selftests/drivers/net/.gitignore
+++ b/tools/testing/selftests/drivers/net/.gitignore
@@ -1,4 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
-gro
napi_id_helper
psp_responder
diff --git a/tools/testing/selftests/drivers/net/gro.py b/tools/testing/selftests/drivers/net/gro.py
index cbc1b19dbc91..2da53686354f 100755
--- a/tools/testing/selftests/drivers/net/gro.py
+++ b/tools/testing/selftests/drivers/net/gro.py
@@ -117,7 +117,7 @@ from lib.py import ksft_variants
""" Setup hardware loopback mode for GRO testing. """
if not hasattr(cfg, "bin_remote"):
- cfg.bin_local = cfg.test_dir / "gro"
+ cfg.bin_local = cfg.net_lib_dir / "gro"
cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
if not hasattr(cfg, "feat"):
diff --git a/tools/testing/selftests/net/lib/.gitignore b/tools/testing/selftests/net/lib/.gitignore
index bbc97d6bf556..6cd2b762af5d 100644
--- a/tools/testing/selftests/net/lib/.gitignore
+++ b/tools/testing/selftests/net/lib/.gitignore
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
csum
+gro
xdp_helper
--
2.53.0
* [PATCH net-next v2 5/9] selftests: drv-net: give HW stats sync time extra 25% of margin
2026-02-07 0:35 [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO Jakub Kicinski
` (3 preceding siblings ...)
2026-02-07 0:35 ` [PATCH net-next v2 4/9] selftests: net: move gro to lib for HW vs SW reuse Jakub Kicinski
@ 2026-02-07 0:35 ` Jakub Kicinski
2026-02-09 2:37 ` Willem de Bruijn
2026-02-07 0:35 ` [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together Jakub Kicinski
` (3 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
There are transient failures for devices which update stats
periodically, especially when it is the FW DMA'ing the stats
rather than host periodic work querying the FW. Wait 25%
longer than strictly necessary.
For devices which don't report stats-block-usecs we retain
25 msec as the default wait time (0.025 sec == 20,000 usec * 1.25).
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
tools/testing/selftests/drivers/net/lib/py/env.py | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index 41cc248ac848..39a98eb2592e 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -285,7 +285,7 @@ from .remote import Remote
if "Operation not supported" not in e.cmd.stderr:
raise
- self._stats_settle_time = 0.025 + \
- data.get('stats-block-usecs', 0) / 1000 / 1000
+ self._stats_settle_time = \
+ 1.25 * data.get('stats-block-usecs', 20000) / 1000 / 1000
time.sleep(self._stats_settle_time)
--
2.53.0
* [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together
2026-02-07 0:35 [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO Jakub Kicinski
` (4 preceding siblings ...)
2026-02-07 0:35 ` [PATCH net-next v2 5/9] selftests: drv-net: give HW stats sync time extra 25% of margin Jakub Kicinski
@ 2026-02-07 0:35 ` Jakub Kicinski
2026-02-09 2:39 ` Willem de Bruijn
2026-02-07 0:35 ` [PATCH net-next v2 7/9] selftests: drv-net: gro: test GRO stats Jakub Kicinski
` (2 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Longer packet-sequence tests are quite flaky when the test is run
over a real network. Try to avoid at least the jitter on the sender
side by scheduling all the packets to be sent at once using SO_TXTIME.
Use a hardcoded tx time of 5 msec in the future. In my tests increasing
this time past 2 msec makes no difference, so 5 msec is plenty of margin.
Since we now expect more output buffering, make sure to raise SNDBUF.
Experimenting with long sequences, I saw frequent failures when sending
200 packets: only 50-100 packets would get coalesced. With this change
up to 1000 packets get coalesced relatively reliably.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- only enable TXTIME when we are sending a single flow
v1: https://lore.kernel.org/20260205220541.2992807-7-kuba@kernel.org
---
tools/testing/selftests/net/lib/gro.c | 57 +++++++++++++++++++++++++--
1 file changed, 54 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/lib/gro.c b/tools/testing/selftests/net/lib/gro.c
index 02e29509fbea..5b3f8fca08e6 100644
--- a/tools/testing/selftests/net/lib/gro.c
+++ b/tools/testing/selftests/net/lib/gro.c
@@ -63,6 +63,7 @@
#include <linux/filter.h>
#include <linux/if_packet.h>
#include <linux/ipv6.h>
+#include <linux/net_tstamp.h>
#include <net/ethernet.h>
#include <net/if.h>
#include <netinet/in.h>
@@ -74,6 +75,7 @@
#include <stdio.h>
#include <stdarg.h>
#include <string.h>
+#include <time.h>
#include <unistd.h>
#include "kselftest.h"
@@ -123,6 +125,9 @@ static int tcp_offset = -1;
static int total_hdr_len = -1;
static int ethhdr_proto = -1;
static bool ipip;
+static uint64_t txtime_ns;
+
+#define TXTIME_DELAY_MS 5
static void vlog(const char *fmt, ...)
{
@@ -330,13 +335,37 @@ static void fill_transportlayer(void *buf, int seq_offset, int ack_offset,
static void write_packet(int fd, char *buf, int len, struct sockaddr_ll *daddr)
{
+ char control[CMSG_SPACE(sizeof(uint64_t))];
+ struct msghdr msg = {};
+ struct iovec iov = {};
+ struct cmsghdr *cm;
int ret = -1;
- ret = sendto(fd, buf, len, 0, (struct sockaddr *)daddr, sizeof(*daddr));
+ iov.iov_base = buf;
+ iov.iov_len = len;
+
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+ msg.msg_name = daddr;
+ msg.msg_namelen = sizeof(*daddr);
+
+ if (txtime_ns) {
+ memset(control, 0, sizeof(control));
+ msg.msg_control = control;
+ msg.msg_controllen = sizeof(control);
+
+ cm = CMSG_FIRSTHDR(&msg);
+ cm->cmsg_level = SOL_SOCKET;
+ cm->cmsg_type = SCM_TXTIME;
+ cm->cmsg_len = CMSG_LEN(sizeof(uint64_t));
+ memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));
+ }
+
+ ret = sendmsg(fd, &msg, 0);
if (ret == -1)
- error(1, errno, "sendto failure");
+ error(1, errno, "sendmsg failure");
if (ret != len)
- error(1, errno, "sendto wrong length");
+ error(1, 0, "sendmsg wrong length: %d vs %d", ret, len);
}
static void create_packet(void *buf, int seq_offset, int ack_offset,
@@ -1058,6 +1087,7 @@ static void check_recv_pkts(int fd, int *correct_payload,
static void gro_sender(void)
{
+ int bufsize = 4 * 1024 * 1024; /* 4 MB */
const int fin_delay_us = 100 * 1000;
static char fin_pkt[MAX_HDR_LEN];
struct sockaddr_ll daddr = {};
@@ -1067,6 +1097,27 @@ static void gro_sender(void)
if (txfd < 0)
error(1, errno, "socket creation");
+ if (setsockopt(txfd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)))
+ error(1, errno, "cannot set sndbuf size, setsockopt failed");
+
+ /* Enable SO_TXTIME unless test case generates more than one flow
+ * SO_TXTIME could result in qdisc layer sorting the packets at sender.
+ */
+ if (true) {
+ struct sock_txtime so_txtime = { .clockid = CLOCK_MONOTONIC, };
+ struct timespec ts;
+
+ if (setsockopt(txfd, SOL_SOCKET, SO_TXTIME,
+ &so_txtime, sizeof(so_txtime)))
+ error(1, errno, "setsockopt SO_TXTIME");
+
+ if (clock_gettime(CLOCK_MONOTONIC, &ts))
+ error(1, errno, "clock_gettime");
+
+ txtime_ns = ts.tv_sec * 1000000000ULL + ts.tv_nsec;
+ txtime_ns += TXTIME_DELAY_MS * 1000000ULL;
+ }
+
memset(&daddr, 0, sizeof(daddr));
daddr.sll_ifindex = if_nametoindex(ifname);
if (daddr.sll_ifindex == 0)
--
2.53.0
* [PATCH net-next v2 7/9] selftests: drv-net: gro: test GRO stats
2026-02-07 0:35 [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO Jakub Kicinski
` (5 preceding siblings ...)
2026-02-07 0:35 ` [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together Jakub Kicinski
@ 2026-02-07 0:35 ` Jakub Kicinski
2026-02-07 0:35 ` [PATCH net-next v2 8/9] selftests: drv-net: gro: add test for packet ordering Jakub Kicinski
2026-02-07 0:35 ` [PATCH net-next v2 9/9] selftests: drv-net: gro: add a test for GRO depth Jakub Kicinski
8 siblings, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Test accuracy of GRO stats. We want to cover two potentially tricky
cases:
- single segment GRO
- packets which were eligible but didn't get GRO'd
The first case is trivial: teach gro.c to send one packet, and check
that the GRO stats didn't move.
The second case requires gro.c to send a lot of flows, expecting the
NIC to run out of GRO flow capacity.
To avoid noise from unrelated system traffic we steer the packets
to a dedicated queue and operate on per-queue qstats.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
.../testing/selftests/drivers/net/hw/Makefile | 1 +
tools/testing/selftests/net/lib/gro.c | 163 ++++++++++-
.../selftests/drivers/net/hw/gro_hw.py | 271 ++++++++++++++++++
3 files changed, 433 insertions(+), 2 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/hw/gro_hw.py
diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index a64140333a46..19e5ca223273 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -26,6 +26,7 @@ TEST_PROGS = \
ethtool_extended_state.sh \
ethtool_mm.sh \
ethtool_rmon.sh \
+ gro_hw.py \
hw_stats_l3.sh \
hw_stats_l3_gre.sh \
iou-zcrx.py \
diff --git a/tools/testing/selftests/net/lib/gro.c b/tools/testing/selftests/net/lib/gro.c
index 5b3f8fca08e6..41794b9f6f8a 100644
--- a/tools/testing/selftests/net/lib/gro.c
+++ b/tools/testing/selftests/net/lib/gro.c
@@ -43,6 +43,10 @@
* - large_max: exceeding max size
* - large_rem: remainder handling
*
+ * single, capacity:
+ * Boring cases used to test coalescing machinery itself and stats
+ * more than protocol behavior.
+ *
* MSS is defined as 4096 - header because if it is too small
* (i.e. 1500 MTU - header), it will result in many packets,
* increasing the "large" test case's flakiness. This is because
@@ -126,6 +130,9 @@ static int total_hdr_len = -1;
static int ethhdr_proto = -1;
static bool ipip;
static uint64_t txtime_ns;
+static int num_flows = 4;
+
+#define CAPACITY_PAYLOAD_LEN 200
#define TXTIME_DELAY_MS 5
@@ -389,6 +396,45 @@ static void create_packet(void *buf, int seq_offset, int ack_offset,
fill_datalinklayer(buf);
}
+static void create_capacity_packet(void *buf, int flow_id, int pkt_idx, int psh)
+{
+ int seq_offset = pkt_idx * CAPACITY_PAYLOAD_LEN;
+ struct tcphdr *tcph;
+
+ create_packet(buf, seq_offset, 0, CAPACITY_PAYLOAD_LEN, 0);
+
+ /* Customize for this flow id */
+ memset(buf + total_hdr_len, 'a' + flow_id, CAPACITY_PAYLOAD_LEN);
+
+ tcph = buf + tcp_offset;
+ tcph->source = htons(SPORT + flow_id);
+ tcph->psh = psh;
+ tcph->check = 0;
+ tcph->check = tcp_checksum(tcph, CAPACITY_PAYLOAD_LEN);
+}
+
+/* Send a capacity test, 2 packets per flow, all first packets then all second:
+ * A1 B1 C1 D1 ... A2 B2 C2 D2 ...
+ */
+static void send_capacity(int fd, struct sockaddr_ll *daddr)
+{
+ static char buf[MAX_HDR_LEN + CAPACITY_PAYLOAD_LEN];
+ int pkt_size = total_hdr_len + CAPACITY_PAYLOAD_LEN;
+ int i;
+
+ /* Send first packet of each flow (no PSH) */
+ for (i = 0; i < num_flows; i++) {
+ create_capacity_packet(buf, i, 0, 0);
+ write_packet(fd, buf, pkt_size, daddr);
+ }
+
+ /* Send second packet of each flow (with PSH to flush) */
+ for (i = 0; i < num_flows; i++) {
+ create_capacity_packet(buf, i, 1, 1);
+ write_packet(fd, buf, pkt_size, daddr);
+ }
+}
+
#ifndef TH_CWR
#define TH_CWR 0x80
#endif
@@ -1085,6 +1131,93 @@ static void check_recv_pkts(int fd, int *correct_payload,
printf("Test succeeded\n\n");
}
+static void check_capacity_pkts(int fd)
+{
+ static char buffer[IP_MAXPACKET + ETH_HLEN + 1];
+ struct iphdr *iph = (struct iphdr *)(buffer + ETH_HLEN);
+ struct ipv6hdr *ip6h = (struct ipv6hdr *)(buffer + ETH_HLEN);
+ const char *fail_reason = NULL;
+ int flow_order[num_flows * 2];
+ int coalesced[num_flows];
+ struct tcphdr *tcph;
+ int ip_ext_len = 0;
+ int total_data = 0;
+ int pkt_size = -1;
+ int data_len = 0;
+ int num_pkt = 0;
+ int num_coal = 0;
+ int flow_id;
+ int sport;
+
+ memset(coalesced, 0, sizeof(coalesced));
+ memset(flow_order, -1, sizeof(flow_order));
+
+ while (total_data < num_flows * CAPACITY_PAYLOAD_LEN * 2) {
+ ip_ext_len = 0;
+ pkt_size = recv(fd, buffer, IP_MAXPACKET + ETH_HLEN + 1, 0);
+ if (pkt_size < 0)
+ recv_error(fd, errno);
+
+ if (iph->version == 4)
+ ip_ext_len = (iph->ihl - 5) * 4;
+ else if (ip6h->version == 6 && ip6h->nexthdr != IPPROTO_TCP)
+ ip_ext_len = MIN_EXTHDR_SIZE;
+
+ tcph = (struct tcphdr *)(buffer + tcp_offset + ip_ext_len);
+
+ /* FIN packet terminates reception */
+ if (tcph->fin)
+ break;
+
+ sport = ntohs(tcph->source);
+ flow_id = sport - SPORT;
+
+ if (flow_id < 0 || flow_id >= num_flows) {
+ vlog("Invalid flow_id %d from sport %d\n",
+ flow_id, sport);
+ fail_reason = fail_reason ?: "invalid packet";
+ continue;
+ }
+
+ /* Calculate payload length */
+ if (pkt_size == ETH_ZLEN && iph->version == 4) {
+ data_len = ntohs(iph->tot_len)
+ - sizeof(struct tcphdr) - sizeof(struct iphdr);
+ } else {
+ data_len = pkt_size - total_hdr_len - ip_ext_len;
+ }
+
+ flow_order[num_pkt] = flow_id;
+ coalesced[flow_id] = data_len;
+
+ if (data_len == CAPACITY_PAYLOAD_LEN * 2) {
+ num_coal++;
+ } else {
+ vlog("Pkt %d: flow %d, sport %d, len %d (expected %d)\n",
+ num_pkt, flow_id, sport, data_len,
+ CAPACITY_PAYLOAD_LEN * 2);
+ fail_reason = fail_reason ?: "not coalesced";
+ }
+
+ num_pkt++;
+ total_data += data_len;
+ }
+
+ if (!fail_reason) {
+ vlog("All %d flows coalesced correctly\n", num_flows);
+ printf("Test succeeded\n\n");
+ } else {
+ printf("FAILED\n");
+ }
+
+ /* Always print stats for external validation */
+ printf("STATS: received=%d wire=%d coalesced=%d\n",
+ num_pkt, num_pkt + num_coal, num_coal);
+
+ if (fail_reason)
+ error(1, 0, "capacity test failed %s", fail_reason);
+}
+
static void gro_sender(void)
{
int bufsize = 4 * 1024 * 1024; /* 4 MB */
@@ -1103,7 +1236,7 @@ static void gro_sender(void)
/* Enable SO_TXTIME unless test case generates more than one flow
* SO_TXTIME could result in qdisc layer sorting the packets at sender.
*/
- if (true) {
+ if (strcmp(testname, "single") && strcmp(testname, "capacity")) {
struct sock_txtime so_txtime = { .clockid = CLOCK_MONOTONIC, };
struct timespec ts;
@@ -1250,6 +1383,19 @@ static void gro_sender(void)
send_large(txfd, &daddr, remainder + 1);
write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
+
+ /* machinery sub-tests */
+ } else if (strcmp(testname, "single") == 0) {
+ static char buf[MAX_HDR_LEN + PAYLOAD_LEN];
+
+ create_packet(buf, 0, 0, PAYLOAD_LEN, 0);
+ write_packet(txfd, buf, total_hdr_len + PAYLOAD_LEN, &daddr);
+ write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
+ } else if (strcmp(testname, "capacity") == 0) {
+ send_capacity(txfd, &daddr);
+ usleep(fin_delay_us);
+ write_packet(txfd, fin_pkt, total_hdr_len, &daddr);
+
} else {
error(1, 0, "Unknown testcase: %s", testname);
}
@@ -1445,6 +1591,15 @@ static void gro_receiver(void)
correct_payload[2] = remainder + 1;
printf("last segment sent individually: ");
check_recv_pkts(rxfd, correct_payload, 3);
+
+ /* machinery sub-tests */
+ } else if (strcmp(testname, "single") == 0) {
+ printf("single data packet: ");
+ correct_payload[0] = PAYLOAD_LEN;
+ check_recv_pkts(rxfd, correct_payload, 1);
+ } else if (strcmp(testname, "capacity") == 0) {
+ check_capacity_pkts(rxfd);
+
} else {
error(1, 0, "Test case error: unknown testname %s", testname);
}
@@ -1462,6 +1617,7 @@ static void parse_args(int argc, char **argv)
{ "ipv4", no_argument, NULL, '4' },
{ "ipv6", no_argument, NULL, '6' },
{ "ipip", no_argument, NULL, 'e' },
+ { "num-flows", required_argument, NULL, 'n' },
{ "rx", no_argument, NULL, 'r' },
{ "saddr", required_argument, NULL, 's' },
{ "smac", required_argument, NULL, 'S' },
@@ -1471,7 +1627,7 @@ static void parse_args(int argc, char **argv)
};
int c;
- while ((c = getopt_long(argc, argv, "46d:D:ei:rs:S:t:v", opts, NULL)) != -1) {
+ while ((c = getopt_long(argc, argv, "46d:D:ei:n:rs:S:t:v", opts, NULL)) != -1) {
switch (c) {
case '4':
proto = PF_INET;
@@ -1495,6 +1651,9 @@ static void parse_args(int argc, char **argv)
case 'i':
ifname = optarg;
break;
+ case 'n':
+ num_flows = atoi(optarg);
+ break;
case 'r':
tx_socket = false;
break;
diff --git a/tools/testing/selftests/drivers/net/hw/gro_hw.py b/tools/testing/selftests/drivers/net/hw/gro_hw.py
new file mode 100755
index 000000000000..3bca19e8f339
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/gro_hw.py
@@ -0,0 +1,271 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+HW GRO tests focusing on device machinery like stats, rather than protocol
+processing.
+"""
+
+import glob
+import re
+
+from lib.py import ksft_run, ksft_exit, ksft_pr
+from lib.py import ksft_eq, ksft_ge
+from lib.py import NetDrvEpEnv, NetdevFamily
+from lib.py import KsftSkipEx
+from lib.py import bkg, cmd, defer, ethtool, ip
+
+
+# gro.c uses hardcoded DPORT=8000
+GRO_DPORT = 8000
+
+
+def _get_queue_stats(cfg, queue_id):
+ """Get stats for a specific Rx queue."""
+ cfg.wait_hw_stats_settle()
+ data = cfg.netnl.qstats_get({"ifindex": cfg.ifindex, "scope": ["queue"]},
+ dump=True)
+ for q in data:
+ if q.get('queue-type') == 'rx' and q.get('queue-id') == queue_id:
+ return q
+ return {}
+
+
+def _resolve_dmac(cfg, ipver):
+ """Find the destination MAC address for sending packets."""
+ attr = "dmac" + ipver
+ if hasattr(cfg, attr):
+ return getattr(cfg, attr)
+
+ route = ip(f"-{ipver} route get {cfg.addr_v[ipver]}",
+ json=True, host=cfg.remote)[0]
+ gw = route.get("gateway")
+ if not gw:
+ setattr(cfg, attr, cfg.dev['address'])
+ return getattr(cfg, attr)
+
+ cmd(f"ping -c1 -W0 -I{cfg.remote_ifname} {gw}", host=cfg.remote)
+ neigh = ip(f"neigh get {gw} dev {cfg.remote_ifname}",
+ json=True, host=cfg.remote)[0]
+ setattr(cfg, attr, neigh['lladdr'])
+ return getattr(cfg, attr)
+
+
+def _setup_isolated_queue(cfg):
+ """Set up an isolated queue for testing using ntuple filter.
+
+ Remove queue 1 from the default RSS context and steer test traffic to it.
+ """
+ test_queue = 1
+
+ qcnt = len(glob.glob(f"/sys/class/net/{cfg.ifname}/queues/rx-*"))
+ if qcnt < 2:
+ raise KsftSkipEx(f"Need at least 2 queues, have {qcnt}")
+
+ # Remove queue 1 from default RSS context by setting its weight to 0
+ weights = ["1"] * qcnt
+ weights[test_queue] = "0"
+ ethtool(f"-X {cfg.ifname} weight " + " ".join(weights))
+ defer(ethtool, f"-X {cfg.ifname} default")
+
+ # Set up ntuple filter to steer our test traffic to the isolated queue
+ flow = f"flow-type tcp{cfg.addr_ipver} "
+ flow += f"dst-ip {cfg.addr} dst-port {GRO_DPORT} action {test_queue}"
+ output = ethtool(f"-N {cfg.ifname} {flow}").stdout
+ ntuple_id = int(output.split()[-1])
+ defer(ethtool, f"-N {cfg.ifname} delete {ntuple_id}")
+
+ return test_queue
+
+
+def _run_gro_test(cfg, test_name, num_flows=None, ignore_fail=False):
+ """Run gro binary with given test and return output."""
+ if not hasattr(cfg, "bin_remote"):
+ cfg.bin_local = cfg.net_lib_dir / "gro"
+ cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
+
+ ipver = cfg.addr_ipver
+ protocol = f"--ipv{ipver}"
+ dmac = _resolve_dmac(cfg, ipver)
+
+ base_args = [
+ protocol,
+ f"--dmac {dmac}",
+ f"--smac {cfg.remote_dev['address']}",
+ f"--daddr {cfg.addr}",
+ f"--saddr {cfg.remote_addr_v[ipver]}",
+ f"--test {test_name}",
+ ]
+ if num_flows:
+ base_args.append(f"--num-flows {num_flows}")
+
+ args = " ".join(base_args)
+
+ rx_cmd = f"{cfg.bin_local} {args} --rx --iface {cfg.ifname}"
+ tx_cmd = f"{cfg.bin_remote} {args} --iface {cfg.remote_ifname}"
+
+ with bkg(rx_cmd, ksft_ready=True, exit_wait=True, fail=False) as rx_proc:
+ cmd(tx_cmd, host=cfg.remote)
+
+ if not ignore_fail:
+ ksft_eq(rx_proc.ret, 0)
+ if rx_proc.ret != 0:
+ ksft_pr(rx_proc)
+
+ return rx_proc.stdout
+
+
+def _require_hw_gro_stats(cfg, queue_id):
+ """Check if device reports HW GRO stats for the queue."""
+ stats = _get_queue_stats(cfg, queue_id)
+ required = ['rx-packets', 'rx-hw-gro-packets', 'rx-hw-gro-wire-packets']
+ for stat in required:
+ if stat not in stats:
+ raise KsftSkipEx(f"Driver does not report '{stat}' via qstats")
+
+
+def _set_ethtool_feat(cfg, current, feats):
+ """Set ethtool features with defer to restore original state."""
+ s2n = {True: "on", False: "off"}
+
+ new = ["-K", cfg.ifname]
+ old = ["-K", cfg.ifname]
+ no_change = True
+ for name, state in feats.items():
+ new += [name, s2n[state]]
+ old += [name, s2n[current[name]["active"]]]
+
+ if current[name]["active"] != state:
+ no_change = False
+ if current[name]["fixed"]:
+ raise KsftSkipEx(f"Device does not support {name}")
+ if no_change:
+ return
+
+ eth_cmd = ethtool(" ".join(new))
+ defer(ethtool, " ".join(old))
+
+ # If ethtool printed something kernel must have modified some features
+ if eth_cmd.stdout:
+ ksft_pr(eth_cmd)
+
+
+def _setup_hw_gro(cfg):
+ """Enable HW GRO on the device, disabling SW GRO."""
+ feat = ethtool(f"-k {cfg.ifname}", json=True)[0]
+
+ # Try to disable SW GRO and enable HW GRO
+ _set_ethtool_feat(cfg, feat,
+ {"generic-receive-offload": False,
+ "rx-gro-hw": True,
+ "large-receive-offload": False})
+
+ # Some NICs treat HW GRO as a GRO sub-feature so disabling GRO
+ # will also clear HW GRO. Use a hack of installing XDP generic
+ # to skip SW GRO, even when enabled.
+ feat = ethtool(f"-k {cfg.ifname}", json=True)[0]
+ if not feat["rx-gro-hw"]["active"]:
+ ksft_pr("Driver clears HW GRO when SW GRO is cleared, using generic XDP workaround")
+ prog = cfg.net_lib_dir / "xdp_dummy.bpf.o"
+ ip(f"link set dev {cfg.ifname} xdpgeneric obj {prog} sec xdp")
+ defer(ip, f"link set dev {cfg.ifname} xdpgeneric off")
+
+ # Attaching XDP may change features, fetch the latest state
+ feat = ethtool(f"-k {cfg.ifname}", json=True)[0]
+
+ _set_ethtool_feat(cfg, feat,
+ {"generic-receive-offload": True,
+ "rx-gro-hw": True,
+ "large-receive-offload": False})
+
+
+def _check_gro_stats(cfg, test_queue, stats_before,
+ expect_rx, expect_gro, expect_wire):
+ """Validate GRO stats against expected values."""
+ stats_after = _get_queue_stats(cfg, test_queue)
+
+ rx_delta = (stats_after.get('rx-packets', 0) -
+ stats_before.get('rx-packets', 0))
+ gro_delta = (stats_after.get('rx-hw-gro-packets', 0) -
+ stats_before.get('rx-hw-gro-packets', 0))
+ wire_delta = (stats_after.get('rx-hw-gro-wire-packets', 0) -
+ stats_before.get('rx-hw-gro-wire-packets', 0))
+
+ ksft_eq(rx_delta, expect_rx, comment="rx-packets")
+ ksft_eq(gro_delta, expect_gro, comment="rx-hw-gro-packets")
+ ksft_eq(wire_delta, expect_wire, comment="rx-hw-gro-wire-packets")
+
+
+def test_gro_stats_single(cfg):
+ """
+ Test that a single packet doesn't affect GRO stats.
+
+ Send a single packet that cannot be coalesced (nothing to coalesce with).
+ GRO stats should not increase since no coalescing occurred.
+ rx-packets should increase by 2 (1 data + 1 FIN).
+ """
+ _setup_hw_gro(cfg)
+
+ test_queue = _setup_isolated_queue(cfg)
+ _require_hw_gro_stats(cfg, test_queue)
+
+ stats_before = _get_queue_stats(cfg, test_queue)
+
+ _run_gro_test(cfg, "single")
+
+ # 1 data + 1 FIN = 2 rx-packets, no coalescing
+ _check_gro_stats(cfg, test_queue, stats_before,
+ expect_rx=2, expect_gro=0, expect_wire=0)
+
+
+def test_gro_stats_full(cfg):
+ """
+ Test GRO stats when overwhelming HW GRO capacity.
+
+ Send 500 flows to exceed HW GRO flow capacity on a single queue.
+ This should result in some packets not being coalesced.
+ Validate that qstats match what gro.c observed.
+ """
+ _setup_hw_gro(cfg)
+
+ test_queue = _setup_isolated_queue(cfg)
+ _require_hw_gro_stats(cfg, test_queue)
+
+ num_flows = 500
+ stats_before = _get_queue_stats(cfg, test_queue)
+
+ # Run capacity test - will likely fail because not all packets coalesce
+ output = _run_gro_test(cfg, "capacity", num_flows=num_flows,
+ ignore_fail=True)
+
+ # Parse gro.c output: "STATS: received=X wire=Y coalesced=Z"
+ match = re.search(r'STATS: received=(\d+) wire=(\d+) coalesced=(\d+)',
+ output)
+ if not match:
+ raise KsftSkipEx(f"Could not parse gro.c output: {output}")
+
+ rx_frames = int(match.group(2))
+ gro_coalesced = int(match.group(3))
+
+ ksft_ge(gro_coalesced, 1,
+ comment="At least some packets should coalesce")
+
+ # received + 1 FIN, coalesced super-packets, coalesced * 2 wire packets
+ _check_gro_stats(cfg, test_queue, stats_before,
+ expect_rx=rx_frames + 1,
+ expect_gro=gro_coalesced,
+ expect_wire=gro_coalesced * 2)
+
+
+def main() -> None:
+ """ Ksft boiler plate main """
+
+ with NetDrvEpEnv(__file__, nsim_test=False) as cfg:
+ cfg.netnl = NetdevFamily()
+ ksft_run([test_gro_stats_single,
+ test_gro_stats_full], args=(cfg,))
+ ksft_exit()
+
+
+if __name__ == "__main__":
+ main()
--
2.53.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net-next v2 8/9] selftests: drv-net: gro: add test for packet ordering
2026-02-07 0:35 [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO Jakub Kicinski
` (6 preceding siblings ...)
2026-02-07 0:35 ` [PATCH net-next v2 7/9] selftests: drv-net: gro: test GRO stats Jakub Kicinski
@ 2026-02-07 0:35 ` Jakub Kicinski
2026-02-07 0:35 ` [PATCH net-next v2 9/9] selftests: drv-net: gro: add a test for GRO depth Jakub Kicinski
8 siblings, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Add a test to check whether the NIC reorders packets when they hit GRO.
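The ordering invariant the test enforces can be sketched as follows
(illustrative Python, not part of the patch; flow_order and coalesced
mirror the arrays gro.c builds while receiving):

```python
def check_order(flow_order, coalesced):
    """Check receive order: first the uncoalesced first segments in
    flow-id order, then one frame per flow (the aggregate, or the
    uncoalesced second segment), again in flow-id order."""
    num_flows = len(coalesced)
    expected = [f for f in range(num_flows) if not coalesced[f]]
    expected += list(range(num_flows))
    return flow_order == expected

# All four flows coalesced: one aggregate per flow, in flow order.
# check_order([0, 1, 2, 3], [True] * 4) -> True
# Flow 1 not coalesced: its first segment arrives before the aggregates.
# check_order([1, 0, 1, 2, 3], [True, False, True, True]) -> True
```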
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
tools/testing/selftests/net/lib/gro.c | 38 +++++++++++++++++--
.../selftests/drivers/net/hw/gro_hw.py | 29 ++++++++++++--
2 files changed, 61 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/net/lib/gro.c b/tools/testing/selftests/net/lib/gro.c
index 41794b9f6f8a..3e611ae25f61 100644
--- a/tools/testing/selftests/net/lib/gro.c
+++ b/tools/testing/selftests/net/lib/gro.c
@@ -131,6 +131,7 @@ static int ethhdr_proto = -1;
static bool ipip;
static uint64_t txtime_ns;
static int num_flows = 4;
+static bool order_check;
#define CAPACITY_PAYLOAD_LEN 200
@@ -1136,6 +1137,7 @@ static void check_capacity_pkts(int fd)
static char buffer[IP_MAXPACKET + ETH_HLEN + 1];
struct iphdr *iph = (struct iphdr *)(buffer + ETH_HLEN);
struct ipv6hdr *ip6h = (struct ipv6hdr *)(buffer + ETH_HLEN);
+ int num_pkt = 0, num_coal = 0, pkt_idx;
const char *fail_reason = NULL;
int flow_order[num_flows * 2];
int coalesced[num_flows];
@@ -1144,8 +1146,6 @@ static void check_capacity_pkts(int fd)
int total_data = 0;
int pkt_size = -1;
int data_len = 0;
- int num_pkt = 0;
- int num_coal = 0;
int flow_id;
int sport;
@@ -1203,6 +1203,34 @@ static void check_capacity_pkts(int fd)
total_data += data_len;
}
+ /* Check flow ordering. We expect to see all non-coalesced first segs
+ * then interleaved coalesced and non-coalesced second frames.
+ */
+ pkt_idx = 0;
+ for (flow_id = 0; order_check && flow_id < num_flows; flow_id++) {
+ bool coaled = coalesced[flow_id] > CAPACITY_PAYLOAD_LEN;
+
+ if (coaled)
+ continue;
+
+ if (flow_order[pkt_idx] != flow_id) {
+ vlog("Flow order mismatch (non-coalesced) at position %d: expected flow %d, got flow %d\n",
+ pkt_idx, flow_id, flow_order[pkt_idx]);
+ fail_reason = fail_reason ?: "bad packet order (1)";
+ }
+ pkt_idx++;
+ }
+ for (flow_id = 0; order_check && flow_id < num_flows; flow_id++) {
+ bool coaled = coalesced[flow_id] > CAPACITY_PAYLOAD_LEN;
+
+ if (flow_order[pkt_idx] != flow_id) {
+ vlog("Flow order mismatch at position %d: expected flow %d, got flow %d, coalesced: %d\n",
+ pkt_idx, flow_id, flow_order[pkt_idx], coaled);
+ fail_reason = fail_reason ?: "bad packet order (2)";
+ }
+ pkt_idx++;
+ }
+
if (!fail_reason) {
vlog("All %d flows coalesced correctly\n", num_flows);
printf("Test succeeded\n\n");
@@ -1622,12 +1650,13 @@ static void parse_args(int argc, char **argv)
{ "saddr", required_argument, NULL, 's' },
{ "smac", required_argument, NULL, 'S' },
{ "test", required_argument, NULL, 't' },
+ { "order-check", no_argument, NULL, 'o' },
{ "verbose", no_argument, NULL, 'v' },
{ 0, 0, 0, 0 }
};
int c;
- while ((c = getopt_long(argc, argv, "46d:D:ei:n:rs:S:t:v", opts, NULL)) != -1) {
+ while ((c = getopt_long(argc, argv, "46d:D:ei:n:rs:S:t:ov", opts, NULL)) != -1) {
switch (c) {
case '4':
proto = PF_INET;
@@ -1666,6 +1695,9 @@ static void parse_args(int argc, char **argv)
case 't':
testname = optarg;
break;
+ case 'o':
+ order_check = true;
+ break;
case 'v':
verbose = true;
break;
diff --git a/tools/testing/selftests/drivers/net/hw/gro_hw.py b/tools/testing/selftests/drivers/net/hw/gro_hw.py
index 3bca19e8f339..10e08b22ee0e 100755
--- a/tools/testing/selftests/drivers/net/hw/gro_hw.py
+++ b/tools/testing/selftests/drivers/net/hw/gro_hw.py
@@ -10,7 +10,7 @@ import glob
import re
from lib.py import ksft_run, ksft_exit, ksft_pr
-from lib.py import ksft_eq, ksft_ge
+from lib.py import ksft_eq, ksft_ge, ksft_variants
from lib.py import NetDrvEpEnv, NetdevFamily
from lib.py import KsftSkipEx
from lib.py import bkg, cmd, defer, ethtool, ip
@@ -78,7 +78,8 @@ GRO_DPORT = 8000
return test_queue
-def _run_gro_test(cfg, test_name, num_flows=None, ignore_fail=False):
+def _run_gro_test(cfg, test_name, num_flows=None, ignore_fail=False,
+ order_check=False):
"""Run gro binary with given test and return output."""
if not hasattr(cfg, "bin_remote"):
cfg.bin_local = cfg.net_lib_dir / "gro"
@@ -98,6 +99,8 @@ GRO_DPORT = 8000
]
if num_flows:
base_args.append(f"--num-flows {num_flows}")
+ if order_check:
+ base_args.append("--order-check")
args = " ".join(base_args)
@@ -257,13 +260,33 @@ def _check_gro_stats(cfg, test_queue, stats_before,
expect_wire=gro_coalesced * 2)
+@ksft_variants([4, 32, 512])
+def test_gro_order(cfg, num_flows):
+ """
+ Test that HW GRO preserves packet ordering between flows.
+
+ Packets may get delayed until the aggregate is released,
+ but reordering between aggregates and packet terminating
+ the aggregate and normal packets should not happen.
+
+ Note that this test is stricter than truly required.
+ Reordering packets between flows should not cause issues.
+ This test will also fail if traffic is run over an ECMP fabric.
+ """
+ _setup_hw_gro(cfg)
+ _setup_isolated_queue(cfg)
+
+ _run_gro_test(cfg, "capacity", num_flows=num_flows, order_check=True)
+
+
def main() -> None:
""" Ksft boiler plate main """
with NetDrvEpEnv(__file__, nsim_test=False) as cfg:
cfg.netnl = NetdevFamily()
ksft_run([test_gro_stats_single,
- test_gro_stats_full], args=(cfg,))
+ test_gro_stats_full,
+ test_gro_order], args=(cfg,))
ksft_exit()
--
2.53.0
* [PATCH net-next v2 9/9] selftests: drv-net: gro: add a test for GRO depth
2026-02-07 0:35 [PATCH net-next v2 0/9] net: stats, tools, driver tests for HW GRO Jakub Kicinski
` (7 preceding siblings ...)
2026-02-07 0:35 ` [PATCH net-next v2 8/9] selftests: drv-net: gro: add test for packet ordering Jakub Kicinski
@ 2026-02-07 0:35 ` Jakub Kicinski
8 siblings, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-07 0:35 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Reuse the long sequence test to max out the GRO contexts.
Repeat for a single queue, for 8 queues, and for the default
number of queues with flow steering to just one.
SW GRO's capacity should be around 64 per queue
(8 buckets, up to 8 skbs in a chain).
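That per-queue estimate is a simple product of the two limits; a quick
sanity-check sketch (the bucket and chain-length values are the
assumptions stated above, not queried from a running kernel):

```python
GRO_HASH_BUCKETS = 8  # assumed: SW GRO hash buckets per NAPI instance
MAX_CHAIN_LEN = 8     # assumed: max skbs held per bucket chain

def sw_gro_capacity(num_queues):
    """Rough upper bound on flows SW GRO can hold concurrently."""
    return num_queues * GRO_HASH_BUCKETS * MAX_CHAIN_LEN

print(sw_gro_capacity(1))  # 64 per queue
print(sw_gro_capacity(8))  # 512 with 8 queues
```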
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
- move this "test" to SW, we can check SW GRO capacity too
v1: https://lore.kernel.org/20260205220541.2992807-10-kuba@kernel.org
---
tools/testing/selftests/drivers/net/gro.py | 201 +++++++++++++++++++--
1 file changed, 181 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/gro.py b/tools/testing/selftests/drivers/net/gro.py
index 2da53686354f..795dbcc583a3 100755
--- a/tools/testing/selftests/drivers/net/gro.py
+++ b/tools/testing/selftests/drivers/net/gro.py
@@ -35,11 +35,18 @@ coalescing behavior.
- large_rem: Large packet remainder handling
"""
+import glob
import os
+import re
from lib.py import ksft_run, ksft_exit, ksft_pr
from lib.py import NetDrvEpEnv, KsftXfailEx
+from lib.py import NetdevFamily, EthtoolFamily
from lib.py import bkg, cmd, defer, ethtool, ip
-from lib.py import ksft_variants
+from lib.py import ksft_variants, KsftNamedVariant
+
+
+# gro.c uses hardcoded DPORT=8000
+GRO_DPORT = 8000
def _resolve_dmac(cfg, ipver):
@@ -113,6 +120,98 @@ from lib.py import ksft_variants
ksft_pr(eth_cmd)
+def _get_queue_stats(cfg, queue_id):
+ """Get stats for a specific Rx queue."""
+ cfg.wait_hw_stats_settle()
+ data = cfg.netnl.qstats_get({"ifindex": cfg.ifindex, "scope": ["queue"]},
+ dump=True)
+ for q in data:
+ if q.get('queue-type') == 'rx' and q.get('queue-id') == queue_id:
+ return q
+ return {}
+
+
+def _setup_isolated_queue(cfg):
+ """Set up an isolated queue for testing using ntuple filter.
+
+ Remove queue 1 from the default RSS context and steer test traffic to it.
+ """
+ test_queue = 1
+
+ qcnt = len(glob.glob(f"/sys/class/net/{cfg.ifname}/queues/rx-*"))
+ if qcnt < 2:
+ raise KsftXfailEx(f"Need at least 2 queues, have {qcnt}")
+
+ # Remove queue 1 from default RSS context by setting its weight to 0
+ weights = ["1"] * qcnt
+ weights[test_queue] = "0"
+ ethtool(f"-X {cfg.ifname} weight " + " ".join(weights))
+ defer(ethtool, f"-X {cfg.ifname} default")
+
+ # Set up ntuple filter to steer our test traffic to the isolated queue
+ flow = f"flow-type tcp{cfg.addr_ipver} "
+ flow += f"dst-ip {cfg.addr} dst-port {GRO_DPORT} action {test_queue}"
+ output = ethtool(f"-N {cfg.ifname} {flow}").stdout
+ ntuple_id = int(output.split()[-1])
+ defer(ethtool, f"-N {cfg.ifname} delete {ntuple_id}")
+
+ return test_queue
+
+
+def _setup_queue_count(cfg, num_queues):
+ """Configure the NIC to use a specific number of queues."""
+ channels = cfg.ethnl.channels_get({'header': {'dev-index': cfg.ifindex}})
+ ch_max = channels.get('combined-max', 0)
+ qcnt = channels['combined-count']
+
+ if ch_max < num_queues:
+ raise KsftXfailEx(f"Need at least {num_queues} queues, max={ch_max}")
+
+ defer(ethtool, f"-L {cfg.ifname} combined {qcnt}")
+ ethtool(f"-L {cfg.ifname} combined {num_queues}")
+
+
+def _run_gro_bin(cfg, test_name, protocol=None, num_flows=None,
+ order_check=False, verbose=False, fail=False):
+ """Run gro binary with given test and return the process result."""
+ if not hasattr(cfg, "bin_remote"):
+ cfg.bin_local = cfg.net_lib_dir / "gro"
+ cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
+
+ if protocol is None:
+ ipver = cfg.addr_ipver
+ protocol = f"ipv{ipver}"
+ else:
+ ipver = "6" if protocol[-1] == "6" else "4"
+
+ dmac = _resolve_dmac(cfg, ipver)
+
+ base_args = [
+ f"--{protocol}",
+ f"--dmac {dmac}",
+ f"--smac {cfg.remote_dev['address']}",
+ f"--daddr {cfg.addr_v[ipver]}",
+ f"--saddr {cfg.remote_addr_v[ipver]}",
+ f"--test {test_name}",
+ ]
+ if num_flows:
+ base_args.append(f"--num-flows {num_flows}")
+ if order_check:
+ base_args.append("--order-check")
+ if verbose:
+ base_args.append("--verbose")
+
+ args = " ".join(base_args)
+
+ rx_cmd = f"{cfg.bin_local} {args} --rx --iface {cfg.ifname}"
+ tx_cmd = f"{cfg.bin_remote} {args} --iface {cfg.remote_ifname}"
+
+ with bkg(rx_cmd, ksft_ready=True, exit_wait=True, fail=fail) as rx_proc:
+ cmd(tx_cmd, host=cfg.remote)
+
+ return rx_proc
+
+
def _setup(cfg, mode, test_name):
""" Setup hardware loopback mode for GRO testing. """
@@ -233,30 +332,14 @@ from lib.py import ksft_variants
_setup(cfg, mode, test_name)
- base_cmd_args = [
- f"--{protocol}",
- f"--dmac {_resolve_dmac(cfg, ipver)}",
- f"--smac {cfg.remote_dev['address']}",
- f"--daddr {cfg.addr_v[ipver]}",
- f"--saddr {cfg.remote_addr_v[ipver]}",
- f"--test {test_name}",
- "--verbose"
- ]
- base_args = " ".join(base_cmd_args)
-
# Each test is run 6 times to deflake, because given the receive timing,
# not all packets that should coalesce will be considered in the same flow
# on every try.
max_retries = 6
for attempt in range(max_retries):
- rx_cmd = f"{cfg.bin_local} {base_args} --rx --iface {cfg.ifname}"
- tx_cmd = f"{cfg.bin_remote} {base_args} --iface {cfg.remote_ifname}"
-
fail_now = attempt >= max_retries - 1
-
- with bkg(rx_cmd, ksft_ready=True, exit_wait=True,
- fail=fail_now) as rx_proc:
- cmd(tx_cmd, host=cfg.remote)
+ rx_proc = _run_gro_bin(cfg, test_name, protocol=protocol,
+ verbose=True, fail=fail_now)
if rx_proc.ret == 0:
return
@@ -270,11 +353,89 @@ from lib.py import ksft_variants
ksft_pr(f"Attempt {attempt + 1}/{max_retries} failed, retrying...")
+def _capacity_variants():
+ """Generate variants for capacity test: mode x queue setup."""
+ setups = [
+ ("isolated", _setup_isolated_queue),
+ ("1q", lambda cfg: _setup_queue_count(cfg, 1)),
+ ("8q", lambda cfg: _setup_queue_count(cfg, 8)),
+ ]
+ for mode in ["sw", "hw", "lro"]:
+ for name, func in setups:
+ yield KsftNamedVariant(f"{mode}_{name}", mode, func)
+
+
+@ksft_variants(_capacity_variants())
+def test_gro_capacity(cfg, mode, setup_func):
+ """
+ Probe GRO capacity.
+
+ Start with 8 flows and increase by 2x on each successful run.
+ Retry up to 3 times on failure.
+
+ Variants combine mode (sw, hw, lro) with queue setup:
+ - isolated: Use a single queue isolated from RSS
+ - 1q: Configure NIC to use 1 queue
+ - 8q: Configure NIC to use 8 queues
+ """
+ max_retries = 3
+
+ _setup(cfg, mode, "capacity")
+ queue_id = setup_func(cfg)
+
+ num_flows = 8
+ while True:
+ success = False
+ for attempt in range(max_retries):
+ if queue_id is not None:
+ stats_before = _get_queue_stats(cfg, queue_id)
+
+ rx_proc = _run_gro_bin(cfg, "capacity", num_flows=num_flows)
+ output = rx_proc.stdout
+
+ if queue_id is not None:
+ stats_after = _get_queue_stats(cfg, queue_id)
+ qstat_pkts = (stats_after.get('rx-packets', 0) -
+ stats_before.get('rx-packets', 0))
+ gro_pkts = (stats_after.get('rx-hw-gro-packets', 0) -
+ stats_before.get('rx-hw-gro-packets', 0))
+ qstat_str = f" qstat={qstat_pkts} hw-gro={gro_pkts}"
+ else:
+ qstat_str = ""
+
+ # Parse and print STATS line
+ match = re.search(
+ r'STATS: received=(\d+) wire=(\d+) coalesced=(\d+)', output)
+ if match:
+ received = int(match.group(1))
+ wire = int(match.group(2))
+ coalesced = int(match.group(3))
+ status = "PASS" if received == num_flows else "FAIL"
+ ksft_pr(f"flows={num_flows} attempt={attempt + 1} "
+ f"received={received} wire={wire} "
+ f"coalesced={coalesced}{qstat_str} [{status}]")
+ if received == num_flows:
+ success = True
+ break
+ else:
+ ksft_pr(rx_proc)
+ ksft_pr(f"flows={num_flows} attempt={attempt + 1}"
+ f"{qstat_str} [FAIL - can't parse stats]")
+
+ if not success:
+ ksft_pr(f"Stopped at {num_flows} flows")
+ break
+
+ num_flows *= 2
+
+
def main() -> None:
""" Ksft boiler plate main """
with NetDrvEpEnv(__file__) as cfg:
- ksft_run(cases=[test], args=(cfg,))
+ cfg.ethnl = EthtoolFamily()
+ cfg.netnl = NetdevFamily()
+ ksft_run(cases=[test, test_gro_capacity], args=(cfg,))
ksft_exit()
--
2.53.0
* Re: [PATCH net-next v2 1/9] eth: bnxt: gather and report HW-GRO stats
2026-02-07 0:35 ` [PATCH net-next v2 1/9] eth: bnxt: gather and report HW-GRO stats Jakub Kicinski
@ 2026-02-08 0:09 ` Michael Chan
2026-02-11 1:51 ` Jakub Kicinski
0 siblings, 1 reply; 22+ messages in thread
From: Michael Chan @ 2026-02-08 0:09 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, petrm, donald.hunter, pavan.chebbi, linux-kselftest
On Fri, Feb 6, 2026 at 4:35 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Count and report HW-GRO stats as seen by the kernel.
> The device stats for GRO seem to not reflect the reality,
> perhaps they count sessions which did not actually result
> in any aggregation. Also they count wire packets, so we
> have to count super-frames, anyway.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
I'm suggesting a minor naming change below if you need to do v3.
> @@ -13492,6 +13497,8 @@ static void bnxt_get_one_ring_err_stats(struct bnxt *bp,
With the new GRO counters, this function is no longer limited to error
stats. So maybe rename it to something like
bnxt_get_one_ring_misc_stats()?
> stats->rx_total_netpoll_discards += sw_stats->rx.rx_netpoll_discards;
> stats->rx_total_ring_discards +=
> BNXT_GET_RING_STATS64(hw_stats, rx_discard_pkts);
> + stats->rx_total_hw_gro_packets += sw_stats->rx.rx_hw_gro_packets;
> + stats->rx_total_hw_gro_wire_packets += sw_stats->rx.rx_hw_gro_wire_packets;
> stats->tx_total_resets += sw_stats->tx.tx_resets;
> stats->tx_total_ring_discards +=
> BNXT_GET_RING_STATS64(hw_stats, tx_discard_pkts);
* Re: [PATCH net-next v2 4/9] selftests: net: move gro to lib for HW vs SW reuse
2026-02-07 0:35 ` [PATCH net-next v2 4/9] selftests: net: move gro to lib for HW vs SW reuse Jakub Kicinski
@ 2026-02-09 2:36 ` Willem de Bruijn
0 siblings, 0 replies; 22+ messages in thread
From: Willem de Bruijn @ 2026-02-09 2:36 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Jakub Kicinski wrote:
> The gro.c packet sender is used for SW testing but bulk of incoming
> new tests will be HW-specific. So it's better to put them under
> drivers/net/hw/, to avoid tip-toeing around netdevsim. Move gro.c
> to lib so we can reuse it.
>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
* Re: [PATCH net-next v2 5/9] selftests: drv-net: give HW stats sync time extra 25% of margin
2026-02-07 0:35 ` [PATCH net-next v2 5/9] selftests: drv-net: give HW stats sync time extra 25% of margin Jakub Kicinski
@ 2026-02-09 2:37 ` Willem de Bruijn
0 siblings, 0 replies; 22+ messages in thread
From: Willem de Bruijn @ 2026-02-09 2:37 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Jakub Kicinski wrote:
> There are transient failures for devices which update stats
> periodically, especially if it's the FW DMA'ing the stats
> rather than host periodic work querying the FW. Wait 25%
> longer than strictly necessary.
>
> For devices which don't report stats-block-usecs we retain
> 25 msec as the default wait time (0.025sec == 20,000usec * 1.25).
>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
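The wait-time rule in the commit message above works out to the following (a quick sketch of the arithmetic; `stats_sync_wait_sec` is a hypothetical helper, not the selftest's actual code):

```python
def stats_sync_wait_sec(stats_block_usecs=None):
    """Time to wait for HW stats to sync, with a 25% margin."""
    # Devices that don't report stats-block-usecs get the 20,000 usec
    # default, i.e. 20,000 usec * 1.25 == 25 msec total.
    usecs = stats_block_usecs or 20_000
    return usecs * 1.25 / 1_000_000

print(stats_sync_wait_sec())        # → 0.025
print(stats_sync_wait_sec(50_000))  # → 0.0625
```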
* Re: [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together
2026-02-07 0:35 ` [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together Jakub Kicinski
@ 2026-02-09 2:39 ` Willem de Bruijn
2026-02-11 1:56 ` Jakub Kicinski
0 siblings, 1 reply; 22+ messages in thread
From: Willem de Bruijn @ 2026-02-09 2:39 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, shuah, willemb,
petrm, donald.hunter, michael.chan, pavan.chebbi, linux-kselftest,
Jakub Kicinski
Jakub Kicinski wrote:
> Longer packet sequence tests are quite flaky when the test is run
> over a real network. Try to avoid at least the jitter on the sender
> side by scheduling all the packets to be sent at once using SO_TXTIME.
> Use a hardcoded tx time of 5 msec in the future. In my tests, increasing
> this time past 2 msec makes no difference, so 5 msec is plenty of margin.
> Since we now expect more output buffering make sure to raise SNDBUF.
>
> Experimenting with long sequences I see frequent failures when sending
> 200 packets, only 50-100 packets get coalesced. With this change
> up to 1000 packets get coalesced relatively reliably.
>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Does this require having FQ installed? I don't see any qdisc config
in the GRO test.
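For context, the SO_TXTIME mechanism the patch relies on can be sketched as follows (a standalone illustration, not the selftest's code; `send_at` is a hypothetical helper, and the setsockopt needs CAP_NET_ADMIN):

```python
import socket
import struct
import time

SO_TXTIME = 61          # Linux-only; not exposed by Python's socket module
SCM_TXTIME = SO_TXTIME  # the cmsg type reuses the option's value

def send_at(sock, payload, addr, delay_ns=5_000_000):
    """Ask the qdisc (fq/etf) to release payload delay_ns from now."""
    txtime = time.clock_gettime_ns(time.CLOCK_MONOTONIC) + delay_ns
    cmsg = [(socket.SOL_SOCKET, SCM_TXTIME, struct.pack("@Q", txtime))]
    return sock.sendmsg([payload], cmsg, 0, addr)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    # struct sock_txtime { clockid_t clockid; __u32 flags; }
    sock.setsockopt(socket.SOL_SOCKET, SO_TXTIME,
                    struct.pack("@iI", time.CLOCK_MONOTONIC, 0))
    send_at(sock, b"probe", ("127.0.0.1", 9))
except OSError:
    pass  # EPERM without CAP_NET_ADMIN, or no usable route
finally:
    sock.close()
```

As the question above hints, without an fq (or etf) qdisc on the egress device the kernel accepts the cmsg but, to my understanding, transmits immediately.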
* Re: [PATCH net-next v2 3/9] tools: ynltool: add qstats analysis for HW-GRO efficiency / savings
2026-02-07 0:35 ` [PATCH net-next v2 3/9] tools: ynltool: add qstats analysis for HW-GRO efficiency / savings Jakub Kicinski
@ 2026-02-09 9:43 ` Petr Machata
0 siblings, 0 replies; 22+ messages in thread
From: Petr Machata @ 2026-02-09 9:43 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, petrm, donald.hunter, michael.chan, pavan.chebbi,
linux-kselftest
Jakub Kicinski <kuba@kernel.org> writes:
> Extend ynltool to compute a HW GRO savings metric - how many
> packets HW GRO has been able to save the kernel from seeing.
>
> Note that this definition does not actually take into account
> whether the segments were or weren't eligible for HW GRO.
> If a machine is receiving all-UDP traffic - new metric will show
> HW-GRO savings of 0%. Conversely since the super-packet still
> counts as a received packet, savings of 100% is not achievable.
> Perfect HW-GRO on a machine with 4k MTU and 64kB super-frames
> would show ~93.75% savings. With 1.5k MTU we may see up to
> ~97.8% savings (if my math is right).
>
> Example after 10 sec of iperf on a freshly booted machine
> with 1.5k MTU:
>
> $ ynltool qstats show
> eth0 rx-packets: 40681280 rx-bytes: 61575208437
> rx-alloc-fail: 0 rx-hw-gro-packets: 1225133
> rx-hw-gro-wire-packets: 40656633
> $ ynltool qstats hw-gro
> eth0: 96.9% savings
>
> None of the NICs I have access to can report "missed" HW-GRO
> opportunities so computing a true "effectiveness" metric
> is not possible. One could also argue that an effectiveness metric
> is inferior in environments where we control both senders and
> receivers: the savings metric will capture both regressions
> in the receiver's HW GRO effectiveness and regressions in senders
> sending smaller TSO trains. And we care about both. The main
> downside is that it's hard to tell at a glance how well the NIC
> is doing because the savings will be dependent on traffic patterns.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
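The 96.9% figure and the theoretical bounds in the commit message can be reproduced from the counters (my own sketch of the arithmetic, inferred from the sample output; ynltool's exact computation may differ, and this assumes rx-packets counts wire-level packets):

```python
def hw_gro_savings_pct(rx_packets, hw_gro_packets, hw_gro_wire_packets):
    # Each super-packet replaces a train of wire frames with one skb;
    # the difference is what the kernel never had to process.
    saved = hw_gro_wire_packets - hw_gro_packets
    return 100.0 * saved / rx_packets

# The eth0 sample from the commit message
print(round(hw_gro_savings_pct(40681280, 1225133, 40656633), 1))  # → 96.9

# Theoretical best cases: 64kB super-frames at 4k MTU (16 segments),
# and roughly 45 segments per super-frame at 1.5k MTU
print(hw_gro_savings_pct(16, 1, 16))            # → 93.75
print(round(hw_gro_savings_pct(45, 1, 45), 1))  # → 97.8
```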
* Re: [PATCH net-next v2 1/9] eth: bnxt: gather and report HW-GRO stats
2026-02-08 0:09 ` Michael Chan
@ 2026-02-11 1:51 ` Jakub Kicinski
0 siblings, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-11 1:51 UTC (permalink / raw)
To: Michael Chan
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, petrm, donald.hunter, pavan.chebbi, linux-kselftest
On Sat, 7 Feb 2026 16:09:13 -0800 Michael Chan wrote:
> > @@ -13492,6 +13497,8 @@ static void bnxt_get_one_ring_err_stats(struct bnxt *bp,
>
> With the new GRO counters, this function is no longer limited to error
> stats. So maybe rename it to something like
> bnxt_get_one_ring_misc_stats()?
Ack, I'll queue up a cleanup after the merge window.
* Re: [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together
2026-02-09 2:39 ` Willem de Bruijn
@ 2026-02-11 1:56 ` Jakub Kicinski
2026-02-11 3:15 ` Willem de Bruijn
0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-11 1:56 UTC (permalink / raw)
To: Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, petrm, donald.hunter, michael.chan, pavan.chebbi,
linux-kselftest
On Sun, 08 Feb 2026 21:39:38 -0500 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > Longer packet sequence tests are quite flaky when the test is run
> > over a real network. Try to avoid at least the jitter on the sender
> > side by scheduling all the packets to be sent at once using SO_TXTIME.
> > Use hardcoded tx time of 5msec in the future. In my test increasing
> > this time past 2msec makes no difference so 5msec is plenty of margin.
> > Since we now expect more output buffering make sure to raise SNDBUF.
> >
> > Experimenting with long sequences I see frequent failures when sending
> > 200 packets, only 50-100 packets get coalesced. With this change
> > up to 1000 packets get coalesced relatively reliably.
> >
> > Reviewed-by: Petr Machata <petrm@nvidia.com>
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> Does this require having FQ installed? I don't see any qdisc config
> in the GRO test.
It's a bit of an opportunistic optimization.
I initially intended it for the "long sequence of packets"
test. But I failed to get AF_PACKET+FQ to cooperate sufficiently
to queue all of the packets in the same bucket. Otherwise FQ "sorts"
the packets, and breaks what the test is trying to do :(
Oh, and as mentioned in the commit msg - this improvement is intended
for HW-GRO, which may have very low timeouts. The test already
configures timeout for SW GRO to a very high value, so don't think
we would gain anything from setting up FQ on veth/netdevsim for the
SW test.
So IDK what to do with this patch. Maybe I should just drop it?
It _seemed_ useful, but I don't have enough datapoints to do a real
comparison of how much it improves reliability.
* Re: [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together
2026-02-11 1:56 ` Jakub Kicinski
@ 2026-02-11 3:15 ` Willem de Bruijn
2026-02-11 3:48 ` Jakub Kicinski
0 siblings, 1 reply; 22+ messages in thread
From: Willem de Bruijn @ 2026-02-11 3:15 UTC (permalink / raw)
To: Jakub Kicinski, Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, petrm, donald.hunter, michael.chan, pavan.chebbi,
linux-kselftest
Jakub Kicinski wrote:
> On Sun, 08 Feb 2026 21:39:38 -0500 Willem de Bruijn wrote:
> > Jakub Kicinski wrote:
> > > Longer packet sequence tests are quite flaky when the test is run
> > > over a real network. Try to avoid at least the jitter on the sender
> > > side by scheduling all the packets to be sent at once using SO_TXTIME.
> > > Use hardcoded tx time of 5msec in the future. In my test increasing
> > > this time past 2msec makes no difference so 5msec is plenty of margin.
> > > Since we now expect more output buffering make sure to raise SNDBUF.
> > >
> > > Experimenting with long sequences I see frequent failures when sending
> > > 200 packets, only 50-100 packets get coalesced. With this change
> > > up to 1000 packets get coalesced relatively reliably.
> > >
> > > Reviewed-by: Petr Machata <petrm@nvidia.com>
> > > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> >
> > Does this require having FQ installed? I don't see any qdisc config
> > in the GRO test.
>
> It's a bit of an opportunistic optimization.
>
> I initially intended it for for the "long sequence of packets"
> test. But I failed to get AF_PACKET+FQ to cooperate sufficiently
> to queue all of the packets in the same bucket. Otherwise FQ "sorts"
> the packets, and breaks what the test is trying to do :(
I wonder what's going wrong here.
fq_classify should pick the queue based on skb->sk also for packet
sockets.
And flow_queue_add should add the packets to the tail of the linear
list if the delivery time is identical to that of the tail.
> Oh, and as mentioned in the commit msg - this improvement is intended
> for HW-GRO, which may have very low timeouts. The test already
> configures timeout for SW GRO to a very high value, so don't think
> we would gain anything from setting up FQ on veth/netdevsim for the
> SW test.
>
> So IDK what to do with this patch. Maybe I should just drop it?
> It _seemed_ useful, but I don't have enough datapoints to do a real
> comparison of how much it improves reliability.
It seems useful indeed.
* Re: [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together
2026-02-11 3:15 ` Willem de Bruijn
@ 2026-02-11 3:48 ` Jakub Kicinski
2026-02-11 4:21 ` Willem de Bruijn
0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-11 3:48 UTC (permalink / raw)
To: Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, petrm, donald.hunter, michael.chan, pavan.chebbi,
linux-kselftest
On Tue, 10 Feb 2026 22:15:25 -0500 Willem de Bruijn wrote:
> > It's a bit of an opportunistic optimization.
> >
> > I initially intended it for for the "long sequence of packets"
> > test. But I failed to get AF_PACKET+FQ to cooperate sufficiently
> > to queue all of the packets in the same bucket. Otherwise FQ "sorts"
> > the packets, and breaks what the test is trying to do :(
>
> I wonder what's going wrong here.
>
> fq_classify should pick the queue based on skb->sk also for packet
> sockets.
>
> And flow_queue_add should add the packets to the tail of the linear
> list if the delivery time is identical to that of the tail.
It works but requires that we either modify the qdisc config to set
an orphan_mask of 1, or somehow set the skb->hash on the AF_PACKET skbs.
The test sends out multiple flows (src ports) so if we let fq compute
the real hash we end up in different buckets.
* Re: [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together
2026-02-11 3:48 ` Jakub Kicinski
@ 2026-02-11 4:21 ` Willem de Bruijn
2026-02-11 17:00 ` Jakub Kicinski
0 siblings, 1 reply; 22+ messages in thread
From: Willem de Bruijn @ 2026-02-11 4:21 UTC (permalink / raw)
To: Jakub Kicinski, Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, petrm, donald.hunter, michael.chan, pavan.chebbi,
linux-kselftest
Jakub Kicinski wrote:
> On Tue, 10 Feb 2026 22:15:25 -0500 Willem de Bruijn wrote:
> > > It's a bit of an opportunistic optimization.
> > >
> > > I initially intended it for for the "long sequence of packets"
> > > test. But I failed to get AF_PACKET+FQ to cooperate sufficiently
> > > to queue all of the packets in the same bucket. Otherwise FQ "sorts"
> > > the packets, and breaks what the test is trying to do :(
> >
> > I wonder what's going wrong here.
> >
> > fq_classify should pick the queue based on skb->sk also for packet
> > sockets.
> >
> > And flow_queue_add should add the packets to the tail of the linear
> > list if the delivery time is identical to that of the tail.
>
> It works but requires that we either modify the qdisc config to set
> a orphan_mask of 1, or somehow set the skb->hash on the AF_PACKET skbs.
Oh right, fq_classify does not use skb->sk for packet sockets because
they are in default sk_state TCP_CLOSE.
And this is by design, as clearly documented, as packet sockets should
not be assumed to be a single flow:
} else if (sk->sk_state == TCP_CLOSE) {
        unsigned long hash = skb_get_hash(skb) & q->orphan_mask;
        /*
         * Sockets in TCP_CLOSE are non connected.
         * Typical use case is UDP sockets, they can send packets
         * with sendto() to many different destinations.
         * We probably could use a generic bit advertising
         * non connected sockets, instead of sk_state == TCP_CLOSE,
         * if we care enough.
         */
        sk = (struct sock *)((hash << 1) | 1UL);
}
An orphan_mask of 1 sounds like an effective workaround.
I don't see a way to force a specific skb_get_hash result across
flows, given hashrnd.
> The test sends out multiple flows (src ports) so if we let fq compute
> the real hash we end up in different buckets.
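For anyone wanting to try the workaround discussed here, the qdisc tweak would look roughly like this (untested sketch; `$TEST_IFC` is a placeholder, and this is only sensible on a dedicated test interface):

```shell
# Collapse all "orphan" (non-connected-socket) flows into as few fq
# buckets as possible so scheduled AF_PACKET frames keep their order.
tc qdisc replace dev "$TEST_IFC" root fq orphan_mask 1
```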
* Re: [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together
2026-02-11 4:21 ` Willem de Bruijn
@ 2026-02-11 17:00 ` Jakub Kicinski
2026-02-11 17:22 ` Willem de Bruijn
0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2026-02-11 17:00 UTC (permalink / raw)
To: Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, petrm, donald.hunter, michael.chan, pavan.chebbi,
linux-kselftest
On Tue, 10 Feb 2026 23:21:07 -0500 Willem de Bruijn wrote:
> > > I wonder what's going wrong here.
> > >
> > > fq_classify should pick the queue based on skb->sk also for packet
> > > sockets.
> > >
> > > And flow_queue_add should add the packets to the tail of the linear
> > > list if the delivery time is identical to that of the tail.
> >
> > It works but requires that we either modify the qdisc config to set
> > a orphan_mask of 1, or somehow set the skb->hash on the AF_PACKET skbs.
>
> Oh right, fq_classify does not use skb->sk for packet sockets because
> they are in default sk_state TCP_CLOSE.
>
> And this is by design, as clearly documented, as packet sockets should
> not be assumed to be a single flow:
>
> } else if (sk->sk_state == TCP_CLOSE) {
>         unsigned long hash = skb_get_hash(skb) & q->orphan_mask;
>         /*
>          * Sockets in TCP_CLOSE are non connected.
>          * Typical use case is UDP sockets, they can send packets
>          * with sendto() to many different destinations.
>          * We probably could use a generic bit advertising
>          * non connected sockets, instead of sk_state == TCP_CLOSE,
>          * if we care enough.
>          */
>         sk = (struct sock *)((hash << 1) | 1UL);
> }
>
> An orphan_mask of 1 sounds like an effective workaround.
>
> I don't see a way to force a specific skb_get_hash result across
> flows, given hashrnd.
So WDYT about the patch? I don't wanna tweak qdiscs on real interfaces.
It's way too hard to undo. IMHO either we keep the patch as is with its
limited effect or just drop it.
* Re: [PATCH net-next v2 6/9] selftests: drv-net: gro: use SO_TXTIME to schedule packets together
2026-02-11 17:00 ` Jakub Kicinski
@ 2026-02-11 17:22 ` Willem de Bruijn
0 siblings, 0 replies; 22+ messages in thread
From: Willem de Bruijn @ 2026-02-11 17:22 UTC (permalink / raw)
To: Jakub Kicinski, Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
willemb, petrm, donald.hunter, michael.chan, pavan.chebbi,
linux-kselftest
Jakub Kicinski wrote:
> On Tue, 10 Feb 2026 23:21:07 -0500 Willem de Bruijn wrote:
> > > > I wonder what's going wrong here.
> > > >
> > > > fq_classify should pick the queue based on skb->sk also for packet
> > > > sockets.
> > > >
> > > > And flow_queue_add should add the packets to the tail of the linear
> > > > list if the delivery time is identical to that of the tail.
> > >
> > > It works but requires that we either modify the qdisc config to set
> > > a orphan_mask of 1, or somehow set the skb->hash on the AF_PACKET skbs.
> >
> > Oh right, fq_classify does not use skb->sk for packet sockets because
> > they are in default sk_state TCP_CLOSE.
> >
> > And this is by design, as clearly documented, as packet sockets should
> > not be assumed to be a single flow:
> >
> > } else if (sk->sk_state == TCP_CLOSE) {
> >         unsigned long hash = skb_get_hash(skb) & q->orphan_mask;
> >         /*
> >          * Sockets in TCP_CLOSE are non connected.
> >          * Typical use case is UDP sockets, they can send packets
> >          * with sendto() to many different destinations.
> >          * We probably could use a generic bit advertising
> >          * non connected sockets, instead of sk_state == TCP_CLOSE,
> >          * if we care enough.
> >          */
> >         sk = (struct sock *)((hash << 1) | 1UL);
> > }
> >
> > An orphan_mask of 1 sounds like an effective workaround.
> >
> > I don't see a way to force a specific skb_get_hash result across
> > flows, given hashrnd.
>
> So WDYT about the patch? I don't wanna tweak qdiscs on real interfaces.
> It's way to hard to undo. IMHO either we keep the patch as is with its
> limited effect or just drop it.
Reviewed-by: Willem de Bruijn <willemb@google.com>
I would say keep it. When respinning, maybe add an explicit note that
for this to be effective FQ needs to be installed.
Aside: there currently is no API for the kernel to communicate whether
a cmsg SO_TXTIME request was honored or ignored. But if a caller cares
it can request a Tx timestamp.