linux-kselftest.vger.kernel.org archive mirror
* [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig
@ 2025-08-29 22:07 Jakub Kicinski
  2025-08-29 22:07 ` [PATCH net-next 2/2] selftests: drv-net: rss_ctx: make the test pass with few queues Jakub Kicinski
  2025-09-01 13:50 ` [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig Simon Horman
  0 siblings, 2 replies; 5+ messages in thread
From: Jakub Kicinski @ 2025-08-29 22:07 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, ecree.xilinx, gal,
	joe, linux-kselftest, shuah, Jakub Kicinski

The rss_ctx test has gotten pretty flaky after I increased
the queue count in NIPA 2->3. Not 100% clear why. We get
a lot of failures in the rss_ctx.test_hitless_key_update case.

Looking closer it appears that the failures are mostly due
to startup costs. I measured the following timing for ethtool -X:
 - python cmd(shell=True)  : 150-250msec
 - python cmd(shell=False) :  50- 70msec
 - timed in bash           :  45- 55msec
 - YNL Netlink call        :   2-  4msec
 - .set_rxfh callback      :   1-  2msec

The target in the test was set to 200msec. We were mostly measuring
ethtool startup cost it seems. Switch to YNL since it's 100x faster.

Lower the pass criteria to ~75msec, no real science behind this number
but we removed ~150msec of overhead, and the old target was 200msec.
So any driver that was passing previously should still pass with 75msec.

Separately we should probably follow up on defaulting to shell=False,
when the script doesn't explicitly ask for True, because the overhead
is rather significant.
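
A rough way to see the shell=True overhead in isolation is to time
a no-op command both ways. The sketch below is only an illustration:
it calls subprocess directly rather than the selftest cmd() helper,
the time_spawn() name is made up, and a do-nothing Python interpreter
stands in for ethtool. Absolute numbers will vary from machine to
machine.

```python
import shlex
import statistics
import subprocess
import sys
import time


def time_spawn(cmd, shell, runs=5):
    """Return the median wall-clock seconds needed to run cmd to completion."""
    samples = []
    for _ in range(runs):
        # With shell=True subprocess takes one string, otherwise an argv list.
        arg = cmd if shell else shlex.split(cmd)
        t0 = time.perf_counter()
        subprocess.run(arg, shell=shell, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)


# Stand-in command (an assumption, not ethtool): the interpreter doing nothing.
NOOP = shlex.quote(sys.executable) + " -c pass"

with_shell = time_spawn(NOOP, shell=True)
without_shell = time_spawn(NOOP, shell=False)
print(f"shell=True : {with_shell * 1000:.1f} msec")
print(f"shell=False: {without_shell * 1000:.1f} msec")
```

The gap between the two lines approximates the cost of the extra
shell process, independent of what the command itself does.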

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 tools/testing/selftests/drivers/net/hw/rss_ctx.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
index 9838b8457e5a..3fc5688605b5 100755
--- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py
+++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
@@ -335,19 +335,20 @@ from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure
     data = get_rss(cfg)
     key_len = len(data['rss-hash-key'])
 
-    key = _rss_key_rand(key_len)
+    ethnl = EthtoolFamily()
+    key = random.randbytes(key_len)
 
     tgen = GenerateTraffic(cfg)
     try:
         errors0, carrier0 = get_drop_err_sum(cfg)
         t0 = datetime.datetime.now()
-        ethtool(f"-X {cfg.ifname} hkey " + _rss_key_str(key))
+        ethnl.rss_set({"header": {"dev-index": cfg.ifindex}, "hkey": key})
         t1 = datetime.datetime.now()
         errors1, carrier1 = get_drop_err_sum(cfg)
     finally:
         tgen.wait_pkts_and_stop(5000)
 
-    ksft_lt((t1 - t0).total_seconds(), 0.2)
+    ksft_lt((t1 - t0).total_seconds(), 0.075)
     ksft_eq(errors1 - errors1, 0)
     ksft_eq(carrier1 - carrier0, 0)
 
-- 
2.51.0



* [PATCH net-next 2/2] selftests: drv-net: rss_ctx: make the test pass with few queues
  2025-08-29 22:07 [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig Jakub Kicinski
@ 2025-08-29 22:07 ` Jakub Kicinski
  2025-09-01 13:50 ` [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig Simon Horman
  1 sibling, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2025-08-29 22:07 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, ecree.xilinx, gal,
	joe, linux-kselftest, shuah, Jakub Kicinski

rss_ctx.test_rss_key_indir implicitly expects at least 5 queues,
as it checks that the traffic on first 2 queues is lower than
the remaining queues when we use all queues. Special case fewer
queues.
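
The "at least 5 queues" requirement follows from simple arithmetic,
sketched below under the idealized assumption that traffic is spread
perfectly evenly over all queues (real distributions will be noisier).
The first_two_get_less() helper is hypothetical and only mirrors the
comparison made by the existing check.

```python
def first_two_get_less(qcnt, pkts=20000):
    """Mirror the existing check for an idealized, perfectly even spread."""
    cnts = [pkts // qcnt] * qcnt
    return sum(cnts[:2]) < sum(cnts[2:])


for qcnt in range(3, 7):
    print(qcnt, first_two_get_less(qcnt))
```

With 3 or 4 queues the first two queues carry at least half of the
traffic, so the strict comparison cannot hold; from 5 queues up it can.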

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 tools/testing/selftests/drivers/net/hw/rss_ctx.py | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
index 3fc5688605b5..4fa8c7f198a8 100755
--- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py
+++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
@@ -178,8 +178,13 @@ from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure
     cnts = _get_rx_cnts(cfg)
     GenerateTraffic(cfg).wait_pkts_and_stop(20000)
     cnts = _get_rx_cnts(cfg, prev=cnts)
-    # First two queues get less traffic than all the rest
-    ksft_lt(sum(cnts[:2]), sum(cnts[2:]), "traffic distributed: " + str(cnts))
+    if qcnt > 4:
+        # First two queues get less traffic than all the rest
+        ksft_lt(sum(cnts[:2]), sum(cnts[2:]),
+                "traffic distributed: " + str(cnts))
+    else:
+        # When queue count is low make sure third queue got significant pkts
+        ksft_ge(cnts[2], 3500, "traffic distributed: " + str(cnts))
 
 
 def test_rss_queue_reconfigure(cfg, main_ctx=True):
-- 
2.51.0



* Re: [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig
  2025-08-29 22:07 [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig Jakub Kicinski
  2025-08-29 22:07 ` [PATCH net-next 2/2] selftests: drv-net: rss_ctx: make the test pass with few queues Jakub Kicinski
@ 2025-09-01 13:50 ` Simon Horman
  2025-09-01 17:26   ` Jakub Kicinski
  1 sibling, 1 reply; 5+ messages in thread
From: Simon Horman @ 2025-09-01 13:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, ecree.xilinx, gal,
	joe, linux-kselftest, shuah

On Fri, Aug 29, 2025 at 03:07:11PM -0700, Jakub Kicinski wrote:
> The rss_ctx test has gotten pretty flaky after I increased
> the queue count in NIPA 2->3. Not 100% clear why. We get
> a lot of failures in the rss_ctx.test_hitless_key_update case.
> 
> Looking closer it appears that the failures are mostly due
> to startup costs. I measured the following timing for ethtool -X:
>  - python cmd(shell=True)  : 150-250msec
>  - python cmd(shell=False) :  50- 70msec
>  - timed in bash           :  45- 55msec
>  - YNL Netlink call        :   2-  4msec
>  - .set_rxfh callback      :   1-  2msec
> 
> The target in the test was set to 200msec. We were mostly measuring
> ethtool startup cost it seems. Switch to YNL since it's 100x faster.
> 
> Lower the pass criteria to ~75msec, no real science behind this number
> but we removed ~150msec of overhead, and the old target was 200msec.
> So any driver that was passing previously should still pass with 75msec.
> 
> Separately we should probably follow up on defaulting to shell=False,
> when the script doesn't explicitly ask for True, because the overhead
> is rather significant.

+1

> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  tools/testing/selftests/drivers/net/hw/rss_ctx.py | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
> index 9838b8457e5a..3fc5688605b5 100755
> --- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py
> +++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
> @@ -335,19 +335,20 @@ from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure
>      data = get_rss(cfg)
>      key_len = len(data['rss-hash-key'])
>  
> -    key = _rss_key_rand(key_len)
> +    ethnl = EthtoolFamily()
> +    key = random.randbytes(key_len)

Is the update to the generation of key intended?
It's not clear to me how it relates to the rest of the patch.

>  
>      tgen = GenerateTraffic(cfg)
>      try:
>          errors0, carrier0 = get_drop_err_sum(cfg)
>          t0 = datetime.datetime.now()
> -        ethtool(f"-X {cfg.ifname} hkey " + _rss_key_str(key))
> +        ethnl.rss_set({"header": {"dev-index": cfg.ifindex}, "hkey": key})
>          t1 = datetime.datetime.now()
>          errors1, carrier1 = get_drop_err_sum(cfg)
>      finally:
>          tgen.wait_pkts_and_stop(5000)
>  
> -    ksft_lt((t1 - t0).total_seconds(), 0.2)
> +    ksft_lt((t1 - t0).total_seconds(), 0.075)
>      ksft_eq(errors1 - errors1, 0)
>      ksft_eq(carrier1 - carrier0, 0)
>  
> -- 
> 2.51.0
> 


* Re: [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig
  2025-09-01 13:50 ` [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig Simon Horman
@ 2025-09-01 17:26   ` Jakub Kicinski
  2025-09-02  9:01     ` Simon Horman
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2025-09-01 17:26 UTC (permalink / raw)
  To: Simon Horman
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, ecree.xilinx, gal,
	joe, linux-kselftest, shuah

On Mon, 1 Sep 2025 14:50:08 +0100 Simon Horman wrote:
> > -    key = _rss_key_rand(key_len)
> > +    ethnl = EthtoolFamily()
> > +    key = random.randbytes(key_len)  
> 
> Is the update to the generation of key intended?
> It's not clear to me how it relates to the rest of the patch.

_rss_key_rand() gives us an array of integers in the range 0-255 while
random.randbytes() gives us a bytes object. Difference in return type.
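
A minimal sketch of that difference, for illustration only: ints_key()
and key_to_str() are hypothetical stand-ins written for this example
(the bodies of the real _rss_key_rand() and _rss_key_str() are not
shown in this thread), and the key length of 40 is arbitrary.

```python
import random


def ints_key(length):
    # Hypothetical stand-in for _rss_key_rand(): a list of ints in 0-255.
    return [random.randint(0, 255) for _ in range(length)]


def key_to_str(key):
    # Hypothetical stand-in for _rss_key_str(): colon-separated hex bytes.
    return ":".join(f"{b:02x}" for b in key)


key_len = 40  # illustrative; the test reads the real length from the device

as_ints = ints_key(key_len)
as_bytes = random.randbytes(key_len)  # needs Python 3.9 or newer

print(type(as_ints).__name__, type(as_bytes).__name__)
# Iterating a bytes object yields ints, so hex formatting works on both.
print(key_to_str(as_ints))
print(key_to_str(as_bytes))
# A list of ints needs an explicit conversion to become raw bytes.
converted = bytes(as_ints)
```

The string form suits the ethtool command line, while the raw bytes
form is what gets passed as "hkey" in the rss_set() call in the patch.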

Let me respin and add this to the commit msg. Looks like I was too
aggressive in lowering the threshold to 75msec, CI hit a 120msec run :(


* Re: [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig
  2025-09-01 17:26   ` Jakub Kicinski
@ 2025-09-02  9:01     ` Simon Horman
  0 siblings, 0 replies; 5+ messages in thread
From: Simon Horman @ 2025-09-02  9:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, ecree.xilinx, gal,
	joe, linux-kselftest, shuah

On Mon, Sep 01, 2025 at 10:26:07AM -0700, Jakub Kicinski wrote:
> On Mon, 1 Sep 2025 14:50:08 +0100 Simon Horman wrote:
> > > -    key = _rss_key_rand(key_len)
> > > +    ethnl = EthtoolFamily()
> > > +    key = random.randbytes(key_len)  
> > 
> > Is the update to the generation of key intended?
> > It's not clear to me how it relates to the rest of the patch.
> 
> _rss_key_rand() gives us an array of integers in the range 0-255 while
> random.randbytes() gives us a bytes object. Difference in return type.

Thanks, it is clear now :)

> Let me respin and add this to the commit msg. Looks like I was too
> aggressive in lowering the threshold to 75msec, CI hit a 120msec run :(
> 



Thread overview: 5+ messages
2025-08-29 22:07 [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig Jakub Kicinski
2025-08-29 22:07 ` [PATCH net-next 2/2] selftests: drv-net: rss_ctx: make the test pass with few queues Jakub Kicinski
2025-09-01 13:50 ` [PATCH net-next 1/2] selftests: drv-net: rss_ctx: use Netlink for timed reconfig Simon Horman
2025-09-01 17:26   ` Jakub Kicinski
2025-09-02  9:01     ` Simon Horman
