public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] selftests: drv-net: rss: validate min RSS table size
@ 2026-01-30 19:29 Jakub Kicinski
  2026-01-31 17:18 ` Willem de Bruijn
  2026-01-31 17:32 ` Eric Dumazet
  0 siblings, 2 replies; 5+ messages in thread
From: Jakub Kicinski @ 2026-01-30 19:29 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	shuah, willemb, linux-kselftest

Add a test which checks that the RSS table is at least 4x the max
queue count supported by the device. The original RSS spec from
Microsoft stated that the RSS indirection table should be 2 to 8
times the CPU count, presumably assuming queue per CPU. If the
CPU count is not a power of two, however, a power-of-2 table
2x larger than queue count results in a 33% traffic imbalance.
Validate that the indirection table is at least 4x the queue
count. This lowers the imbalance to 16% which empirically
appears to be more acceptable to memcache-like workloads.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: shuah@kernel.org
CC: willemb@google.com
CC: linux-kselftest@vger.kernel.org
---
 .../testing/selftests/drivers/net/hw/Makefile |  1 +
 .../selftests/drivers/net/hw/rss_drv.py       | 88 +++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/hw/rss_drv.py

diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index 9c163ba6feee..a64140333a46 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -35,6 +35,7 @@ TEST_PROGS = \
 	pp_alloc_fail.py \
 	rss_api.py \
 	rss_ctx.py \
+	rss_drv.py \
 	rss_flow_label.py \
 	rss_input_xfrm.py \
 	toeplitz.py \
diff --git a/tools/testing/selftests/drivers/net/hw/rss_drv.py b/tools/testing/selftests/drivers/net/hw/rss_drv.py
new file mode 100755
index 000000000000..2d1a33189076
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/rss_drv.py
@@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Driver-related behavior tests for RSS.
+"""
+
+from lib.py import ksft_run, ksft_exit, ksft_ge
+from lib.py import ksft_variants, KsftNamedVariant, KsftSkipEx
+from lib.py import defer, ethtool
+from lib.py import EthtoolFamily, NlError
+from lib.py import NetDrvEnv
+
+
+def _is_power_of_two(n):
+    return n > 0 and (n & (n - 1)) == 0
+
+
+def _get_rss(cfg, context=0):
+    return ethtool(f"-x {cfg.ifname} context {context}", json=True)[0]
+
+
+def _test_rss_indir_size(cfg, qcnt, context=0):
+    """Test that indirection table size is at least 4x queue count."""
+    ethtool(f"-L {cfg.ifname} combined {qcnt}")
+
+    rss = _get_rss(cfg, context=context)
+    indir = rss['rss-indirection-table']
+    ksft_ge(len(indir), 4 * qcnt, "Table smaller than 4x")
+    return len(indir)
+
+
+def _maybe_create_context(cfg, create_context):
+    """ Either create a context and return its ID or return 0 for main ctx """
+    if not create_context:
+        return 0
+    try:
+        ctx = cfg.ethnl.rss_create_act({'header': {'dev-index': cfg.ifindex}})
+        ctx_id = ctx['context']
+        defer(cfg.ethnl.rss_delete_act,
+              {'header': {'dev-index': cfg.ifindex}, 'context': ctx_id})
+    except NlError:
+        raise KsftSkipEx("Device does not support additional RSS contexts")
+
+    return ctx_id
+
+
+@ksft_variants([
+    KsftNamedVariant("main", False),
+    KsftNamedVariant("ctx", True),
+])
+def indir_size_4x(cfg, create_context):
+    """
+    Test that the indirection table has at least 4 entries per queue.
+    Empirically network-heavy workloads like memcache suffer with the 33%
+    imbalance of a 2x indirection table size.
+    4x table translates to a 16% imbalance.
+    """
+    channels = cfg.ethnl.channels_get({'header': {'dev-index': cfg.ifindex}})
+    ch_max = channels.get('combined-max', 0)
+    qcnt = channels['combined-count']
+
+    if ch_max < 3:
+        raise KsftSkipEx(f"Not enough queues for the test: max={ch_max}")
+
+    defer(ethtool, f"-L {cfg.ifname} combined {qcnt}")
+    ethtool(f"-L {cfg.ifname} combined 3")
+
+    ctx_id = _maybe_create_context(cfg, create_context)
+
+    indir_sz = _test_rss_indir_size(cfg, 3, context=ctx_id)
+
+    # Test with max queue count (max - 1 if max is a power of two)
+    test_max = ch_max - 1 if _is_power_of_two(ch_max) else ch_max
+    if test_max > 3 and indir_sz < test_max * 4:
+        _test_rss_indir_size(cfg, test_max, context=ctx_id)
+
+
+def main() -> None:
+    """ Ksft boiler plate main """
+    with NetDrvEnv(__file__) as cfg:
+        cfg.ethnl = EthtoolFamily()
+        ksft_run([indir_size_4x], args=(cfg, ))
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] selftests: drv-net: rss: validate min RSS table size
  2026-01-30 19:29 [PATCH net-next] selftests: drv-net: rss: validate min RSS table size Jakub Kicinski
@ 2026-01-31 17:18 ` Willem de Bruijn
  2026-01-31 17:50   ` Jakub Kicinski
  2026-01-31 17:32 ` Eric Dumazet
  1 sibling, 1 reply; 5+ messages in thread
From: Willem de Bruijn @ 2026-01-31 17:18 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	shuah, willemb, linux-kselftest

Jakub Kicinski wrote:
> Add a test which checks that the RSS table is at least 4x the max
> queue count supported by the device. The original RSS spec from
> Microsoft stated that the RSS indirection table should be 2 to 8
> times the CPU count, presumably assuming queue per CPU. If the
> CPU count is not a power of two, however, a power-of-2 table
> 2x larger than queue count results in a 33% traffic imbalance.
> Validate that the indirection table is at least 4x the queue
> count. This lowers the imbalance to 16% which empirically
> appears to be more acceptable to memcache-like workloads.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

> +def _test_rss_indir_size(cfg, qcnt, context=0):
> +    """Test that indirection table size is at least 4x queue count."""
> +    ethtool(f"-L {cfg.ifname} combined {qcnt}")

Remind me: does this work with devices that advertise RX N TX N rather
than combined N?

> +
> +    rss = _get_rss(cfg, context=context)
> +    indir = rss['rss-indirection-table']
> +    ksft_ge(len(indir), 4 * qcnt, "Table smaller than 4x")
> +    return len(indir)
> +
> +
> +@ksft_variants([
> +    KsftNamedVariant("main", False),
> +    KsftNamedVariant("ctx", True),
> +])
> +def indir_size_4x(cfg, create_context):
> +    """
> +    Test that the indirection table has at least 4 entries per queue.
> +    Empirically network-heavy workloads like memcache suffer with the 33%
> +    imbalance of a 2x indirection table size.
> +    4x table translates to a 16% imbalance.
> +    """
> +    channels = cfg.ethnl.channels_get({'header': {'dev-index': cfg.ifindex}})
> +    ch_max = channels.get('combined-max', 0)

Same here: not all drivers set this.

Perhaps we should skip if absent?

And does combined-max mean all queues across all contexts, or per
context? The test seems to imply the second. My intuition was the
first. Is it clearly defined across devices. per ethtool_channels,
seems per device?

  * @max_combined: Read only. Maximum number of combined channel the driver
  *      support. Set of queues RX, TX or other.


> +    qcnt = channels['combined-count']
> +
> +    if ch_max < 3:
> +        raise KsftSkipEx(f"Not enough queues for the test: max={ch_max}")
> +
> +    defer(ethtool, f"-L {cfg.ifname} combined {qcnt}")
> +    ethtool(f"-L {cfg.ifname} combined 3")
> +
> +    ctx_id = _maybe_create_context(cfg, create_context)
> +
> +    indir_sz = _test_rss_indir_size(cfg, 3, context=ctx_id)
> +
> +    # Test with max queue count (max - 1 if max is a power of two)
> +    test_max = ch_max - 1 if _is_power_of_two(ch_max) else ch_max
> +    if test_max > 3 and indir_sz < test_max * 4:
> +        _test_rss_indir_size(cfg, test_max, context=ctx_id)
> +


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] selftests: drv-net: rss: validate min RSS table size
  2026-01-30 19:29 [PATCH net-next] selftests: drv-net: rss: validate min RSS table size Jakub Kicinski
  2026-01-31 17:18 ` Willem de Bruijn
@ 2026-01-31 17:32 ` Eric Dumazet
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2026-01-31 17:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, pabeni, andrew+netdev, horms, shuah, willemb,
	linux-kselftest

On Fri, Jan 30, 2026 at 8:29 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Add a test which checks that the RSS table is at least 4x the max
> queue count supported by the device. The original RSS spec from
> Microsoft stated that the RSS indirection table should be 2 to 8
> times the CPU count, presumably assuming queue per CPU. If the
> CPU count is not a power of two, however, a power-of-2 table
> 2x larger than queue count results in a 33% traffic imbalance.
> Validate that the indirection table is at least 4x the queue
> count. This lowers the imbalance to 16% which empirically
> appears to be more acceptable to memcache-like workloads.

I think this nice observation could be captured in
Documentation/networking/scaling.rst
amending your commit ?

commit b91f2e13c972c3f2f33ecc873b0ff0ada3fa1854    docs: networking:
document multi-RSS context

Thanks !

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] selftests: drv-net: rss: validate min RSS table size
  2026-01-31 17:18 ` Willem de Bruijn
@ 2026-01-31 17:50   ` Jakub Kicinski
  2026-01-31 18:38     ` Willem de Bruijn
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2026-01-31 17:50 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	willemb, linux-kselftest

On Sat, 31 Jan 2026 12:18:55 -0500 Willem de Bruijn wrote:
> > +def _test_rss_indir_size(cfg, qcnt, context=0):
> > +    """Test that indirection table size is at least 4x queue count."""
> > +    ethtool(f"-L {cfg.ifname} combined {qcnt}")  
> 
> Remind me: does this work with devices that advertise RX N TX N rather
> than combined N?

It doesn't. But I haven't seen a device which doesn't at least
advertise combined made in this decade. I started typing this with
support for "rx" vs "combined" initially, but it complicates the code
for no real benefit.

If a driver which doesn't have "combined" starts reporting results 
to NIPA I'll care, right now - CBA.

> > +    rss = _get_rss(cfg, context=context)
> > +    indir = rss['rss-indirection-table']
> > +    ksft_ge(len(indir), 4 * qcnt, "Table smaller than 4x")
> > +    return len(indir)
> > +
> > +
> > +@ksft_variants([
> > +    KsftNamedVariant("main", False),
> > +    KsftNamedVariant("ctx", True),
> > +])
> > +def indir_size_4x(cfg, create_context):
> > +    """
> > +    Test that the indirection table has at least 4 entries per queue.
> > +    Empirically network-heavy workloads like memcache suffer with the 33%
> > +    imbalance of a 2x indirection table size.
> > +    4x table translates to a 16% imbalance.
> > +    """
> > +    channels = cfg.ethnl.channels_get({'header': {'dev-index': cfg.ifindex}})
> > +    ch_max = channels.get('combined-max', 0)  
> 
> Same here: not all drivers set this.
> 
> Perhaps we should skip if absent?
> 
> And does combined-max mean all queues across all contexts, or per
> context?

Could you rephrase? Not sure I understand.

channels are interrupts, we use it in place of max Rx queues because 
we don't have an API to allocate queues directly.

When queue count is changed and the user did not set the indirection
table of the main context the main context's indir table is auto-
-repopulated. It may also be resized.

The indir tables of additional contexts are not repopulated.
We do not have the concept of "default" indirection table in 
an additional context because it has no practical use (just use
the main table if you don't care about the queue selection!?)
Since we don't have an explicit API to size them, (yet),
we expect the size of additional contexts to follow the size 
of the main indirection table.

If a table is resized the expectation should be that the driver
folds/unfolds the existing table eg [0, 1, 0, 1] can fold into [0, 1]
or unfold into [0, 1, 0, 1, 0, 1, 0, 1]. Resizing tables of additional
contexts is currently broken / no possible. My colleague is supposed
to be working on fixing that but appears to be making slow progress :/

> The test seems to imply the second. My intuition was the
> first. Is it clearly defined across devices. per ethtool_channels,
> seems per device?
> 
>   * @max_combined: Read only. Maximum number of combined channel the driver
>   *      support. Set of queues RX, TX or other.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] selftests: drv-net: rss: validate min RSS table size
  2026-01-31 17:50   ` Jakub Kicinski
@ 2026-01-31 18:38     ` Willem de Bruijn
  0 siblings, 0 replies; 5+ messages in thread
From: Willem de Bruijn @ 2026-01-31 18:38 UTC (permalink / raw)
  To: Jakub Kicinski, Willem de Bruijn
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah,
	willemb, linux-kselftest

Reviewed-by: Willem de Bruijn <willemb@google.com>

Jakub Kicinski wrote:
> On Sat, 31 Jan 2026 12:18:55 -0500 Willem de Bruijn wrote:
> > > +def _test_rss_indir_size(cfg, qcnt, context=0):
> > > +    """Test that indirection table size is at least 4x queue count."""
> > > +    ethtool(f"-L {cfg.ifname} combined {qcnt}")  
> > 
> > Remind me: does this work with devices that advertise RX N TX N rather
> > than combined N?
> 
> It doesn't. But I haven't seen a device which doesn't at least
> advertise combined made in this decade. I started typing this with

I was indeed thinking of mlx4 as one example.

> support for "rx" vs "combined" initially, but it complicates the code
> for no real benefit.
> 
> If a driver which doesn't have "combined" starts reporting results 
> to NIPA I'll care, right now - CBA.

Fair. It's good to gracefully and skip on older platforms.
I guess the test does that by virtue of the KsftSkipEx if
combined-max < 3. Ack.

> > > +    rss = _get_rss(cfg, context=context)
> > > +    indir = rss['rss-indirection-table']
> > > +    ksft_ge(len(indir), 4 * qcnt, "Table smaller than 4x")
> > > +    return len(indir)
> > > +
> > > +
> > > +@ksft_variants([
> > > +    KsftNamedVariant("main", False),
> > > +    KsftNamedVariant("ctx", True),
> > > +])
> > > +def indir_size_4x(cfg, create_context):
> > > +    """
> > > +    Test that the indirection table has at least 4 entries per queue.
> > > +    Empirically network-heavy workloads like memcache suffer with the 33%
> > > +    imbalance of a 2x indirection table size.
> > > +    4x table translates to a 16% imbalance.
> > > +    """
> > > +    channels = cfg.ethnl.channels_get({'header': {'dev-index': cfg.ifindex}})
> > > +    ch_max = channels.get('combined-max', 0)  
> > 
> > Same here: not all drivers set this.
> > 
> > Perhaps we should skip if absent?
> > 
> > And does combined-max mean all queues across all contexts, or per
> > context?
> 
> Could you rephrase? Not sure I understand.

I implicitly assumed mutually exclusive sets of queues associated with
the two contexts.

If max channels is a driver global limit, then cannot 
exclusively assign max channels to both contexts.

But my assumption was wrong of course. Multiple RSS contexts can have
the same queues in their indirection tables. And this test does not
include any flow steering to an exclusive (set of) queue(s) for the
additional context.

I had not previously given any thought to how additional contexts'
indirection tables are (expected by the kernel to be) initialized.

> channels are interrupts, we use it in place of max Rx queues because 
> we don't have an API to allocate queues directly.
> 
> When queue count is changed and the user did not set the indirection
> table of the main context the main context's indir table is auto-
> -repopulated. It may also be resized.
> 
> The indir tables of additional contexts are not repopulated.
> We do not have the concept of "default" indirection table in 
> an additional context because it has no practical use (just use
> the main table if you don't care about the queue selection!?)
> Since we don't have an explicit API to size them, (yet),
> we expect the size of additional contexts to follow the size 
> of the main indirection table.
> 
> If a table is resized the expectation should be that the driver
> folds/unfolds the existing table eg [0, 1, 0, 1] can fold into [0, 1]
> or unfold into [0, 1, 0, 1, 0, 1, 0, 1]. Resizing tables of additional
> contexts is currently broken / no possible. My colleague is supposed
> to be working on fixing that but appears to be making slow progress :/
> 
> > The test seems to imply the second. My intuition was the
> > first. Is it clearly defined across devices. per ethtool_channels,
> > seems per device?
> > 
> >   * @max_combined: Read only. Maximum number of combined channel the driver
> >   *      support. Set of queues RX, TX or other.
> 



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-01-31 18:38 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-30 19:29 [PATCH net-next] selftests: drv-net: rss: validate min RSS table size Jakub Kicinski
2026-01-31 17:18 ` Willem de Bruijn
2026-01-31 17:50   ` Jakub Kicinski
2026-01-31 18:38     ` Willem de Bruijn
2026-01-31 17:32 ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox