From: Tariq Toukan <tariqt@nvidia.com>
To: Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>
Cc: Donald Hunter <donald.hunter@gmail.com>,
Simon Horman <horms@kernel.org>, Jiri Pirko <jiri@resnulli.us>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Saeed Mahameed <saeedm@nvidia.com>,
"Leon Romanovsky" <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
Chuck Lever <chuck.lever@oracle.com>,
"Matthieu Baerts (NGI0)" <matttbe@kernel.org>,
Cosmin Ratiu <cratiu@nvidia.com>,
"Carolina Jubran" <cjubran@nvidia.com>,
Daniel Zahka <daniel.zahka@gmail.com>,
"Shay Drory" <shayd@nvidia.com>, Kees Cook <kees@kernel.org>,
Daniel Jurgens <danielj@nvidia.com>,
Moshe Shemesh <moshe@nvidia.com>,
Adithya Jayachandran <ajayachandra@nvidia.com>,
Willem de Bruijn <willemb@google.com>, David Wei <dw@davidwei.uk>,
Petr Machata <petrm@nvidia.com>,
Stanislav Fomichev <sdf@fomichev.me>,
Vadim Fedorenko <vadim.fedorenko@linux.dev>,
<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-doc@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
<linux-kselftest@vger.kernel.org>, Gal Pressman <gal@nvidia.com>,
Jiri Pirko <jiri@nvidia.com>
Subject: [PATCH net-next V8 13/14] selftests: drv-net: Add test for cross-esw rate scheduling
Date: Tue, 24 Mar 2026 14:28:47 +0200
Message-ID: <20260324122848.36731-14-tariqt@nvidia.com>
In-Reply-To: <20260324122848.36731-1-tariqt@nvidia.com>
From: Cosmin Ratiu <cratiu@nvidia.com>

Add a Python selftest that uses the YNL devlink API to verify the devlink
rate ops. The test requires NETIF in the config to point to a bond device
containing two PFs. Test setup then creates one VF on each PF and
exercises the various rate commands.

  ./devlink_rate_cross_esw.py
  TAP version 13
  1..3
  ok 1 devlink_rate_cross_esw.test_same_esw_parent
  ok 2 devlink_rate_cross_esw.test_cross_esw_parent
  ok 3 devlink_rate_cross_esw.test_tx_rates_on_cross_esw

Tests are skipped when the preconditions aren't met, when the
devlink API is too old, or when the devices don't appear to support
cross-esw scheduling (detected via EOPNOTSUPP).

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
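Note (not part of the patch): the skip policy and the bond discovery
described in the commit message can be sketched standalone, without
hardware. This is only an illustration under stated assumptions. The names
classify_rate_error(), lower_names() and demo_lowers() are hypothetical and
do not exist in the selftest; the real helpers raise KsftSkipEx/KsftFailEx
from NlError handlers and glob /sys/class/net/<bond>/lower_* directly.

```python
import errno
import glob
import os
import tempfile


def classify_rate_error(err, cross_esw=False):
    """Map a netlink errno to 'skip' or 'fail' (hypothetical helper).

    Mirrors the policy in the selftest: EOPNOTSUPP always means the
    operation is unsupported, while ENODEV is treated as "no cross-esw
    support" only when a cross-device parent was requested.
    """
    if err == errno.EOPNOTSUPP:
        return "skip"
    if cross_esw and err == errno.ENODEV:
        return "skip"
    return "fail"


def lower_names(netdev_dir):
    """Return sorted lower-device names from lower_* entries in netdev_dir."""
    prefix = "lower_"
    paths = glob.glob(os.path.join(netdev_dir, prefix + "*"))
    return sorted(os.path.basename(p)[len(prefix):] for p in paths)


def demo_lowers():
    """Emulate a bond's sysfs directory in a temp dir (assumption: no sysfs)."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("lower_eth3", "lower_eth2", "mtu"):
            open(os.path.join(tmp, name), "w").close()
        return lower_names(tmp)
```

In this sketch an ENODEV on a same-esw request is still classified as a
failure, which matches the selftest only skipping on ENODEV when parent-dev
was passed.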
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 .../drivers/net/hw/devlink_rate_cross_esw.py  | 300 ++++++++++++++++++
 2 files changed, 301 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/hw/devlink_rate_cross_esw.py
diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index 3c97dac9baaa..361fbb9fd44b 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -20,6 +20,7 @@ TEST_GEN_FILES := \
 TEST_PROGS = \
 	csum.py \
 	devlink_port_split.py \
+	devlink_rate_cross_esw.py \
 	devlink_rate_tc_bw.py \
 	devmem.py \
 	ethtool.sh \
diff --git a/tools/testing/selftests/drivers/net/hw/devlink_rate_cross_esw.py b/tools/testing/selftests/drivers/net/hw/devlink_rate_cross_esw.py
new file mode 100755
index 000000000000..0f3b4516c3b7
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/devlink_rate_cross_esw.py
@@ -0,0 +1,300 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Devlink Rate Cross-eswitch Scheduling Test Suite
+==================================================
+
+Control-plane tests for cross-eswitch TX scheduling via devlink-rate.
+Validates that VFs from different PFs on the same chip can share
+rate groups using the cross-device parent-dev attribute.
+
+Preconditions:
+- NETIF points to a bond device with exactly two interfaces.
+- the interfaces must be two PFs from different devices sharing the same chip.
+- (for mlx5): the two interfaces are in switchdev mode and configured in a LAG:
+ - devlink dev eswitch set $DEV1 mode switchdev
+ - devlink dev eswitch set $DEV2 mode switchdev
+ - devlink dev param set $DEV1 name esw_multiport value 1 cmode runtime
+ - devlink dev param set $DEV2 name esw_multiport value 1 cmode runtime
+- test cases will be skipped if:
+ - the number of interfaces in the bond device is != 2.
+ - the kernel doesn't support devlink rates.
+ - the devlink API doesn't support cross-device parents (ENODEV).
+ - cross-esw rate scheduling returns EOPNOTSUPP.
+"""
+
+import errno
+import glob
+import os
+import time
+
+from lib.py import ksft_pr, ksft_eq, ksft_run, ksft_exit
+from lib.py import KsftSkipEx, KsftFailEx
+from lib.py import NetDrvEnv, DevlinkFamily
+from lib.py import NlError
+from lib.py import cmd, defer, ip, tool
+
+
+# --- Discovery and setup ---
+
+
+def get_bond_slaves(bond_ifname):
+    """Returns sorted list of slave netdev names for a bond."""
+    pattern = f"/sys/class/net/{bond_ifname}/lower_*"
+    lowers = glob.glob(pattern)
+    if not lowers:
+        raise KsftSkipEx(f"No bond slaves for {bond_ifname}")
+    slaves = []
+    for path in sorted(lowers):
+        name = os.path.basename(path)
+        if name.startswith("lower_"):
+            name = name[len("lower_"):]
+        slaves.append(name)
+    return slaves
+
+
+def discover_pfs(cfg):
+    """Discovers both PFs from bond slaves."""
+    slaves = get_bond_slaves(cfg.ifname)
+    if len(slaves) != 2:
+        raise KsftSkipEx(f"Need 2 bond slaves, found {len(slaves)}")
+
+    pf0, pf1 = slaves[0], slaves[1]
+    ksft_pr(f"PF0: {pf0} PF1: {pf1}")
+    return pf0, pf1
+
+
+def get_pci_addr(ifname):
+    """Resolves PCI address for a network interface."""
+    return os.path.basename(os.path.realpath(f"/sys/class/net/{ifname}/device"))
+
+
+def get_vf_port_index(pf_pci):
+    """Finds devlink port-index for vf0 under pf_pci."""
+    ports = tool("devlink", "port show", json=True)["port"]
+    for port_name, props in ports.items():
+        if port_name.startswith(f"pci/{pf_pci}/") and props.get("vfnum") == 0:
+            return int(port_name.split("/")[-1])
+    raise KsftSkipEx(f"VF port not found for {pf_pci}")
+
+
+def cleanup_esw(pf):
+    """Removes VFs if created by tests."""
+    cmd(f"echo 0 > /sys/class/net/{pf}/device/sriov_numvfs", shell=True, fail=False)
+
+
+def setup_esw(pf):
+    """Creates 1 VF on 'pf'."""
+    path = f"/sys/class/net/{pf}/device/sriov_numvfs"
+    cmd(f"echo 0 > {path}", shell=True)
+    cmd(f"echo 1 > {path}", shell=True)
+    time.sleep(2)
+
+    vf_dir = f"/sys/class/net/{pf}/device/virtfn0/net"
+    entries = os.listdir(vf_dir) if os.path.isdir(vf_dir) else []
+    if not entries:
+        raise KsftSkipEx(f"VF not found for {pf}")
+    ip(f"link set dev {entries[0]} up")
+
+    pf_pci = get_pci_addr(pf)
+    vf_idx = get_vf_port_index(pf_pci)
+    ksft_pr(f"Created VF {vf_idx} on PF {pf} ({pf_pci})")
+    return pf_pci, vf_idx
+
+
+# --- Rate operation helpers ---
+
+
+def rate_new(devnl, dev_pci, node_name, **kwargs):
+    """Creates rate node."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "rate-node-name": node_name,
+    }
+    params.update(kwargs)
+    try:
+        devnl.rate_new(params)
+    except NlError as e:
+        if e.error == errno.EOPNOTSUPP:
+            raise KsftSkipEx("rate_new not supported") from e
+        raise KsftFailEx("rate_new failed") from e
+
+
+def rate_get(devnl, dev_pci, node_name):
+    """Gets rate node."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "rate-node-name": node_name,
+    }
+    return devnl.rate_get(params)
+
+
+def rate_get_leaf(devnl, dev_pci, port_index):
+    """Gets rate leaf (VF)."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "port-index": port_index,
+    }
+    return devnl.rate_get(params)
+
+
+def rate_del(devnl, dev_pci, node_name):
+    """Deletes rate node."""
+    devnl.rate_del({
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "rate-node-name": node_name,
+    })
+
+
+def rate_set_leaf(devnl, dev_pci, port_index, **kwargs):
+    """Sets rate attributes on a leaf (VF)."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "port-index": port_index,
+    }
+    params.update(kwargs)
+    try:
+        devnl.rate_set(params)
+    except NlError as e:
+        if e.error == errno.EOPNOTSUPP:
+            raise KsftSkipEx("rate_set not supported") from e
+        raise KsftFailEx("rate_set failed") from e
+
+
+def rate_set_leaf_parent(devnl, dev_pci, port_index,
+                         parent_name, parent_dev_pci=None):
+    """Sets a leaf's parent, optionally cross-esw."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "port-index": port_index,
+        "rate-parent-node-name": parent_name,
+    }
+    if parent_dev_pci:
+        params["parent-dev"] = {
+            "bus-name": "pci",
+            "dev-name": parent_dev_pci,
+        }
+    try:
+        devnl.rate_set(params)
+    except NlError as e:
+        if e.error == errno.EOPNOTSUPP:
+            raise KsftSkipEx("rate_set not supported") from e
+        if parent_dev_pci and e.error == errno.ENODEV:
+            raise KsftSkipEx("Cross-esw scheduling not supported") from e
+        raise KsftFailEx("rate_set failed") from e
+
+
+def rate_clear_leaf_parent(devnl, dev_pci, port_index):
+    """Clears a leaf's parent."""
+    rate_set_leaf_parent(devnl, dev_pci, port_index, "")
+
+
+def rate_set_node(devnl, dev_pci, node_name, **kwargs):
+    """Sets rate attributes on a node."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "rate-node-name": node_name,
+    }
+    params.update(kwargs)
+    devnl.rate_set(params)
+
+
+# --- Test cases ---
+
+
+def test_same_esw_parent(cfg):
+    """Assigns PF0's VF to PF0's group (same esw baseline)."""
+    pf0, _ = discover_pfs(cfg)
+    pf0_pci, vf0_idx = setup_esw(pf0)
+    defer(cleanup_esw, pf0)
+
+    rate_new(cfg.devnl, pf0_pci, "group0")
+    defer(rate_del, cfg.devnl, pf0_pci, "group0")
+    ksft_pr("rate-new succeeded")
+
+    rate_set_leaf_parent(cfg.devnl, pf0_pci, vf0_idx, "group0")
+    defer(rate_clear_leaf_parent, cfg.devnl, pf0_pci, vf0_idx)
+
+    ksft_pr("Same-esw parent assignment succeeded")
+
+
+def test_cross_esw_parent(cfg):
+    """Sets cross-esw parent, then clears it."""
+    pf0, pf1 = discover_pfs(cfg)
+    pf0_pci, _ = setup_esw(pf0)
+    defer(cleanup_esw, pf0)
+    pf1_pci, vf1_idx = setup_esw(pf1)
+    defer(cleanup_esw, pf1)
+
+    rate_new(cfg.devnl, pf0_pci, "group1")
+    defer(rate_del, cfg.devnl, pf0_pci, "group1")
+    ksft_pr("rate-new succeeded")
+
+    rate_set_leaf_parent(cfg.devnl, pf1_pci, vf1_idx,
+                         "group1", parent_dev_pci=pf0_pci)
+    defer(rate_clear_leaf_parent, cfg.devnl, pf1_pci, vf1_idx)
+
+    ksft_pr("Cross-esw parent set and clear succeeded")
+
+
+def test_tx_rates_on_cross_esw(cfg):
+    """Sets tx_max on group and tx_share on leaves in a cross-esw setup."""
+    pf0, pf1 = discover_pfs(cfg)
+    pf0_pci, vf0_idx = setup_esw(pf0)
+    defer(cleanup_esw, pf0)
+    pf1_pci, vf1_idx = setup_esw(pf1)
+    defer(cleanup_esw, pf1)
+
+    rate_new(cfg.devnl, pf0_pci, "group2", **{"rate-tx-max": 10000000})
+    defer(rate_del, cfg.devnl, pf0_pci, "group2")
+    ksft_pr("rate-new succeeded")
+
+    rate_set_leaf_parent(cfg.devnl, pf1_pci, vf1_idx,
+                         "group2", parent_dev_pci=pf0_pci)
+    defer(rate_clear_leaf_parent, cfg.devnl, pf1_pci, vf1_idx)
+    ksft_pr("set parent cross-esw succeeded")
+
+    rate_set_leaf_parent(cfg.devnl, pf0_pci, vf0_idx, "group2")
+    defer(rate_clear_leaf_parent, cfg.devnl, pf0_pci, vf0_idx)
+    ksft_pr("set parent same esw succeeded")
+
+    rate_set_leaf(cfg.devnl, pf0_pci, vf0_idx, **{"rate-tx-share": 1000000})
+    rate = rate_get_leaf(cfg.devnl, pf0_pci, vf0_idx)
+    ksft_eq(rate["rate-tx-share"], 1000000)
+    rate_set_leaf(cfg.devnl, pf1_pci, vf1_idx, **{"rate-tx-share": 2000000})
+    rate = rate_get_leaf(cfg.devnl, pf1_pci, vf1_idx)
+    ksft_eq(rate["rate-tx-share"], 2000000)
+    rate_set_node(cfg.devnl, pf0_pci, "group2", **{"rate-tx-max": 250000000})
+    rate = rate_get(cfg.devnl, pf0_pci, "group2")
+    ksft_eq(rate["rate-tx-max"], 250000000)
+
+    ksft_pr("tx_max and tx_share set on cross-esw group")
+
+
+def main() -> None:
+    """Main function."""
+
+    with NetDrvEnv(__file__, nsim_test=False) as cfg:
+        cfg.devnl = DevlinkFamily()
+
+        ksft_run(
+            cases=[
+                test_same_esw_parent,
+                test_cross_esw_parent,
+                test_tx_rates_on_cross_esw,
+            ],
+            args=(cfg,),
+        )
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()
--
2.44.0
Thread overview: 18+ messages
2026-03-24 12:28 [PATCH net-next V8 00/14] devlink and mlx5: Support cross-function rate scheduling Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 01/14] devlink: Update nested instance locking comment Tariq Toukan
2026-03-24 13:05 ` Jiri Pirko
2026-03-24 12:28 ` [PATCH net-next V8 02/14] devlink: Add helpers to lock nested-in instances Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 03/14] devlink: Migrate from info->user_ptr to info->ctx Tariq Toukan
2026-03-24 13:05 ` Jiri Pirko
2026-03-24 12:28 ` [PATCH net-next V8 04/14] devlink: Decouple rate storage from associated devlink object Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 05/14] devlink: Add parent dev to devlink API Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 06/14] devlink: Allow parent dev for rate-set and rate-new Tariq Toukan
2026-03-24 19:16 ` Cosmin Ratiu
2026-03-24 12:28 ` [PATCH net-next V8 07/14] devlink: Allow rate node parents from other devlinks Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 08/14] net/mlx5: qos: Use mlx5_lag_query_bond_speed to query LAG speed Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 09/14] net/mlx5: qos: Expose a function to clear a vport's parent Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 10/14] net/mlx5: qos: Model the root node in the scheduling hierarchy Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 11/14] net/mlx5: qos: Remove qos domains and use shd lock Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 12/14] net/mlx5: qos: Support cross-device tx scheduling Tariq Toukan
2026-03-24 12:28 ` Tariq Toukan [this message]
2026-03-24 12:28 ` [PATCH net-next V8 14/14] net/mlx5: Document devlink rates Tariq Toukan