public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Tariq Toukan <tariqt@nvidia.com>
To: Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>
Cc: Donald Hunter <donald.hunter@gmail.com>,
	Simon Horman <horms@kernel.org>, Jiri Pirko <jiri@resnulli.us>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	"Leon Romanovsky" <leon@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	"Matthieu Baerts (NGI0)" <matttbe@kernel.org>,
	Cosmin Ratiu <cratiu@nvidia.com>,
	"Carolina Jubran" <cjubran@nvidia.com>,
	Daniel Zahka <daniel.zahka@gmail.com>,
	"Shay Drory" <shayd@nvidia.com>, Kees Cook <kees@kernel.org>,
	Daniel Jurgens <danielj@nvidia.com>,
	Moshe Shemesh <moshe@nvidia.com>,
	Adithya Jayachandran <ajayachandra@nvidia.com>,
	Willem de Bruijn <willemb@google.com>, David Wei <dw@davidwei.uk>,
	Petr Machata <petrm@nvidia.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Vadim Fedorenko <vadim.fedorenko@linux.dev>,
	<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-doc@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	<linux-kselftest@vger.kernel.org>, Gal Pressman <gal@nvidia.com>,
	Jiri Pirko <jiri@nvidia.com>
Subject: [PATCH net-next V8 13/14] selftests: drv-net: Add test for cross-esw rate scheduling
Date: Tue, 24 Mar 2026 14:28:47 +0200	[thread overview]
Message-ID: <20260324122848.36731-14-tariqt@nvidia.com> (raw)
In-Reply-To: <20260324122848.36731-1-tariqt@nvidia.com>

From: Cosmin Ratiu <cratiu@nvidia.com>

Adds a Python selftest using the YNL devlink API to verify the devlink
rate ops. The test requires a bond device given in the config as NETIF
containing two PFs. Test setup will then create 1 VF on each PF and
verify the various rate commands.

./devlink_rate_cross_esw.py
TAP version 13
1..3
ok 1 devlink_rate_cross_esw.test_same_esw_parent
ok 2 devlink_rate_cross_esw.test_cross_esw_parent
ok 3 devlink_rate_cross_esw.test_tx_rates_on_cross_esw

Tests will be skipped when the preconditions aren't met, when the
devlink API is too old or when the devices don't appear to support
cross-esw scheduling (detected via EOPNOTSUPP).

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 .../drivers/net/hw/devlink_rate_cross_esw.py  | 300 ++++++++++++++++++
 2 files changed, 301 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/hw/devlink_rate_cross_esw.py

diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index 3c97dac9baaa..361fbb9fd44b 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -20,6 +20,7 @@ TEST_GEN_FILES := \
 TEST_PROGS = \
 	csum.py \
 	devlink_port_split.py \
+	devlink_rate_cross_esw.py \
 	devlink_rate_tc_bw.py \
 	devmem.py \
 	ethtool.sh \
diff --git a/tools/testing/selftests/drivers/net/hw/devlink_rate_cross_esw.py b/tools/testing/selftests/drivers/net/hw/devlink_rate_cross_esw.py
new file mode 100755
index 000000000000..0f3b4516c3b7
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/devlink_rate_cross_esw.py
@@ -0,0 +1,300 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Devlink Rate Cross-eswitch Scheduling Test Suite
+==================================================
+
+Control-plane tests for cross-eswitch TX scheduling via devlink-rate.
+Validates that VFs from different PFs on the same chip can share
+rate groups using the cross-device parent-dev attribute.
+
+Preconditions:
+- NETIF points to a bond device with exactly two interfaces.
+- the interfaces must be two PFs from different devices sharing the same chip.
+- (for mlx5): the two interfaces are in switchdev mode and configured in a LAG:
+  - devlink dev eswitch set $DEV1 mode switchdev
+  - devlink dev eswitch set $DEV2 mode switchdev
+  - devlink dev param set $DEV1 name esw_multiport value 1 cmode runtime
+  - devlink dev param set $DEV2 name esw_multiport value 1 cmode runtime
+- test cases will be skipped if:
+  - the number of interfaces in the bond device is != 2.
+  - the kernel doesn't support devlink rates.
+  - the devlink API doesn't support cross-device parents (ENODEV).
+  - cross-esw rate scheduling returns EOPNOTSUPP.
+"""
+
+import errno
+import glob
+import os
+import time
+
+from lib.py import ksft_pr, ksft_eq, ksft_run, ksft_exit
+from lib.py import KsftSkipEx, KsftFailEx
+from lib.py import NetDrvEnv, DevlinkFamily
+from lib.py import NlError
+from lib.py import cmd, defer, ip, tool
+
+
+# --- Discovery and setup ---
+
+
+def get_bond_slaves(bond_ifname):
+    """Returns sorted list of slave netdev names for a bond."""
+    pattern = f"/sys/class/net/{bond_ifname}/lower_*"
+    lowers = glob.glob(pattern)
+    if not lowers:
+        raise KsftSkipEx(f"No bond slaves for {bond_ifname}")
+    slaves = []
+    for path in sorted(lowers):
+        name = os.path.basename(path)
+        if name.startswith("lower_"):
+            name = name[len("lower_"):]
+        slaves.append(name)
+    return slaves
+
+
+def discover_pfs(cfg):
+    """Discovers both PFs from bond slaves."""
+    slaves = get_bond_slaves(cfg.ifname)
+    if len(slaves) != 2:
+        raise KsftSkipEx(f"Need 2 bond slaves, found {len(slaves)}")
+
+    pf0, pf1 = slaves[0], slaves[1]
+    ksft_pr(f"PF0: {pf0} PF1: {pf1}")
+    return pf0, pf1
+
+
+def get_pci_addr(ifname):
+    """Resolves PCI address for a network interface."""
+    return os.path.basename(os.path.realpath(f"/sys/class/net/{ifname}/device"))
+
+
+def get_vf_port_index(pf_pci):
+    """Finds devlink port-index for vf0 under pf_pci."""
+    ports = tool("devlink", "port show", json=True)["port"]
+    for port_name, props in ports.items():
+        if port_name.startswith(f"pci/{pf_pci}/") and props.get("vfnum") == 0:
+            return int(port_name.split("/")[-1])
+    raise KsftSkipEx(f"VF port not found for {pf_pci}")
+
+
+def cleanup_esw(pf):
+    """Removes VFs if created by tests."""
+    cmd(f"echo 0 > /sys/class/net/{pf}/device/sriov_numvfs", shell=True, fail=False)
+
+
+def setup_esw(pf):
+    """Creates 1 VF on 'pf'."""
+    path = f"/sys/class/net/{pf}/device/sriov_numvfs"
+    cmd(f"echo 0 > {path}", shell=True)
+    cmd(f"echo 1 > {path}", shell=True)
+    time.sleep(2)
+
+    vf_dir = f"/sys/class/net/{pf}/device/virtfn0/net"
+    entries = os.listdir(vf_dir) if os.path.isdir(vf_dir) else []
+    if not entries:
+        raise KsftSkipEx(f"VF not found for {pf}")
+    ip(f"link set dev {entries[0]} up")
+
+    pf_pci = get_pci_addr(pf)
+    vf_idx = get_vf_port_index(pf_pci)
+    ksft_pr(f"Created VF {vf_idx} on PF {pf} ({pf_pci})")
+    return pf_pci, vf_idx
+
+
+# --- Rate operation helpers ---
+
+
+def rate_new(devnl, dev_pci, node_name, **kwargs):
+    """Creates rate node."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "rate-node-name": node_name,
+    }
+    params.update(kwargs)
+    try:
+        devnl.rate_new(params)
+    except NlError as e:
+        if e.error == errno.EOPNOTSUPP:
+            raise KsftSkipEx("rate_new not supported") from e
+        raise KsftFailEx("rate_new failed") from e
+
+
+def rate_get(devnl, dev_pci, node_name):
+    """Gets rate node."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "rate-node-name": node_name,
+    }
+    return devnl.rate_get(params)
+
+
+def rate_get_leaf(devnl, dev_pci, port_index):
+    """Gets rate leaf (VF)."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "port-index": port_index,
+    }
+    return devnl.rate_get(params)
+
+
+def rate_del(devnl, dev_pci, node_name):
+    """Deletes rate node."""
+    devnl.rate_del({
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "rate-node-name": node_name,
+    })
+
+
+def rate_set_leaf(devnl, dev_pci, port_index, **kwargs):
+    """Sets rate attributes on a leaf (VF)."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "port-index": port_index,
+    }
+    params.update(kwargs)
+    try:
+        devnl.rate_set(params)
+    except NlError as e:
+        if e.error == errno.EOPNOTSUPP:
+            raise KsftSkipEx("rate_set not supported") from e
+        raise KsftFailEx("rate_set failed") from e
+
+
+def rate_set_leaf_parent(devnl, dev_pci, port_index,
+                         parent_name, parent_dev_pci=None):
+    """Sets a leaf's parent, optionally cross-esw."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "port-index": port_index,
+        "rate-parent-node-name": parent_name,
+    }
+    if parent_dev_pci:
+        params["parent-dev"] = {
+            "bus-name": "pci",
+            "dev-name": parent_dev_pci,
+        }
+    try:
+        devnl.rate_set(params)
+    except NlError as e:
+        if e.error == errno.EOPNOTSUPP:
+            raise KsftSkipEx("rate_set not supported") from e
+        if parent_dev_pci and e.error == errno.ENODEV:
+            raise KsftSkipEx("Cross-esw scheduling not supported") from e
+        raise KsftFailEx("rate_set failed") from e
+
+
+def rate_clear_leaf_parent(devnl, dev_pci, port_index):
+    """Clears a leaf's parent."""
+    rate_set_leaf_parent(devnl, dev_pci, port_index, "")
+
+
+def rate_set_node(devnl, dev_pci, node_name, **kwargs):
+    """Sets rate attributes on a node."""
+    params = {
+        "bus-name": "pci",
+        "dev-name": dev_pci,
+        "rate-node-name": node_name,
+    }
+    params.update(kwargs)
+    devnl.rate_set(params)
+
+
+# --- Test cases ---
+
+
+def test_same_esw_parent(cfg):
+    """Assigns PF0's VF to PF0's group (same esw baseline)."""
+    pf0, _ = discover_pfs(cfg)
+    pf0_pci, vf0_idx = setup_esw(pf0)
+    defer(cleanup_esw, pf0)
+
+    rate_new(cfg.devnl, pf0_pci, "group0")
+    defer(rate_del, cfg.devnl, pf0_pci, "group0")
+    ksft_pr("rate-new succeeded")
+
+    rate_set_leaf_parent(cfg.devnl, pf0_pci, vf0_idx, "group0")
+    defer(rate_clear_leaf_parent, cfg.devnl, pf0_pci, vf0_idx)
+
+    ksft_pr("Same-esw parent assignment succeeded")
+
+
+def test_cross_esw_parent(cfg):
+    """Sets cross-esw parent, then clear it."""
+    pf0, pf1 = discover_pfs(cfg)
+    pf0_pci, _ = setup_esw(pf0)
+    defer(cleanup_esw, pf0)
+    pf1_pci, vf1_idx = setup_esw(pf1)
+    defer(cleanup_esw, pf1)
+
+    rate_new(cfg.devnl, pf0_pci, "group1")
+    defer(rate_del, cfg.devnl, pf0_pci, "group1")
+    ksft_pr("rate-new succeeded")
+
+    rate_set_leaf_parent(cfg.devnl, pf1_pci, vf1_idx,
+                         "group1", parent_dev_pci=pf0_pci)
+    defer(rate_clear_leaf_parent, cfg.devnl, pf1_pci, vf1_idx)
+
+    ksft_pr("Cross-esw parent set and clear succeeded")
+
+
+def test_tx_rates_on_cross_esw(cfg):
+    """Sets tx_max on group and tx_share on leaves in a cross-esw setup."""
+    pf0, pf1 = discover_pfs(cfg)
+    pf0_pci, vf0_idx = setup_esw(pf0)
+    defer(cleanup_esw, pf0)
+    pf1_pci, vf1_idx = setup_esw(pf1)
+    defer(cleanup_esw, pf1)
+
+    rate_new(cfg.devnl, pf0_pci, "group2", **{"rate-tx-max": 10000000})
+    defer(rate_del, cfg.devnl, pf0_pci, "group2")
+    ksft_pr("rate-new succeeded")
+
+    rate_set_leaf_parent(cfg.devnl, pf1_pci, vf1_idx,
+                         "group2", parent_dev_pci=pf0_pci)
+    defer(rate_clear_leaf_parent, cfg.devnl, pf1_pci, vf1_idx)
+    ksft_pr("set parent cross-esw succeeded")
+
+    rate_set_leaf_parent(cfg.devnl, pf0_pci, vf0_idx, "group2")
+    defer(rate_clear_leaf_parent, cfg.devnl, pf0_pci, vf0_idx)
+    ksft_pr("set parent same esw succeeded")
+
+    rate_set_leaf(cfg.devnl, pf0_pci, vf0_idx, **{"rate-tx-share": 1000000})
+    rate = rate_get_leaf(cfg.devnl, pf0_pci, vf0_idx)
+    ksft_eq(rate["rate-tx-share"], 1000000)
+    rate_set_leaf(cfg.devnl, pf1_pci, vf1_idx, **{"rate-tx-share": 2000000})
+    rate = rate_get_leaf(cfg.devnl, pf1_pci, vf1_idx)
+    ksft_eq(rate["rate-tx-share"], 2000000)
+    rate_set_node(cfg.devnl, pf0_pci, "group2", **{"rate-tx-max": 250000000})
+    rate = rate_get(cfg.devnl, pf0_pci, "group2")
+    ksft_eq(rate["rate-tx-max"], 250000000)
+
+    ksft_pr("tx_max and tx_share set on cross-esw group")
+
+
+def main() -> None:
+    """Main function."""
+
+    with NetDrvEnv(__file__, nsim_test=False) as cfg:
+        cfg.devnl = DevlinkFamily()
+
+        ksft_run(
+            cases=[
+                test_same_esw_parent,
+                test_cross_esw_parent,
+                test_tx_rates_on_cross_esw,
+            ],
+            args=(cfg,),
+        )
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()
-- 
2.44.0


  parent reply	other threads:[~2026-03-24 12:31 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-24 12:28 [PATCH net-next V8 00/14] devlink and mlx5: Support cross-function rate scheduling Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 01/14] devlink: Update nested instance locking comment Tariq Toukan
2026-03-24 13:05   ` Jiri Pirko
2026-03-24 12:28 ` [PATCH net-next V8 02/14] devlink: Add helpers to lock nested-in instances Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 03/14] devlink: Migrate from info->user_ptr to info->ctx Tariq Toukan
2026-03-24 13:05   ` Jiri Pirko
2026-03-24 12:28 ` [PATCH net-next V8 04/14] devlink: Decouple rate storage from associated devlink object Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 05/14] devlink: Add parent dev to devlink API Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 06/14] devlink: Allow parent dev for rate-set and rate-new Tariq Toukan
2026-03-24 19:16   ` Cosmin Ratiu
2026-03-24 12:28 ` [PATCH net-next V8 07/14] devlink: Allow rate node parents from other devlinks Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 08/14] net/mlx5: qos: Use mlx5_lag_query_bond_speed to query LAG speed Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 09/14] net/mlx5: qos: Expose a function to clear a vport's parent Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 10/14] net/mlx5: qos: Model the root node in the scheduling hierarchy Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 11/14] net/mlx5: qos: Remove qos domains and use shd lock Tariq Toukan
2026-03-24 12:28 ` [PATCH net-next V8 12/14] net/mlx5: qos: Support cross-device tx scheduling Tariq Toukan
2026-03-24 12:28 ` Tariq Toukan [this message]
2026-03-24 12:28 ` [PATCH net-next V8 14/14] net/mlx5: Document devlink rates Tariq Toukan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260324122848.36731-14-tariqt@nvidia.com \
    --to=tariqt@nvidia.com \
    --cc=ajayachandra@nvidia.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=chuck.lever@oracle.com \
    --cc=cjubran@nvidia.com \
    --cc=corbet@lwn.net \
    --cc=cratiu@nvidia.com \
    --cc=daniel.zahka@gmail.com \
    --cc=danielj@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=donald.hunter@gmail.com \
    --cc=dw@davidwei.uk \
    --cc=edumazet@google.com \
    --cc=gal@nvidia.com \
    --cc=horms@kernel.org \
    --cc=jiri@nvidia.com \
    --cc=jiri@resnulli.us \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=matttbe@kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=moshe@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=petrm@nvidia.com \
    --cc=saeedm@nvidia.com \
    --cc=sdf@fomichev.me \
    --cc=shayd@nvidia.com \
    --cc=skhan@linuxfoundation.org \
    --cc=vadim.fedorenko@linux.dev \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox