public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next V9 00/14] devlink and mlx5: Support cross-function rate scheduling
@ 2026-03-26  6:59 Tariq Toukan
  2026-03-26  6:59 ` [PATCH net-next V9 01/14] devlink: Update nested instance locking comment Tariq Toukan
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Tariq Toukan @ 2026-03-26  6:59 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Donald Hunter, Simon Horman, Jiri Pirko, Jonathan Corbet,
	Shuah Khan, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Shuah Khan, Chuck Lever, Matthieu Baerts (NGI0),
	Carolina Jubran, Cosmin Ratiu, Dragos Tatulea, Jacob Keller,
	Shahar Shitrit, Daniel Zahka, Parav Pandit, Adithya Jayachandran,
	Kees Cook, Shay Drori, Daniel Jurgens, Moshe Shemesh,
	Willem de Bruijn, David Wei, Petr Machata, Stanislav Fomichev,
	Daniel Borkmann, Joe Damato, Nikolay Aleksandrov, Vadim Fedorenko,
	Michael S. Tsirkin, Antonio Quartulli, Allison Henderson,
	Bui Quang Minh, Nimrod Oren, netdev, linux-kernel, linux-doc,
	linux-rdma, linux-kselftest, Gal Pressman, Jiri Pirko

Hi,

This series by Cosmin adds support for cross-function rate scheduling in
devlink and mlx5.
See detailed explanation by Cosmin below [0].

Regards,
Tariq

[0]
devlink objects support rate management for TX scheduling, which
involves maintaining a tree of rate nodes that corresponds to TX
schedulers in hardware. 'man devlink-rate' has the full details.

The tree of rate nodes is maintained per devlink object, protected by
the devlink lock.

There exists hardware capable of instantiating TX scheduling trees
spanning multiple functions of the same physical device (and thus
devlink objects) and therefore the current API and locking scheme is
insufficient.

This patch series changes the devlink rate implementation and API to
allow supporting such hardware and managing TX scheduling trees across
multiple functions of a physical device.

Modeling this requires having devlink rate nodes with parents in other
devlink objects. A naive approach that relies on the current
one-lock-per-devlink model is impossible, as it would require in some
cases acquiring multiple devlink locks in the correct order.

The solution proposed in this patch series makes use of the recently
introduced shared devlink instance [1] to manage rate hierarchy changes
across multiple functions.

V1 of this patch series was sent a long time ago [2], using a different
approach of storing rates in a shared rate domain with special locking
rules. This new approach uses standard devlink instances and nesting.

The first part of the series adds support to devlink rates for
maintaining the rate tree across multiple functions.

The second part changes the mlx5 implementation to make use of this (and
cleans up remnants of the previous approach, involving rate domains).

The neat part about using the shared devlink object is that it works for
SFs as well, which are already nested in their parent PF instances. So
with this series, complex scheduling trees spanning multiple SFs across
multiple PFs of the same NIC can now be supported.

Patches:

devlink rate changes for cross-device TX scheduling:
devlink: Add helpers to lock nested-in instances
devlink: Migrate from info->user_ptr to info->ctx
devlink: Decouple rate storage from associated devlink object
devlink: Add parent dev to devlink API
devlink: Allow parent dev for rate-set and rate-new
devlink: Allow rate node parents from other devlinks

mlx5 support for cross-device TX scheduling:
net/mlx5: qos: Use mlx5_lag_query_bond_speed to query LAG speed
net/mlx5: qos: Expose a function to clear a vport's parent
net/mlx5: qos: Model the root node in the scheduling hierarchy
net/mlx5: qos: Remove qos domains and use shd lock
net/mlx5: qos: Support cross-device tx scheduling
selftests: drv-net: Add test for cross-esw rate scheduling
net/mlx5: Document devlink rates

[1] https://lore.kernel.org/all/20260312100407.551173-1-jiri@resnulli.us/T/#u
[2] https://lore.kernel.org/netdev/20250213180134.323929-1-tariqt@nvidia.com/

V9:
- Link to V8:
  https://lore.kernel.org/all/20260324122848.36731-1-tariqt@nvidia.com/
- Updated netlink_gen.c policy related to index, missed during a rebase.
- Fixed off-by-one error in devlink_get_parent_from_attrs_lock.
- Fixed crash in qos.c when setting a parent to NULL.

Cosmin Ratiu (14):
  devlink: Update nested instance locking comment
  devlink: Add helpers to lock nested-in instances
  devlink: Migrate from info->user_ptr to info->ctx
  devlink: Decouple rate storage from associated devlink object
  devlink: Add parent dev to devlink API
  devlink: Allow parent dev for rate-set and rate-new
  devlink: Allow rate node parents from other devlinks
  net/mlx5: qos: Use mlx5_lag_query_bond_speed to query LAG speed
  net/mlx5: qos: Expose a function to clear a vport's parent
  net/mlx5: qos: Model the root node in the scheduling hierarchy
  net/mlx5: qos: Remove qos domains and use shd lock
  net/mlx5: qos: Support cross-device tx scheduling
  selftests: drv-net: Add test for cross-esw rate scheduling
  net/mlx5: Document devlink rates

 Documentation/netlink/specs/devlink.yaml      |  24 +-
 .../networking/devlink/devlink-port.rst       |   2 +
 Documentation/networking/devlink/index.rst    |   4 +-
 Documentation/networking/devlink/mlx5.rst     |  33 ++
 .../net/ethernet/mellanox/mlx5/core/devlink.c |   1 +
 .../mellanox/mlx5/core/esw/devlink_port.c     |   2 +-
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 532 +++++++-----------
 .../net/ethernet/mellanox/mlx5/core/esw/qos.h |   3 -
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |   8 -
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  15 +-
 include/net/devlink.h                         |   5 +
 include/uapi/linux/devlink.h                  |   2 +
 net/devlink/core.c                            |  42 ++
 net/devlink/dev.c                             |  16 +-
 net/devlink/devl_internal.h                   |  19 +
 net/devlink/dpipe.c                           |  14 +-
 net/devlink/health.c                          |  12 +-
 net/devlink/linecard.c                        |   4 +-
 net/devlink/netlink.c                         |  82 ++-
 net/devlink/netlink_gen.c                     |  24 +-
 net/devlink/netlink_gen.h                     |   8 +
 net/devlink/param.c                           |   4 +-
 net/devlink/port.c                            |  18 +-
 net/devlink/rate.c                            | 290 ++++++++--
 net/devlink/region.c                          |   6 +-
 net/devlink/resource.c                        |   6 +-
 net/devlink/sb.c                              |  22 +-
 net/devlink/trap.c                            |  12 +-
 .../testing/selftests/drivers/net/hw/Makefile |   1 +
 .../drivers/net/hw/devlink_rate_cross_esw.py  | 300 ++++++++++
 30 files changed, 1032 insertions(+), 479 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/hw/devlink_rate_cross_esw.py


base-commit: b50a48b65cb0c955728894b24ba5ecc3cd4b8c7d
-- 
2.44.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-03-26  7:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-26  6:59 [PATCH net-next V9 00/14] devlink and mlx5: Support cross-function rate scheduling Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 01/14] devlink: Update nested instance locking comment Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 02/14] devlink: Add helpers to lock nested-in instances Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 03/14] devlink: Migrate from info->user_ptr to info->ctx Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 04/14] devlink: Decouple rate storage from associated devlink object Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 05/14] devlink: Add parent dev to devlink API Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 06/14] devlink: Allow parent dev for rate-set and rate-new Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 07/14] devlink: Allow rate node parents from other devlinks Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 08/14] net/mlx5: qos: Use mlx5_lag_query_bond_speed to query LAG speed Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 09/14] net/mlx5: qos: Expose a function to clear a vport's parent Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 10/14] net/mlx5: qos: Model the root node in the scheduling hierarchy Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 11/14] net/mlx5: qos: Remove qos domains and use shd lock Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 12/14] net/mlx5: qos: Support cross-device tx scheduling Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 13/14] selftests: drv-net: Add test for cross-esw rate scheduling Tariq Toukan
2026-03-26  6:59 ` [PATCH net-next V9 14/14] net/mlx5: Document devlink rates Tariq Toukan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox