From: Jiri Pirko <jiri@resnulli.us>
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, donald.hunter@gmail.com,
corbet@lwn.net, skhan@linuxfoundation.org, saeedm@nvidia.com,
leon@kernel.org, tariqt@nvidia.com, mbloch@nvidia.com,
przemyslaw.kitszel@intel.com, mschmidt@redhat.com,
andrew+netdev@lunn.ch, rostedt@goodmis.org, mhiramat@kernel.org,
mathieu.desnoyers@efficios.com, chuck.lever@oracle.com,
matttbe@kernel.org, cjubran@nvidia.com, daniel.zahka@gmail.com,
linux-doc@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-trace-kernel@vger.kernel.org
Subject: [PATCH net-next v3 00/13] devlink: introduce shared devlink instance for PFs on same chip
Date: Wed, 4 Mar 2026 17:00:09 +0100
Message-ID: <20260304160022.6114-1-jiri@resnulli.us>
From: Jiri Pirko <jiri@nvidia.com>
Multiple PFs on a network adapter often reside on the same physical
chip, running a single firmware. Some resources and configurations
are inherently shared among these PFs - PTP clocks, VF group rates,
firmware parameters, and others. Today there is no good object in
the devlink model to attach these chip-wide configuration knobs to.
Drivers resort to workarounds like pinning shared state to PF0 or
maintaining ad-hoc internal structures (e.g., ice_adapter) that are
invisible to userspace.
This problem was discussed extensively starting with Przemek Kitszel's
"whole device devlink instance" RFC for the ice driver [1]. Several
approaches for representing the parent instance were considered:
using a partial PCI BDF as the dev_name (breaks when PFs have different
BDFs in VMs), creating a per-driver bus, using auxiliary devices, or
using faux devices. All of these required a backing struct device for
the parent devlink instance, which does not naturally exist - there is
no PCI device that represents the chip as a whole.
This patchset takes a different approach: allow devlink instances to
exist without any backing struct device. The instance is identified
purely by its internal index, exposed over devlink netlink. This avoids
fabricating fake devices and keeps the devlink handle semantics clean.
The first ten patches prepare the devlink core for device-less
instances by decoupling the handle from the parent device. The last
three introduce the shared devlink infrastructure and its first user
in the mlx5 driver.
Example output showing the shared instance and nesting:
pci/0000:08:00.0: index 0
  nested_devlink:
    auxiliary/mlx5_core.eth.0
devlink_index/1: index 1
  nested_devlink:
    pci/0000:08:00.0
    pci/0000:08:00.1
auxiliary/mlx5_core.eth.0: index 2
pci/0000:08:00.1: index 3
  nested_devlink:
    auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1: index 4
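For illustration only, here is a minimal userspace sketch of how a tool
might interpret the handles shown in the output above. It is not code
from this series. It assumes that a handle whose bus_name is
"devlink_index" carries the instance index as a decimal dev_name, which
is inferred from the "devlink_index/1" line. The struct handle type,
handle_parse() and HANDLE_INDEX_BUS are hypothetical names.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: bus name of the device-less handle, as seen in the output. */
#define HANDLE_INDEX_BUS "devlink_index"

/* Hypothetical parsed form of a "bus_name/dev_name" handle. */
struct handle {
	bool by_index;		/* true if the handle names an instance index */
	unsigned int index;	/* valid only when by_index is true */
	char bus_name[32];
	char dev_name[64];
};

/* Split the handle at the first '/'. Returns 0 on success, -EINVAL otherwise. */
int handle_parse(const char *str, struct handle *h)
{
	const char *slash = strchr(str, '/');
	size_t bus_len, dev_len;
	unsigned long val;
	char *end;

	if (!slash)
		return -EINVAL;

	bus_len = (size_t)(slash - str);
	dev_len = strlen(slash + 1);
	if (!bus_len || !dev_len ||
	    bus_len >= sizeof(h->bus_name) || dev_len >= sizeof(h->dev_name))
		return -EINVAL;

	memcpy(h->bus_name, str, bus_len);
	h->bus_name[bus_len] = '\0';
	memcpy(h->dev_name, slash + 1, dev_len + 1);	/* copies the NUL too */

	h->by_index = false;
	h->index = 0;

	/* Any other bus (pci, auxiliary, ...) is a regular device handle. */
	if (strcmp(h->bus_name, HANDLE_INDEX_BUS) != 0)
		return 0;

	/* Index form: dev_name must be a plain decimal number. */
	if (h->dev_name[0] < '0' || h->dev_name[0] > '9')
		return -EINVAL;
	errno = 0;
	val = strtoul(h->dev_name, &end, 10);
	if (errno || *end != '\0' || val > UINT_MAX)
		return -EINVAL;

	h->by_index = true;
	h->index = (unsigned int)val;
	return 0;
}
```

With this sketch, "devlink_index/1" parses to by_index set and index 1,
while "pci/0000:08:00.0" is left as a regular bus_name/dev_name pair.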
[1] https://lore.kernel.org/netdev/20250219164410.35665-1-przemyslaw.kitszel@intel.com/
---
Decoupled from the "devlink and mlx5: Support cross-function rate scheduling"
patchset to stay within the 15-patch limit.
See individual patches for changelog.
Jiri Pirko (13):
devlink: expose devlink instance index over netlink
devlink: add helpers to get bus_name/dev_name
devlink: avoid extra iterations when found devlink is not registered
devlink: allow to use devlink index as a command handle
devlink: support index-based lookup via bus_name/dev_name handle
devlink: support index-based notification filtering
devlink: introduce __devlink_alloc() with dev driver pointer
devlink: add devlink_dev_driver_name() helper and use it in trace
events
devlink: add devl_warn() helper and use it in port warnings
devlink: allow devlink instance allocation without a backing device
devlink: introduce shared devlink instance for PFs on same chip
documentation: networking: add shared devlink documentation
net/mlx5: Add a shared devlink instance for PFs on same chip
Documentation/netlink/specs/devlink.yaml | 56 +++
.../networking/devlink/devlink-shared.rst | 97 +++++
Documentation/networking/devlink/index.rst | 1 +
.../net/ethernet/mellanox/mlx5/core/Makefile | 5 +-
.../net/ethernet/mellanox/mlx5/core/main.c | 17 +
.../ethernet/mellanox/mlx5/core/sh_devlink.c | 61 +++
.../ethernet/mellanox/mlx5/core/sh_devlink.h | 12 +
include/linux/mlx5/driver.h | 1 +
include/net/devlink.h | 10 +
include/trace/events/devlink.h | 36 +-
include/uapi/linux/devlink.h | 4 +
net/devlink/Makefile | 2 +-
net/devlink/core.c | 91 ++++-
net/devlink/dev.c | 8 +-
net/devlink/devl_internal.h | 34 +-
net/devlink/netlink.c | 57 ++-
net/devlink/netlink_gen.c | 350 +++++++++++-------
net/devlink/port.c | 19 +-
net/devlink/sh_dev.c | 161 ++++++++
19 files changed, 813 insertions(+), 209 deletions(-)
create mode 100644 Documentation/networking/devlink/devlink-shared.rst
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.h
create mode 100644 net/devlink/sh_dev.c
--
2.51.1