* [PATCH net-next 00/14] net: ethtool: let ops locked drivers run without rtnl_lock
@ 2026-05-28 23:16 Jakub Kicinski
From: Jakub Kicinski @ 2026-05-28 23:16 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, michael.chan,
	joshwash, tariqt, haiyangz, linux, maxime.chevallier, willemb,
	ernis, sdf.kernel, kory.maincent, danieller, idosch,
	Jakub Kicinski

We have been slowly moving towards removing the rtnl_lock dependency
in driver ops since the concept of "ops-locked" drivers was
introduced last year. Since then we take the netdev instance lock
before invoking any ndo or ethtool op of "ops-locked" drivers.

We dipped our toes into rtnl_lock-less ops with the queue binding API.
Queue stats, NAPI, and other netdev-netlink objects are also queried
without holding rtnl_lock already. It's time to take the next logical
step and lift the requirement from ethtool ops.

The direct motivation for this patchset is that ethtool ops often
involve communicating with device FW, and may take a long time
to complete. Aggressive polling of device state on machines
with 10+ NICs has been shown to significantly increase rtnl_lock
pressure.

There's a handful of areas which still need rtnl_lock (see below).
I decided to convert everything to rtnl_lock-less by default, and
add a set of flags which let the drivers request rtnl_lock to still
be taken. I don't love this, but I'm worried that opt-in would be
even more confusing.
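
As a rough illustration of the opt-out scheme described above, here is
a small userspace model. It is a sketch under stated assumptions, not
the code in this series: every name (demo_dev, demo_op,
demo_op_needs_rtnl(), the needs_rtnl bitmask) is hypothetical, and
plain booleans stand in for rtnl_lock and the netdev instance lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical op identifiers, for illustration only. */
enum demo_op {
	DEMO_OP_GET_LINK,
	DEMO_OP_SET_CHANNELS,
};

#define DEMO_OP_BIT(op)	(1u << (op))

/* Stand-in for a netdev; booleans model lock ownership. */
struct demo_dev {
	bool ops_locked;		/* driver uses the instance lock */
	unsigned int needs_rtnl;	/* ops that still want rtnl_lock */
	bool rtnl_held;
	bool inst_held;
};

/* Opt-out policy: rtnl_lock is skipped unless the driver asks for it. */
static bool demo_op_needs_rtnl(const struct demo_dev *dev, enum demo_op op)
{
	if (!dev->ops_locked)
		return true;	/* non-ops-locked drivers keep rtnl_lock */
	return (dev->needs_rtnl & DEMO_OP_BIT(op)) != 0;
}

static void demo_lock(struct demo_dev *dev, enum demo_op op)
{
	if (demo_op_needs_rtnl(dev, op))
		dev->rtnl_held = true;	/* global lock first */
	if (dev->ops_locked)
		dev->inst_held = true;	/* then the per-device lock */
}

static void demo_unlock(struct demo_dev *dev, enum demo_op op)
{
	/* Release in the reverse order of demo_lock(). */
	if (dev->ops_locked) {
		assert(dev->inst_held);
		dev->inst_held = false;
	}
	if (demo_op_needs_rtnl(dev, op)) {
		assert(dev->rtnl_held);
		dev->rtnl_held = false;
	}
}
```

In this model an ops-locked driver that sets only
DEMO_OP_BIT(DEMO_OP_SET_CHANNELS) in needs_rtnl would run its
get-link op under the instance lock alone, while set-channels would
still run under both locks.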

Known issues / exclusions:
 - qdiscs - qdisc configuration currently assumes rtnl_lock; this
   mostly impacts the set_channels callback. qdisc config is probably
   the easiest of the exclusions to tackle, it's fairly self-contained.
 - features - even though feature changes are (correctly) plumbed to
   the driver through ndos, they are part of ethtool uAPI. ethtool
   itself calls netdev_features_change() if it spots the device
   changing features in response to the request, and some drivers
   call it themselves. Since features have to propagate to upper and
   lower devices, anything that touches features is quite hard to
   take out from under rtnl_lock.
 - phylink - phylink and SFP depend on rtnl_lock today, I suspect
   that this is purely for historic reasons. I started poking at
   it and don't really see a need for a global lock. But accessing
   the netdev instance lock from the SFP entry points will require
   some attention from the phylink folks.
 - phydev - similar to phylink, looks quite doable. But no ops-locked
   driver currently has a phydev (fbnic only uses phylink), so
   phydev-related paths retain an ASSERT_RTNL() for now.

Tested on mlx5, bnxt and fbnic.

Jakub Kicinski (14):
  net: ethtool: cmis_cdb: hold instance lock for ops locked devices
  net: ethtool: make sure __ethtool_get_link_ksettings() is ops-locked
  net: ethtool: serialize broadcast notification sequence allocation
  net: ethtool: relax ethnl_req_get_phydev() locking assertion
  net: ethtool: make dev->hwprov ops-protected
  net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops
  net: ethtool: optionally skip rtnl_lock on Netlink path for SET ops
  net: ethtool: optionally skip rtnl_lock in cable test handlers
  net: ethtool: optionally skip rtnl_lock in ethnl_tsinfo_dumpit()
  net: ethtool: optionally skip rtnl_lock in ethnl_act_module_fw_flash()
  net: ethtool: optionally skip rtnl_lock in RSS context handlers
  net: ethtool: ioctl: concentrate the locking
  net: ethtool: optionally skip rtnl_lock on IOCTL path
  docs: net: ethtool: document ops-locked drivers and op_needs_rtnl

 Documentation/networking/netdev-features.rst  |   7 +
 Documentation/networking/netdevices.rst       |  17 ++-
 include/linux/ethtool.h                       |  38 ++++-
 include/linux/netdevice.h                     |   3 +
 include/linux/phy_link_topology.h             |   5 +
 include/net/netdev_lock.h                     |  17 +++
 net/ethtool/common.h                          |  76 ++++++++++
 net/ethtool/netlink.h                         |   1 -
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   4 +
 drivers/net/ethernet/google/gve/gve_ethtool.c |   2 +
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |   3 +
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   2 +
 .../mellanox/mlx5/core/ipoib/ethtool.c        |   2 +
 .../net/ethernet/meta/fbnic/fbnic_ethtool.c   |   5 +
 .../ethernet/microsoft/mana/mana_ethtool.c    |   2 +
 drivers/net/netdevsim/ethtool.c               |   1 +
 drivers/net/phy/phy_device.c                  |   3 +
 drivers/net/phy/phy_link_topology.c           |  10 ++
 net/core/dev_ioctl.c                          |   4 +-
 net/ethtool/cabletest.c                       |  12 +-
 net/ethtool/cmis_cdb.c                        |   3 +
 net/ethtool/cmis_fw_update.c                  |   8 +-
 net/ethtool/ioctl.c                           | 135 ++++++++++++++----
 net/ethtool/linkinfo.c                        |   4 +-
 net/ethtool/linkmodes.c                       |   4 +-
 net/ethtool/mm.c                              |   5 +-
 net/ethtool/module.c                          |   8 +-
 net/ethtool/netlink.c                         |  62 +++++---
 net/ethtool/phy.c                             |   1 -
 net/ethtool/rss.c                             |  21 +--
 net/ethtool/tsconfig.c                        |  10 +-
 net/ethtool/tsinfo.c                          |  32 ++---
 32 files changed, 385 insertions(+), 122 deletions(-)

-- 
2.54.0

