public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Kory Maincent <kory.maincent@bootlin.com>
To: Corey Leavitt via B4 Relay <devnull+corey.leavitt.info@kernel.org>
Cc: corey@leavitt.info, Oleksij Rempel <o.rempel@pengutronix.de>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Heiner Kallweit <hkallweit1@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Andrew Lunn <andrew@lunn.ch>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC net-next 4/4] net: phy: own phydev->psec via PSE notifier and remove fwnode_mdio hook
Date: Fri, 24 Apr 2026 14:36:16 +0200	[thread overview]
Message-ID: <20260424143616.54d747a8@kmaincent-XPS-13-7390> (raw)
In-Reply-To: <20260423-pse-notifier-decouple-v1-4-86ed750a9d62@leavitt.info>

On Thu, 23 Apr 2026 01:42:17 -0600
Corey Leavitt via B4 Relay <devnull+corey.leavitt.info@kernel.org> wrote:

> From: Corey Leavitt <corey@leavitt.info>
> 
> Transfer ownership of phydev->psec from fwnode_mdio to the phy
> subsystem itself. The phy subsystem now subscribes to the pse-pd
> notifier chain and manages psec attach/detach in response to PSE
> controller lifecycle events, while fwnode_mdio loses its PSE
> awareness entirely.
> 
> Split phy_device_register() into a public entry point that takes
> rtnl_lock() and a phy_device_register_locked() variant that assumes
> rtnl is already held. Callers that already hold rtnl (the SFP
> module state machine via __sfp_sm_event) use the _locked form to
> avoid deadlock; all other callers use the unchanged public API.
> This pair mirrors the register_netdevice() / register_netdev()
> split convention already established in the core networking stack.
> rtnl must span the full registration sequence through device_add(),
> not just phy_try_attach_pse(): a PSE_REGISTERED event firing between
> a narrow attach lock and device_add() would walk mdio_bus_type, find
> the phy not yet on the bus, and leave it permanently unattached.
> 
> With rtnl held across the full registration sequence:
> 
>   - At phy_device_register_locked(), phy_try_attach_pse() attempts
>     an of_pse_control_get() for phys whose DT pses phandle resolves
>     now. If the controller is already registered, psec is attached
>     before device_add() makes the phy visible on mdio_bus_type.
>     If the controller is not yet registered, the swallow-error path
>     leaves psec NULL and relies on the subsequent notifier event.
> 
>   - On PSE_REGISTERED: an rtnl-guarded bus walk retries the attach
>     for every registered phy whose psec is still NULL. This is the
>     "phy was enumerated before the PSE controller loaded" case,
>     which is the root cause of the boot-time probe-retry storm on
>     systems with a modular PSE controller driver. Because the
>     pse_controller_notifier is fired synchronously, a concurrent
>     pse_controller_register() either (a) completes list_add and
>     releases pse_list_mutex before this function takes rtnl, in
>     which case phy_try_attach_pse() finds the controller in the
>     list and attaches; or (b) fires its notifier during this
>     function, in which case the callback blocks on rtnl until this
>     function returns, then walks the bus and finds the phy fully
>     registered (attaching if psec is still NULL).
> 
>   - On PSE_UNREGISTERED: an rtnl-guarded bus walk releases every
>     phydev->psec that targets the departing controller before
>     pse_release_pis() frees pcdev->pi. Without this, a phy still
>     holding a pse_control reference would cause a use-after-free
>     in __pse_control_release's pcdev->pi[psec->id] access, and the
>     PSE driver module could not finish unloading while any phy
>     still held a reference via module_put().
> 
> Introduce phy_try_attach_pse() as the rtnl-guarded helper used by
> both the register path and the notifier walk. Holding rtnl across
> of_pse_control_get() is safe because pse_list_mutex is never held
> in the opposite order.
> 
> Expose pse_control_matches_pcdev() as a predicate so subscribers
> can identify which of their held pse_control references target a
> given controller, without leaking the struct pse_controller_dev *
> out of pse_control opacity.
> 
> Move the final pse_control_put() of phydev->psec from
> phy_device_remove() to phy_device_release(). The kobject release
> callback runs only after every reference on the device has been
> dropped, including the bus iterator references taken by
> bus_for_each_dev() in the notifier walk, which means by the time
> release fires no concurrent reader or writer of phydev->psec can
> exist. The mdio_bus_type klist is set up in bus_register() with
> klist_devices_get() / klist_devices_put() (drivers/base/bus.c),
> which bracket each iteration step with get_device() / put_device()
> on the underlying struct device; that reference defers the release
> callback from firing until the walk has advanced past this phy.
> Keeping phy_device_remove() unchanged avoids introducing a new
> locking contract on its many callers (sfp, fixed_phy, xgbe, hns,
> netsec, bcm_sf2, mdiobus_unregister).
> 
> Finally, delete fwnode_find_pse_control() and its call site in
> fwnode_mdiobus_register_phy(), and drop the PSE header from
> fwnode_mdio.c. This removes the probe-time -EPROBE_DEFER coupling
> between mdio and pse-pd that caused the boot-hang regression on
> systems with a modular PSE controller driver and a DT phy with a
> pses phandle: the MDIO/DSA probe no longer sees any PSE-originated
> -EPROBE_DEFER, so the probe-retry storm is gone. fwnode_mdio is
> now PSE-agnostic.
> 
> Fixes: fa2f0454174c ("net: pse-pd: Introduce attached_phydev to pse control")

This patch depends on the other patches (2 and 3) which do not have any fixes
tag. So either all of them have the fixes tag or none of them have it.

Except from this, your design seems a great idea!
The only thing that puzzle me is to have the RTNLOCK hold in the
phy_device_register function. This is an important change but I am not a netdev
sufficient expert to know if that could break things.
Jakub, Russell what do you think?

Regards,
-- 
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

  reply	other threads:[~2026-04-24 12:36 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-23  7:42 [PATCH RFC net-next 0/4] net: pse-pd: decouple controller lookup from MDIO probe Corey Leavitt via B4 Relay
2026-04-23  7:42 ` [PATCH RFC net-next 1/4] net: pse-pd: scope pse_control regulator handle to kref lifetime Corey Leavitt via B4 Relay
2026-04-24  8:36   ` Kory Maincent
2026-04-23  7:42 ` [PATCH RFC net-next 2/4] net: pse-pd: add notifier chain for controller lifecycle events Corey Leavitt via B4 Relay
2026-04-23  7:42 ` [PATCH RFC net-next 3/4] net: pse-pd: fire lifecycle events on controller register/unregister Corey Leavitt via B4 Relay
2026-04-23  7:42 ` [PATCH RFC net-next 4/4] net: phy: own phydev->psec via PSE notifier and remove fwnode_mdio hook Corey Leavitt via B4 Relay
2026-04-24 12:36   ` Kory Maincent [this message]
2026-04-23  9:05 ` [PATCH RFC net-next 0/4] net: pse-pd: decouple controller lookup from MDIO probe Kory Maincent
2026-04-23  9:48   ` Corey Leavitt
2026-04-24 12:41     ` Kory Maincent
2026-04-26 21:25       ` Carlo Szelinsky
2026-05-02 20:10       ` Carlo Szelinsky
  -- strict thread matches above, loose matches on Subject: below --
2026-04-23  7:22 Corey Leavitt
2026-04-23  7:23 ` [PATCH RFC net-next 4/4] net: phy: own phydev->psec via PSE notifier and remove fwnode_mdio hook Corey Leavitt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260424143616.54d747a8@kmaincent-XPS-13-7390 \
    --to=kory.maincent@bootlin.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=andrew@lunn.ch \
    --cc=corey@leavitt.info \
    --cc=davem@davemloft.net \
    --cc=devnull+corey.leavitt.info@kernel.org \
    --cc=edumazet@google.com \
    --cc=hkallweit1@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=netdev@vger.kernel.org \
    --cc=o.rempel@pengutronix.de \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox