Disappearance of network PHYs

public inbox for netdev@vger.kernel.org

* Disappearance of network PHYs
@ 2026-03-11 15:34 Vladimir Oltean
  2026-03-11 15:55 ` Maxime Chevallier
  0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Oltean @ 2026-03-11 15:34 UTC
  To: Heiner Kallweit, Andrew Lunn, Russell King
  Cc: Wei Fang, Maxime Chevallier, netdev

Hi,

This is a follow-up from this thread:
https://lore.kernel.org/netdev/PAXPR04MB8510CB77C95D4D044FE62C64889BA@PAXPR04MB8510.eurprd04.prod.outlook.com/

I picked up from where Russell, Maxime and Wei left the discussion more
than 1 month ago, and quickly prototyped patches that would mechanically
remove the kernel crashes caused by invalid phydev dereferences from the
MAC driver of phylib/phylink, when the phydev goes away.

(far from perfect, may still have deadlocks or lockdep splats) PoC patches at:
https://github.com/vladimiroltean/linux/tree/phy-remove

I ran out of steam because I'm not really sure what we want given what's
possible, and I don't want the effort/discussion to completely die away,
so I'm asking the PHY maintainers and other interested people for advice,
while explaining what I found to be (not) possible.

Mechanically, what happens now in my branch, once the phydev has disappeared
and nothing can crash the kernel anymore, is that the netdev carrier state is
in holdover mode. Meaning: if the link was up before, it still is up; if it was
down, it still is down. Furthermore, traffic still works if it worked
before. This is because the PHY is not an active component of the data
path (I am excluding things such as PHY timestamping), and I've patched the
state machine to preserve the last state and do no further work.

However, if I stop and start again a disappeared phydev, phy_start()
leaves it in the PHY_DOWN state, to avoid some WARN_ON()s. This is
admittedly inconsistent.
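
The holdover behaviour and the stop/start inconsistency described above can be
modelled in a few lines. This is a toy Python sketch, not code from the branch:
the `ModelPhy` class, its `present` flag and the reduced set of states are
assumptions made purely for illustration, and the real phylib state machine has
more states plus locking.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PhyState(Enum):
    # Reduced, illustrative subset of the phylib states.
    DOWN = auto()
    NOLINK = auto()
    RUNNING = auto()


@dataclass
class ModelPhy:
    state: PhyState = PhyState.DOWN
    present: bool = True    # hypothetical flag: False once the phydev is gone
    hw_link: bool = False   # what the hardware would report when polled


def phy_start(phy: ModelPhy) -> None:
    # A disappeared PHY is left in DOWN instead of being restarted.
    if phy.present:
        phy.state = PhyState.NOLINK


def state_machine_step(phy: ModelPhy, carrier: bool) -> bool:
    """Run one poll and return the resulting carrier state."""
    if not phy.present:
        # Holdover: keep the last carrier state and do no further work.
        return carrier
    if phy.state is PhyState.DOWN:
        # A PHY that was never started reports no link.
        return False
    phy.state = PhyState.RUNNING if phy.hw_link else PhyState.NOLINK
    return phy.hw_link
```

In this model, a link that was up stays up after `present` is cleared, while a
later `phy_start()` leaves the state at `DOWN`, which is the inconsistency
mentioned above.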

This serves as an illustration of the most complicated part of surviving
the loss of one of your providers - what to do afterwards? The MAC driver
may have done stateful stuff with the PHY prior to it going away.
The netdev->phydev pointer persists, but even if the phydev later comes
back - it's no longer the same phydev and those operations need to be
repeated. But how to repeat those operations, when
(1) no one kept track of them
(2) the netdev->phydev that the MAC is holding on to is not the same as
    the new phydev that gets created on rebind

So even if the netdev->phydev pointer lingers on, it is effectively junk and in
everyone's best interest that the MAC driver gets rid of it ASAP. And then do
what?

There are 2 distinct cases to think about:
1. MAC driver connects to the PHY at ndo_open() and disconnects at ndo_stop().
   I can see something like a forced admin down from the kernel (somehow).
2. MAC driver connects to the PHY at probe() and disconnects at remove().
   I don't see how these can survive the loss of netdev->phydev in a meaningful
   way (meaning: have a way to recover when it comes back).

Actually DSA belongs to case 2, which complicates the discussion, since it is
one of the reasons we don't consider device links as good.

However, I must point this out. Device links provide a very reasonable and clean
answer to what the MAC driver should do when its PHY goes away. Unbinding the
MAC makes sure that none of its internal assumptions about PHY state will be
violated when the PHY later binds back, and it doesn't require complex tracking
either. It also scales to multiple (and different kinds of) providers, which
can also go away, in much the same way.

Sure, I don't like the side effects of that answer when applied to DSA either,
but maybe that's something we can work on, while not fully rejecting it.

Some ideas, mostly listed as conversation starters:
- Modify DSA to connect to the PHY at ndo_open() time.
- Modify DSA to register a separate struct device (with generic DSA port driver)
  for each port. Link the net_device parent device with this port device.
  The PHY device link unbinds only the port device, which can be later rebound
  via sysfs. Solution gets repeated for whatever other switchdev/multi-port
  NIC driver is written that uses external providers. We modify
  Documentation/networking/switchdev.rst to make driver authors aware of
  the problem.
- Create an optional notifier chain announcing that the PHY is going away.
  A MAC driver that monitors it tells phylib that it does. Drivers that don't
  inform phylib get unbound via the device link mechanism. Those that monitor
  the notifier don't get unbound.
- A combination of the above
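
To make the notifier idea concrete, here is a toy Python model of the opt-in
logic. Everything in it is hypothetical: phylib has no such API today, and the
class, method and device names are invented for illustration only.

```python
from typing import Callable, Dict, List


class PhyRemovalNotifier:
    """Hypothetical model of an opt-in 'PHY going away' notifier."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], None]] = {}

    def register(self, mac: str, handler: Callable[[str], None]) -> None:
        # Registering is how a MAC driver says it monitors PHY removal.
        self._handlers[mac] = handler

    def phy_going_away(self, phy: str, consumers: List[str]) -> List[str]:
        """Notify monitoring MACs; return the MACs that must be unbound."""
        to_unbind = []
        for mac in consumers:
            handler = self._handlers.get(mac)
            if handler is None:
                # No opt-in: fall back to a device-link style unbind.
                to_unbind.append(mac)
            else:
                handler(phy)
        return to_unbind
```

A MAC that registered a handler is called back and stays bound; one that did
not ends up in the returned list, standing in for the device link fallback.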

Any other thoughts welcome.


* Re: Disappearance of network PHYs
  2026-03-11 15:34 Disappearance of network PHYs Vladimir Oltean
@ 2026-03-11 15:55 ` Maxime Chevallier
  2026-03-11 18:17   ` Andrew Lunn
  2026-03-11 21:18   ` Vladimir Oltean
  0 siblings, 2 replies; 5+ messages in thread
From: Maxime Chevallier @ 2026-03-11 15:55 UTC
  To: Vladimir Oltean, Heiner Kallweit, Andrew Lunn, Russell King
  Cc: Wei Fang, netdev

Hi Vladimir,

On 11/03/2026 16:34, Vladimir Oltean wrote:
> Hi,
> 
> This is a follow-up from this thread:
> https://lore.kernel.org/netdev/PAXPR04MB8510CB77C95D4D044FE62C64889BA@PAXPR04MB8510.eurprd04.prod.outlook.com/
> 
> I picked up from where Russell, Maxime and Wei left the discussion more
> than 1 month ago, and quickly prototyped patches that would mechanically
> remove the kernel crashes caused by invalid phydev dereferences from the
> MAC driver of phylib/phylink, when the phydev goes away.
> 
> (far from perfect, may still have deadlocks or lockdep splats) PoC patches at:
> https://github.com/vladimiroltean/linux/tree/phy-remove
> 
> I ran out of steam because I'm not really sure what we want given what's
> possible, and I don't want the effort/discussion to completely die away,
> so I'm asking the PHY maintainers and other interested people for advice,
> while explaining what I found to be (not) possible.

Thanks for keeping this discussion alive :)

> 
> Mechanically, what happens now in my branch, once the phydev has disappeared
> and nothing can crash the kernel anymore, is that the netdev carrier state is
> in holdover mode. Meaning: if the link was up before, it still is up; if it was
> down, it still is down. Furthermore, traffic still works if it worked
> before. This is because the PHY is not an active component of the data
> path (I am excluding things such as PHY timestamping), and I've patched the
> state machine to preserve the last state and do no further work.
> 
> However, if I stop and start again a disappeared phydev, phy_start()
> leaves it in the PHY_DOWN state, to avoid some WARN_ON()s. This is
> admittedly inconsistent.
> 
> This serves as an illustration of the most complicated part of surviving
> the loss of one of your providers - what to do afterwards? The MAC driver
> may have done stateful stuff with the PHY prior to it going away.
> The netdev->phydev pointer persists, but even if the phydev later comes
> back - it's no longer the same phydev and those operations need to be
> repeated. But how to repeat those operations, when
> (1) no one kept track of them
> (2) the netdev->phydev that the MAC is holding on to is not the same as
>     the new phydev that gets created on rebind
> 
> So even if the netdev->phydev pointer lingers on, it is effectively junk and in
> everyone's best interest that the MAC driver gets rid of it ASAP. And then do
> what?
> 
> There are 2 distinct cases to think about:
> 1. MAC driver connects to the PHY at ndo_open() and disconnects at ndo_stop().
>    I can see something like a forced admin down from the kernel (somehow).
> 2. MAC driver connects to the PHY at probe() and disconnects at remove().
>    I don't see how these can survive the loss of netdev->phydev in a meaningful
>    way (meaning: have a way to recover when it comes back).
> 
> Actually DSA belongs to case 2, which complicates the discussion, since it is
> one of the reasons we don't consider device links as good.
> 
> However, I must point this out. Device links provide a very reasonable and clean
> answer to what the MAC driver should do when its PHY goes away. Unbinding the
> MAC makes sure that none of its internal assumptions about PHY state will be
> violated when the PHY later binds back, and it doesn't require complex tracking
> either. It also scales to multiple (and different kinds of) providers, which
> can also go away, in much the same way.
> 
> Sure, I don't like the side effects of that answer when applied to DSA either,
> but maybe that's something we can work on, while not fully rejecting it.
> 
> Some ideas, mostly listed as conversation starters:
> - Modify DSA to connect to the PHY at ndo_open() time.
> - Modify DSA to register a separate struct device (with generic DSA port driver)
>   for each port. Link the net_device parent device with this port device.
>   The PHY device link unbinds only the port device, which can be later rebound
>   via sysfs. Solution gets repeated for whatever other switchdev/multi-port
>   NIC driver is written that uses external providers. We modify
>   Documentation/networking/switchdev.rst to make driver authors aware of
>   the problem.
> - Create an optional notifier chain announcing that the PHY is going away.
>   A MAC driver that monitors it tells phylib that it does. Drivers that don't
>   inform phylib get unbound via the device link mechanism. Those that monitor
>   the notifier don't get unbound.
> - A combination of the above

One other thing could be to rely on phylink to handle that ? It's the
one part of the net stack that already handles PHY devices suddenly
disappearing, with the SFP case. It has all the logic in place to
maintain the netdev's phydev pointer without the MAC driver having to
deal with that.

And for MAC drivers that don't use phylink, we could consider falling
back to the fw_devlink approach that was proposed, i.e. big hammer
solution that has nasty side-effects but that doesn't crash the kernel
and gives the user a chance to recover, provided the side-effects in
question didn't kick them out of their HW ?

Maxime



* Re: Disappearance of network PHYs
  2026-03-11 15:55 ` Maxime Chevallier
@ 2026-03-11 18:17   ` Andrew Lunn
  2026-03-11 20:34     ` Vladimir Oltean
  2026-03-11 21:18   ` Vladimir Oltean
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Lunn @ 2026-03-11 18:17 UTC
  To: Maxime Chevallier
  Cc: Vladimir Oltean, Heiner Kallweit, Russell King, Wei Fang, netdev

> One other thing could be to rely on phylink to handle that ? It's the
> one part of the net stack that already handles PHY devices suddenly
> disappearing, with the SFP case. It has all the logic in place to
> maintain the netdev's phydev pointer without the MAC driver having to
> deal with that.

We might want to consider the different use cases.

1) Hot pluggable hardware, and it disappears. As you say, phylink can
handle this.

2) Root unbinds the device, while it is in use, for hardware which is
only cold pluggable? Cannot we just consider this a foot gun?
Exploding is O.K?

Is there a good use case for root unbinding the PHY driver? We should
only add complexity if there is a good use case.

   Andrew


* Re: Disappearance of network PHYs
  2026-03-11 18:17   ` Andrew Lunn
@ 2026-03-11 20:34     ` Vladimir Oltean
  0 siblings, 0 replies; 5+ messages in thread
From: Vladimir Oltean @ 2026-03-11 20:34 UTC
  To: Andrew Lunn
  Cc: Maxime Chevallier, Heiner Kallweit, Russell King, Wei Fang,
	netdev

On Wed, Mar 11, 2026 at 07:17:18PM +0100, Andrew Lunn wrote:
> > One other thing could be to rely on phylink to handle that ? It's the
> > one part of the net stack that already handles PHY devices suddenly
> > disappearing, with the SFP case. It has all the logic in place to
> > maintain the netdev's phydev pointer without the MAC driver having to
> > deal with that.
> 
> We might want to consider the different use cases.
> 
> 1) Hot pluggable hardware, and it disappears. As you say, phylink can
> handle this.
> 
> 2) Root unbinds the device, while it is in use, for hardware which is
> only cold pluggable? Cannot we just consider this a foot gun?
> Exploding is O.K?
> 
> Is there a good use case for root unbinding the PHY driver? We should
> only add complexity if there is a good use case.
> 
>    Andrew

I do have a customer specifically targeting equipment for explosive
zones (EX-A, EX-P), which has components powered from independent power
domains that can be administratively powered down and taken out for
servicing, and then plugged in.

The requirements that were formulated to me are:
- it is known ahead of time, typically a few seconds in advance, that
  a power domain will be turned off.
- the rest of the system must continue to run with no interruption
- the system must pick up the components after being powered back up

Their system is a ZynqMP with 3 SJA1105 switches forming a single DSA
tree.
i.  2 of the 3 switches are in EX areas, and the main switch (in the main
    power domain) must continue to run when the EX area is powered down
ii. one on-board PHY of the main switch, with an M12 connector, is also
    part of an EX area and can be powered down independently

There are isolators on the *GMII and SPI lines, pull ups for MDIO, so
electrically the boards can handle the missing components, but software
is a bit limited, and I had to figure out what is missing.

First off, the customer is OK with unbinding the driver from a device
when the power is about to go down. Linux mostly* makes no access to the
registers of unbound devices, so you can do anything to them in that state.
Probing also fails cleanly if you try it while no power is applied (because
probing typically accesses registers, which would fail), and you can retry
it later with well-defined semantics.
So, binding/unbinding was chosen as the simplest tool that gets the job
done.

The DSA assumptions preventing use case i. from working are that a tree
is only set up when complete (all switches present) and torn down when the
first switch goes missing. There is no "degraded" state - I had to add it,
and rewrite DSA tree probing such that each switch contributes independently
with just its bits, rather than the tree doing it all centrally.

For use case ii, the phylib code is actually mostly fine already. You
can unbind a PHY driver as long as the MAC hasn't connected to it, and
nothing bad will happen: the PHY state machine doesn't run, ethtool
operations don't pass through to the PHY, etc. The problem is again in
DSA, which connects to the PHY at probe time and disconnects at remove.
I've moved these operations to ndo_open() and ndo_stop(), and as long as
I run "ip link set swp0 down" prior to unbinding the PHY driver, things
are again fine.
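
The ordering constraint above can be stated as a tiny model. This is a Python
sketch under the assumption that unbinding the PHY driver is harmless only
while no MAC is attached to the PHY; `ModelPort` and `phy_unbind_is_safe()` are
invented names, not DSA or phylib code.

```python
class ModelPort:
    """Toy port that attaches to its PHY only while administratively up."""

    def __init__(self) -> None:
        self.phy_attached = False

    def ndo_open(self) -> None:
        # Stands in for connecting to the PHY at open time.
        self.phy_attached = True

    def ndo_stop(self) -> None:
        # Stands in for disconnecting from the PHY at stop time.
        self.phy_attached = False


def phy_unbind_is_safe(port: ModelPort) -> bool:
    # Model assumption: unbinding is harmless only while no MAC is attached.
    return not port.phy_attached
```

With connect and disconnect tied to open and stop, taking the port down first
is what returns it to the state where the unbind is safe.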

*Actually the EX area may be unpowered even at boot time, and in that
case, since MDIO is an enumerated bus, the kernel will fail to
create a device for the PHY in the EX area. That is completely distinct
from having a device with no driver.

We solved this by disabling PHY enumeration, using an
"ethernet-phy-idAAAA.BBBB" style compatible string. This tells the
kernel the PHY is always present, so it will create a device for it
without trying to access any PHY ID register. When we apply power and
then run phy_connect(), it works.
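
For reference, the 32-bit PHY ID carried by such a compatible string can be
decoded as below. This is a Python sketch of the string format only, not the
kernel's parser; the function name and the example ID are made up for
illustration.

```python
import re
from typing import Optional

# Four hex digits for the upper half of the ID, a dot, four for the lower half.
_PHY_ID_COMPAT = re.compile(r"ethernet-phy-id([0-9a-fA-F]{4})\.([0-9a-fA-F]{4})")


def phy_id_from_compatible(compatible: str) -> Optional[int]:
    """Return the PHY ID encoded in the string, or None if it has no ID."""
    match = _PHY_ID_COMPAT.fullmatch(compatible)
    if match is None:
        return None
    upper, lower = (int(group, 16) for group in match.groups())
    return (upper << 16) | lower
```

For example, `phy_id_from_compatible("ethernet-phy-id1234.abcd")` returns
`0x1234abcd`, and a string without an embedded ID returns `None`.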

My involvement with Wei's device link patch is actually just that I want
to proactively protect this use case from breaking (and to upstream it).
Of course a device link between the PHY in the EX area and the main
switch would throw a wrench in the whole use case ii. This makes it a
bit fragile, given the fact that, as explained, device links seem pretty
common sense for single-port NICs. So it is in my best interest that we
have a clear model where unbinding the driver from a PHY doesn't "explode"
(no reference to EX areas intended), even in uncontrolled situations
where the netdev isn't carefully disconnected from the PHY first.


* Re: Disappearance of network PHYs
  2026-03-11 15:55 ` Maxime Chevallier
  2026-03-11 18:17   ` Andrew Lunn
@ 2026-03-11 21:18   ` Vladimir Oltean
  1 sibling, 0 replies; 5+ messages in thread
From: Vladimir Oltean @ 2026-03-11 21:18 UTC
  To: Maxime Chevallier
  Cc: Heiner Kallweit, Andrew Lunn, Russell King, Wei Fang, netdev

Hi Maxime,

On Wed, Mar 11, 2026 at 04:55:48PM +0100, Maxime Chevallier wrote:
> One other thing could be to rely on phylink to handle that ? It's the
> one part of the net stack that already handles PHY devices suddenly
> disappearing, with the SFP case. It has all the logic in place to
> maintain the netdev's phydev pointer without the MAC driver having to
> deal with that.
> 
> And for MAC drivers that don't use phylink, we could consider falling
> back to the fw_devlink approach that was proposed, i.e. big hammer
> solution that has nasty side-effects but that doesn't crash the kernel
> and gives the user a chance to recover, provided the side-effects in
> question didn't kick them out of their HW ?

Phylink calls phylink_bringup_phy() from 3 places:
- phylink_connect_phy() or phylink_fwnode_phy_connect(), if the PHY is
  on board
- phylink_sfp_config_phy() if the PHY is on an SFP bus

When an SFP module is removed, the SFP state machine detects this and
calls sfp_remove_phy(), which triggers phylink_sfp_disconnect_phy() ->
phylink_disconnect_phy(). Then it calls phy_device_remove(sfp->mod_phy),
phy_device_free(sfp->mod_phy).

The on-board case is a bit different. The difference is that phylink has
no trigger to do phylink_bringup_phy() / phylink_disconnect_phy() as it
has for the SFP bus. The MAC driver is the trigger, as phylink is currently
written.
